From florent.angly at gmail.com Thu Mar 1 19:12:19 2012 From: florent.angly at gmail.com (Florent Angly) Date: Fri, 02 Mar 2012 10:12:19 +1000 Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation In-Reply-To: <4325EF60-919F-46EF-91BB-D31160F0B587@illinois.edu> References: <4F49D4B6.5050301@gmail.com> <4F4E3EEF.5050506@cam.ac.uk> <4325EF60-919F-46EF-91BB-D31160F0B587@illinois.edu> Message-ID: <4F501063.4010109@gmail.com> Thanks for everybody's feedback. I am looking at existing modules to hold template sequence, amplicon sequence and primer information. There is the Bio::SeqFeature::Primer and Bio::Seq::PrimedSeq. At the moment the PrimedSeq object places Primer objects on the target sequence. I have been looking at refreshing these modules (they are quite old), add some sanity to them and make sure they are suitable for a generic implementation of PCR (or amplicon search, which I find a more suitable name since it is a far cry from simulating PCR cycles, etc). I will make a remote branch today to make it easier for interested parties to experiment and contribute. As you can see Chris, the amplicon search feature would use two existing bioperl-live modules and only add one, tentatively in the Bio::Tools::AmpliconSearch namespace. I am not convinced that this warrants a separate distro. Florent On 01/03/12 01:23, Fields, Christopher J wrote: > Seems like it was meant to be added at some point but was never committed. Definitely not in the github history for 1.3.x, this commit corresponds to the v1.3.4 tag: > > https://github.com/bioperl/bioperl-live/tree/0a67fa444eb19a70876017607f70ab72be38755a > > and it's not there. > > I agree with Roy, it would be nice to somehow make this a little more generic or pluggable on how it maps primers (maybe with a default pure perl method). I also think this shouldn't be bound to bioperl-live considering our current plans, it would best happen in a separate repo. 
> > chris > > On Feb 29, 2012, at 9:06 AM, Roy Chaudhuri wrote: > >> The code for Bio::Tools::PCRSimulation can be downloaded as part of this archive: >> http://www.salmonella.org/bioperl/primer3_v0.3.tgz >> >> (There's supposedly a more recent version here: >> http://www.salmonella.org/bioperl/nucleotide_analyses.tgz >> but that file seems to be truncated). >> >> I have no idea how much would be salvagable. It seems to just use index to map the primers to the sequence, I guess it would make more sense to at least give the option of something more sophisticated like Primer3, BLAST or even a short read mapper. >> >> Cheers, >> Roy. >> >> >> On 27/02/2012 21:18, Fields, Christopher J wrote: >>> On Feb 26, 2012, at 12:44 AM, Florent Angly wrote: >>> >>>> Hi all, >>>> >>>> I am interested in the Bio::Tools::PCRSimulation module. Supposedly >>>> it was added to Bioperl 0.3 and is also mentionned in the >>>> Bio::PrimedSeq module. However, I cannot find in the current >>>> Bioperl codebase. Any idea where it went? >>> No idea; I can't find it anywhere in the code base either, and the >>> github repo contains history going back to the original CVS repo. >>> You can try contacting the author, possibly. >>> >>>> The reason I am asking is because I have some code to do silico PCR >>>> using regular expressions. I wanted to modularize my code more and >>>> make it into a module for Bioperl. Of course, if there is something >>>> similar in Bioperl already, I need to have a look at it. If there >>>> is nothing similar, what namespace do you suggest to use? >>>> Bio::Tools::AmpliconExtractor? Bio::Tools::AmpliconSearch? >>>> Bio::Tools::InSilicoPCR? >>>> >>>> Thanks, >>>> >>>> Florent >>> >>> Maybe the last (InSilicoPCR). 
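[Editorial sketch] The regular-expression approach to in silico PCR mentioned in the question above can be illustrated roughly as follows. This is not Florent's code, nor the old PCRSimulation module; the sub names (`revcomp`, `primer_regex`, `find_amplicons`) are made up for this example, and it assumes a single-line template string, exact primer matches apart from IUPAC degeneracy, and a search on the forward strand only.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to regex character classes.
my %iupac = (
    A => 'A',     C => 'C',     G => 'G',     T => 'T',
    R => '[AG]',  Y => '[CT]',  S => '[CG]',  W => '[AT]',
    K => '[GT]',  M => '[AC]',  B => '[CGT]', D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Reverse-complement a primer, keeping IUPAC codes meaningful.
sub revcomp {
    my ($seq) = @_;
    my $rc = scalar reverse uc $seq;
    $rc =~ tr/ACGTRYSWKMBDHVN/TGCAYRSWMKVHDBN/;
    return $rc;
}

# Turn a (possibly degenerate) primer into a regex fragment.
sub primer_regex {
    my ($primer) = @_;
    return join '',
        map { $iupac{$_} // die "Unknown base '$_'\n" } split //, uc $primer;
}

# Return every amplicon (primers included) found on the forward strand.
# The reverse primer is given 5'->3' as ordered, so its reverse complement
# is what appears on the template strand.
sub find_amplicons {
    my ( $template, $fwd, $rev ) = @_;
    my $f = primer_regex($fwd);
    my $r = primer_regex( revcomp($rev) );
    my @amplicons;
    # The lookahead lets the search restart one base further on each time,
    # so overlapping forward-primer sites are not skipped.
    while ( $template =~ /(?=($f.*?$r))/gi ) {
        push @amplicons, $1;
    }
    return @amplicons;
}
```

Calling `find_amplicons( $template, $forward_primer, $reverse_primer )` returns a list of amplicon strings. A pluggable design, as suggested in the thread, would swap the regex step for Primer3, BLAST or a read mapper behind the same interface.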
>>> >>> chris >>> >>> >>> _______________________________________________ Bioperl-l mailing >>> list Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l From florent.angly at gmail.com Thu Mar 1 20:14:25 2012 From: florent.angly at gmail.com (Florent Angly) Date: Fri, 02 Mar 2012 11:14:25 +1000 Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation In-Reply-To: <4325EF60-919F-46EF-91BB-D31160F0B587@illinois.edu> References: <4F49D4B6.5050301@gmail.com> <4F4E3EEF.5050506@cam.ac.uk> <4325EF60-919F-46EF-91BB-D31160F0B587@illinois.edu> Message-ID: <4F501EF1.4050506@gmail.com> As mentionned earlier, here is the 'amplicons' development branch: https://github.com/bioperl/bioperl-live/tree/amplicons Florent On 01/03/12 01:23, Fields, Christopher J wrote: > Seems like it was meant to be added at some point but was never committed. Definitely not in the github history for 1.3.x, this commit corresponds to the v1.3.4 tag: > > https://github.com/bioperl/bioperl-live/tree/0a67fa444eb19a70876017607f70ab72be38755a > > and it's not there. > > I agree with Roy, it would be nice to somehow make this a little more generic or pluggable on how it maps primers (maybe with a default pure perl method). I also think this shouldn't be bound to bioperl-live considering our current plans, it would best happen in a separate repo. > > chris > > On Feb 29, 2012, at 9:06 AM, Roy Chaudhuri wrote: > >> The code for Bio::Tools::PCRSimulation can be downloaded as part of this archive: >> http://www.salmonella.org/bioperl/primer3_v0.3.tgz >> >> (There's supposedly a more recent version here: >> http://www.salmonella.org/bioperl/nucleotide_analyses.tgz >> but that file seems to be truncated). >> >> I have no idea how much would be salvagable. It seems to just use index to map the primers to the sequence, I guess it would make more sense to at least give the option of something more sophisticated like Primer3, BLAST or even a short read mapper. >> >> Cheers, >> Roy. 
>> >> >> On 27/02/2012 21:18, Fields, Christopher J wrote: >>> On Feb 26, 2012, at 12:44 AM, Florent Angly wrote: >>> >>>> Hi all, >>>> >>>> I am interested in the Bio::Tools::PCRSimulation module. Supposedly >>>> it was added to Bioperl 0.3 and is also mentionned in the >>>> Bio::PrimedSeq module. However, I cannot find in the current >>>> Bioperl codebase. Any idea where it went? >>> No idea; I can't find it anywhere in the code base either, and the >>> github repo contains history going back to the original CVS repo. >>> You can try contacting the author, possibly. >>> >>>> The reason I am asking is because I have some code to do silico PCR >>>> using regular expressions. I wanted to modularize my code more and >>>> make it into a module for Bioperl. Of course, if there is something >>>> similar in Bioperl already, I need to have a look at it. If there >>>> is nothing similar, what namespace do you suggest to use? >>>> Bio::Tools::AmpliconExtractor? Bio::Tools::AmpliconSearch? >>>> Bio::Tools::InSilicoPCR? >>>> >>>> Thanks, >>>> >>>> Florent >>> >>> Maybe the last (InSilicoPCR). >>> >>> chris >>> >>> >>> _______________________________________________ Bioperl-l mailing >>> list Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l From j_martin at lbl.gov Thu Mar 1 03:16:10 2012 From: j_martin at lbl.gov (Joel Martin) Date: Thu, 1 Mar 2012 00:16:10 -0800 Subject: [Bioperl-l] fastq splitter In-Reply-To: <20302.33287.509248.407270@gargle.gargle.HOWL> References: <20302.33287.509248.407270@gargle.gargle.HOWL> Message-ID: Just a caution to double check that the read1 and read2 names match after splitting. I don't know if this thread jinxed me or what, but I just for the first time received a concatenated fastq file formatted as you describe, except the first read1 doesn't match the first read2. zut alores! 
I came up with converting to scarf, running /usr/bin/sort on the scarf, then reading that back while tossing reads into single or paired files and reconverting to fastq in the process. It wasn't too bad, but I don't think bioperl has a scarf conversion; it's basically fastq with : substituted for \n. Most delimiters that aren't : would work better, but I already had a fastq2scarf from early solexa days (I think).

# This was the last step, in case it's handy for this plague of hideous files.
# The fixed field counts for ':' would need adjusting for other header styles.
use strict;
use warnings;

open( my $oph, '>', 'paired.fq' ) or die $!;
open( my $osh, '>', 'single.fq' ) or die $!;

# Pending read1 (end number, name, whole scarf line) still waiting for its mate.
my ( $pend, $pname, $pline );
while ( <> ) {
    my ( $name, $end ) = /^(\S+)\s(\d)/;
    die "ERROR: can't interpret line $. $_" unless defined $end;
    if ( $end == 1 ) {
        # The previous read1 never met its read2: write it as a single.
        print_reads( $osh, $pline ) if $pend;
        $pend = $end; $pname = $name; $pline = $_;
    }
    elsif ( $end == 2 ) {
        if ( $pend ) {
            # A pair only if the pending read1 carries the same name.
            my $fh = $pname eq $name ? $oph : $osh;
            print_reads( $fh, $pline, $_ );
        }
        else {
            # Orphan read2: nothing pending, so do not reprint a stale read1.
            print_reads( $osh, $_ );
        }
        $pend = '';
    }
    else {
        die "ERROR: can't interpret line $. $_";
    }
}
# A read1 left pending at end of input is a single too.
print_reads( $osh, $pline ) if $pend;

# Turn scarf lines back into FASTQ: fields 0-9 form the header, field 10 is
# the sequence, field 11 the quality string (trailing newline included).
sub print_reads {
    my ( $fh, @reads ) = @_;
    for my $scarf ( @reads ) {
        my @stuff = split /:/, $scarf, 12;
        print $fh '@', join( ':', @stuff[0..9] ), "\n$stuff[10]\n+\n$stuff[11]";
    }
}

Joel

On Wed, Feb 29, 2012 at 11:52 AM, George Hartzell wrote:
> Fields, Christopher J writes:
> > Just want to say, if you can set up a local perl and local::lib it
> > makes your life a LOT easier. Particularly if you are running jobs
> > on older versions of RHEL, which notoriously stuck with
> > outdated/broken versions of perl (as well as other tools).
> > [...]
>
> And Perlbrew takes away your last excuse for not building perls and
> setting up local::lib's.
>
> http://perlbrew.pl/
>
> g.
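[Editorial sketch] The caution in this message, to double check that read1 and read2 names still match after splitting, can be scripted along the following lines. It is only a sketch under stated assumptions: plain four-line FASTQ records, mate names that differ only by a trailing /1 or /2 or by whatever follows the first whitespace, and hypothetical sub names (`base_name`, `first_mismatch`) that do not come from BioPerl.

```perl
use strict;
use warnings;

# Reduce a FASTQ header line to the part both mates should share.
sub base_name {
    my ($header) = @_;
    chomp $header;
    $header =~ s/^@//;         # leading '@' of the header line
    $header =~ s/\s.*$//;      # newer style: "name 1:N:0:..."
    $header =~ s{/[12]$}{};    # older style: "name/1" and "name/2"
    return $header;
}

# Walk two FASTQ handles in lockstep. Returns the 1-based number of the
# first record whose names differ (or where one file runs out), 0 if none.
sub first_mismatch {
    my ( $fh1, $fh2 ) = @_;
    my $record = 0;
    while ( defined( my $h1 = <$fh1> ) ) {
        my $h2 = <$fh2>;
        $record++;
        return $record
            if !defined($h2) || base_name($h1) ne base_name($h2);
        # Skip the sequence, '+' and quality lines of this record in both files.
        for ( 1 .. 3 ) {
            scalar <$fh1>;
            scalar <$fh2>;
        }
    }
    # read2 file has records left over after read1 ended.
    return defined( scalar <$fh2> ) ? $record + 1 : 0;
}
```

Opening the two split files and calling `first_mismatch( $fh1, $fh2 )` gives a cheap sanity check before handing the pair to an aligner; a non-zero result points at the first record worth inspecting.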

From birney at ebi.ac.uk Thu Mar 1 05:54:56 2012
From: birney at ebi.ac.uk (Ewan Birney)
Date: Thu, 1 Mar 2012 10:54:56 +0000
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References: <2E3E60E8-370A-4495-8A59-1B62C59B2AF8@hudsonalpha.org> <4F4EA7A0.9050002@gmail.com> <3E9B001D-89E5-42B6-835D-96D8CE362AE3@illinois.edu>
Message-ID:

>>>
>>> Unaligned BAM makes the most sense. I've also been talking with the
>>> HDF5 folks here sporadically, they're still keen on promoting BioHDF
>>> (it is pretty fast), though that has cooled considerably.
>>>
>>> Anyone working directly with CRAM in their pipelines?
>>>
>>> chris
>>
>> I understand that Sanger are looking at moving their pipelines from BAM to
>> CRAM later this year, but CRAM is still quite new and in flux.
>>
>> Peter
>
> Yeah, I wasn't sure how the community outside of Sanger is approaching this.
>

A number of people are looking at this in different contexts. The forthcoming 0.7 release, in which arbitrary tags are stored (in compressed form), together with the optimised lossless compression already distributed in 0.6, should make a first adoption of CRAM smoother for people (it is basically a compressed BAM with no major loss of information - read names have to go, but that's it).

In our hands this gives a 2-3 fold compression over BAMs (depending on what you have in the BAMs), with a lot of this now depending on how many tags there are and how well those tags compress. The important thing though is that CRAMs with fewer or no tags compress a further 2 or 3 fold beyond this (a total of 5 to 10 fold on current BAMs) but are still lossless on bases plus quality.
In the future - with lossy behaviour on qualities - this can go as low as 0.2 bits/base (bits/base, meaning the number of bits needed to store a base including the quality model used, is our preferred way of thinking about this).

Check out the GR paper from last year:

http://ukpmc.ac.uk/articles/PMC3083090

And check out three blog posts on this:

http://genomeinformatician.blogspot.com/2011/05/compressing-dna-part-1.html
http://genomeinformatician.blogspot.com/2011/05/engineering-around-reference-based.html
http://genomeinformatician.blogspot.com/2011/05/compressing-dna-future-plan.html

And note the CRAM development list:

http://www.ebi.ac.uk/ena/about/cram_toolkit

> chris

From harpactocrates at googlemail.com Thu Mar 1 09:41:47 2012
From: harpactocrates at googlemail.com (Pablo marin-garcia)
Date: Thu, 1 Mar 2012 15:41:47 +0100
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References: <2E3E60E8-370A-4495-8A59-1B62C59B2AF8@hudsonalpha.org> <4F4EA7A0.9050002@gmail.com> <3E9B001D-89E5-42B6-835D-96D8CE362AE3@illinois.edu>
Message-ID:

On Wed, Feb 29, 2012 at 4:32 PM, Peter Cock wrote:
> On Wed, Feb 29, 2012 at 3:27 PM, Fields, Christopher J
> wrote:
>> On Feb 29, 2012, at 4:32 AM, Peter Cock wrote:
>>
>>> On Wed, Feb 29, 2012 at 2:42 AM, Fields, Christopher J
>>> wrote:
>>>> Frankly, there never seemed to be a real fixed standard in the way that FASTQ
>>>> headers were written (and just when it seems there is some consensus, Illumina
>>>> pulls the rug out from under you), hence the reason I leave it alone. We could
>>>> add some ID munging in there if needed, would just need a qr// with a standard
>>>> fallback.
>>>> >>>> chris >>> >>> Indeed - just like FASTA, it seems every company/tool/database has its own >>> conventions about the FASTQ ID line and how to stuff as much meta-data >>> into it as possible. This is a major reason why I hope unaligned reads in >>> SAM/BAM takes off - places like the Sanger and Broad use this in their >>> pipelines. >>> >>> http://blastedbio.blogspot.com/2011/10/fastq-must-die-long-live-sambam.html >>> >>> Peter >> >> Unaligned BAM makes the most sense. ?I've also been talking with the >> HDF5 folks here sporadically, they're still keen on promoting BioHDF >> (it is pretty fast), though that has cooled considerably. >> >> Anyone working directly with CRAM in their pipelines? >> >> chris > > I understand that Sanger are looking at moving their pipelines from BAM to > CRAM later this year, but CRAM is still quite new and in flux. > my concern is that being CRAM based in delta compression (comparison against reference), I am not sure how much compression it would achieve with unaligned bams. The other thing that CRAM does is to remove a lot of extra tags and metadata (even from the header reference info), and here the strong point of bam against FASTQ is the availability of structured metadata. CRAM is still in development in this area so we will see where they go. > Peter > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ?? 
- Pablo Marin-Garcia

From p.j.a.cock at googlemail.com Thu Mar 1 10:03:02 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 1 Mar 2012 15:03:02 +0000
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References: <2E3E60E8-370A-4495-8A59-1B62C59B2AF8@hudsonalpha.org> <4F4EA7A0.9050002@gmail.com> <3E9B001D-89E5-42B6-835D-96D8CE362AE3@illinois.edu>
Message-ID:

On Thu, Mar 1, 2012 at 2:41 PM, Pablo marin-garcia wrote:
> On Wed, Feb 29, 2012 at 4:32 PM, Peter Cock wrote:
>>
>> I understand that Sanger are looking at moving their pipelines from BAM to
>> CRAM later this year, but CRAM is still quite new and in flux.
>>
>
> my concern is that being CRAM based in delta compression (comparison
> against reference), I am not sure how much compression it would
> achieve with unaligned bams.

This can be done with an appropriate dummy reference, for instance from a mini-assembly of the unmapped reads.

> The other thing that CRAM does is to
> remove a lot of extra tags and metadata (even from the header
> reference info), and here the strong point of bam against FASTQ is the
> availability of structured metadata. CRAM is still in development in
> this area so we will see where they go.

Did you miss Ewan's reply about CRAM 0.7 which is due soon?
http://lists.open-bio.org/pipermail/bioperl-l/2012-March/036295.html

Might this be better continued on the cram-dev list
http://listserver.ebi.ac.uk/mailman/listinfo/cram-dev
or on this SEQanswers thread?
http://seqanswers.com/forums/showthread.php?t=18050

Peter

From avilella at gmail.com Thu Mar 1 10:45:19 2012
From: avilella at gmail.com (Albert Vilella)
Date: Thu, 1 Mar 2012 15:45:19 +0000
Subject: [Bioperl-l] mummer3 output format
Message-ID:

Hi,

I am trying to understand how to transform Mummer3's output format into something I can pipe into another program, like MAF or similar. How can I parse the results so that I can then do a write_aln into MAF or similar?

Details:

If I run nucmer v.3.23 with the options below, I get an out.delta like this:

~/MUMmer3.23/nucmer -maxgap $g -l $l $ref $qry

------------------
Leishmania_major.LM2.12.dna.toplevel.fa LtarParrotTarIIGenomic_TriTrypDB-4.0.fasta
NUCMER
>LmjF.34 ULAVAL|LtaPseq521 1866748 641
959335 959806 169 640 91 91 0
20
17
-3
-2
-183
5
0
>LmjF.12 ULAVAL|LtaPseq501 675346 1438
322990 324081 1436 342 178 178 0
-45
-1
-1
-1

This doesn't look like any of the formats in t/AlignIO/mummer.t to me.

I can also run:

~/MUMmer3.23/show-aligns out.delta $region1 $region2

Which gives me something that looks like a blast or exonerate output, like so:

------
Leishmania_major.LM2.12.dna.toplevel.fa LtarParrotTarIIGenomic_TriTrypDB-4.0.fasta

============================================================
-- Alignments between LmjF.34 and ULAVAL|LtaPseq521

-- BEGIN alignment [ +1 959335 - 959806 | +1 169 - 640 ]


959335     cacacgcctcgtagaggtctccttgctttcgcgcggtgc.c.tcacttg
169        cacacgcctcgtagagatc.ccctgccttcgcgcgg.gctcttcacttg
                           ^  ^  ^   ^         ^  ^ ^

959382     cgcatgcggtagtagaagagaatgctgtgggcccacccagcgtagttgc
216        cgcatgcggtagtagaagagaatgctgtgtgcccacccagcgtagttgc
                                        ^

959431     caaacagcttccggaaggcctcctgaatgacgttatgatgccgctcgta
265        caaacagtttccagaaggcatcctggataacattatgatgccgttcgta
                  ^    ^      ^     ^  ^  ^           ^

959480     caagggtgggacaggcgtttttcgtgaggcgcgcagcggggctgctgca
314        caggggcggcacaggtgttttccgtaaggcacgtgaagaggtcgttgca
             ^   ^  ^     ^     ^   ^    ^  ^^^^ ^  ^^ ^

959529     gagcttccaccttcctctatcgccttta.cggtcgctggcgacacgcct
363        gagcctccgtttcccttcaccgcccgcagcgat.gatgatgtcactcct
               ^   ^^^ ^   ^^ ^    ^^^ ^  ^ ^ ^  ^^ ^   ^

959577     ttcttaaccttgagaacctccgcctgcttcctccactccagcagcagat
411        ttcttcaccttgagagcctccgcctggttcttccactccaggagaagat
                ^         ^          ^   ^          ^  ^

959626     tatcccgtgagcgggcttcctcttcgggcaacggacaccctggacgaga
460        cagtgggtgcgcagacttcttcttcgcgcagtagagaccctgagcgaga
           ^ ^^^^   ^  ^ ^    ^      ^   ^^^  ^      ^^

959675     gcgcttacgacccaccgccgtcgcggcgcttggtgcggcaaggtactcc
509        acgctttcgacccgccgatgtcacggtgcttgcggtggcaagatactcc
           ^     ^      ^   ^^   ^   ^     ^^ ^      ^

959724     accgcaacttgcgccatgtgcgtgtccacggggacaatgtgggtgcggt
558        accgaaacctgcgccatgtgtgtgtccacggggacgatgtgggtgcggt
               ^   ^           ^              ^

959773     tgagcgcgaagagcgccacgcagtcagcaacttt
607        tgagagcaaagagcgccacgcaatccgccacttt
               ^  ^              ^  ^  ^


--   END alignment [ +1 959335 - 959806 | +1 169 - 640 ]

============================================================

From roy.chaudhuri at gmail.com Thu Mar 1 10:56:36 2012
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Thu, 01 Mar 2012 15:56:36 +0000
Subject: [Bioperl-l] mummer3 output format
In-Reply-To:
References:
Message-ID: <4F4F9C34.2030103@gmail.com>

Hi Albert,

The show-coords program converts the delta file into a coords file which is much easier to parse. It is run automatically if you provide the --coords flag to nucmer/promer.

There was talk of a BioPerl MUMmer parser a while back but I'm not sure if it got anywhere.

You might also look at Mugsy, which uses MUMmer and outputs MAF, so may contain some code that can be recycled - it is written in Perl I think.

Cheers,
Roy.

On 01/03/2012 15:45, Albert Vilella wrote:
> Hi,
>
> I am trying to understand how to transform Mummer3's output format
> into something I can pipe into another program, like MAF or similar.
> How can I parse the results so that I can then do a write_aln into MAF
> o similar?
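[Editorial sketch] For readers who want the alignment coordinates straight from the delta file shown in this thread rather than via show-coords, the layout can be read with a few lines of Perl. This is a sketch, not an existing BioPerl parser: the sub name `parse_delta` and the hash keys are invented here, and it relies only on the structure visible in the sample (a `>ref qry reflen qrylen` header, a seven-integer alignment line, then signed indel offsets ending in 0). Strand is inferred from query start versus end alone, which is an assumption that fits the nucmer sample above.

```perl
use strict;
use warnings;

# Read nucmer delta text from a filehandle and return one hash per alignment.
sub parse_delta {
    my ($fh) = @_;
    my ( @alignments, $ref_id, $qry_id, $current );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>(\S+)\s+(\S+)\s+\d+\s+\d+$/ ) {
            # Sequence-pair header: ">ref_id qry_id ref_len qry_len"
            ( $ref_id, $qry_id ) = ( $1, $2 );
            undef $current;
        }
        elsif ( defined $ref_id
            && $line =~ /^(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+\d+\s+\d+$/ )
        {
            # Alignment header: ref start/end, query start/end, error counts.
            $current = {
                ref_id    => $ref_id,
                qry_id    => $qry_id,
                ref_start => $1,
                ref_end   => $2,
                qry_start => $3,
                qry_end   => $4,
                errors    => $5,
                strand    => ( $3 <= $4 ? '+' : '-' ),
                indels    => [],
            };
            push @alignments, $current;
        }
        elsif ( $current && $line =~ /^-?\d+$/ ) {
            # Signed indel offsets; a lone 0 closes the alignment.
            if   ( $line == 0 ) { undef $current }
            else                { push @{ $current->{indels} }, $line }
        }
    }
    return @alignments;
}
```

Each returned hash can then be printed as a tab-delimited line or used to pull subsequences for an alignment object. The two leading lines of the file (input paths and the NUCMER tag) are skipped simply because they match none of the patterns.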
> > Details: > > If I run nucmer v.3.23 with the options below, I get an out.delta like this: > > ~/MUMmer3.23/nucmer -maxgap $g -l $l $ref $qry > > ------------------ > Leishmania_major.LM2.12.dna.toplevel.fa > LtarParrotTarIIGenomic_TriTrypDB-4.0.fasta > NUCMER >> LmjF.34 ULAVAL|LtaPseq521 1866748 641 > 959335 959806 169 640 91 91 0 > 20 > 17 > -3 > -2 > -183 > 5 > 0 >> LmjF.12 ULAVAL|LtaPseq501 675346 1438 > 322990 324081 1436 342 178 178 0 > -45 > -1 > -1 > -1 > > This doesn't look like any of the formats in t/AlignIO/mummer.t to me. > > I can also run: > > ~/MUMmer3.23/show-aligns out.delta $region1 $region2 > > Which gives me something that looks like a blast or exonerate output, like so: > > ------ > Leishmania_major.LM2.12.dna.toplevel.fa > LtarParrotTarIIGenomic_TriTrypDB-4.0.fasta > > ============================================================ > -- Alignments between LmjF.34 and ULAVAL|LtaPseq521 > > -- BEGIN alignment [ +1 959335 - 959806 | +1 169 - 640 ] > > > 959335 cacacgcctcgtagaggtctccttgctttcgcgcggtgc.c.tcacttg > 169 cacacgcctcgtagagatc.ccctgccttcgcgcgg.gctcttcacttg > ^ ^ ^ ^ ^ ^ ^ > > 959382 cgcatgcggtagtagaagagaatgctgtgggcccacccagcgtagttgc > 216 cgcatgcggtagtagaagagaatgctgtgtgcccacccagcgtagttgc > ^ > > 959431 caaacagcttccggaaggcctcctgaatgacgttatgatgccgctcgta > 265 caaacagtttccagaaggcatcctggataacattatgatgccgttcgta > ^ ^ ^ ^ ^ ^ ^ > > 959480 caagggtgggacaggcgtttttcgtgaggcgcgcagcggggctgctgca > 314 caggggcggcacaggtgttttccgtaaggcacgtgaagaggtcgttgca > ^ ^ ^ ^ ^ ^ ^ ^^^^ ^ ^^ ^ > > 959529 gagcttccaccttcctctatcgccttta.cggtcgctggcgacacgcct > 363 gagcctccgtttcccttcaccgcccgcagcgat.gatgatgtcactcct > ^ ^^^ ^ ^^ ^ ^^^ ^ ^ ^ ^ ^^ ^ ^ > > 959577 ttcttaaccttgagaacctccgcctgcttcctccactccagcagcagat > 411 ttcttcaccttgagagcctccgcctggttcttccactccaggagaagat > ^ ^ ^ ^ ^ ^ > > 959626 tatcccgtgagcgggcttcctcttcgggcaacggacaccctggacgaga > 460 cagtgggtgcgcagacttcttcttcgcgcagtagagaccctgagcgaga > ^ ^^^^ ^ ^ ^ ^ ^ ^^^ ^ ^^ > > 959675 
gcgcttacgacccaccgccgtcgcggcgcttggtgcggcaaggtactcc > 509 acgctttcgacccgccgatgtcacggtgcttgcggtggcaagatactcc > ^ ^ ^ ^^ ^ ^ ^^ ^ ^ > > 959724 accgcaacttgcgccatgtgcgtgtccacggggacaatgtgggtgcggt > 558 accgaaacctgcgccatgtgtgtgtccacggggacgatgtgggtgcggt > ^ ^ ^ ^ > > 959773 tgagcgcgaagagcgccacgcagtcagcaacttt > 607 tgagagcaaagagcgccacgcaatccgccacttt > ^ ^ ^ ^ ^ > > > -- END alignment [ +1 959335 - 959806 | +1 169 - 640 ] > > ============================================================ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy.chaudhuri at gmail.com Thu Mar 1 11:13:02 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Thu, 01 Mar 2012 16:13:02 +0000 Subject: [Bioperl-l] mummer3 output format In-Reply-To: <4F4F9C34.2030103@gmail.com> References: <4F4F9C34.2030103@gmail.com> Message-ID: <4F4FA00E.5040300@gmail.com> Sorry, I'd completely missed Bio::AlignIO::mummer. However this seems to be aimed at parsing the output of the mummer program (as opposed to nucmer/promer) so I guess the advice about show-coords still stands. On 01/03/2012 15:56, Roy Chaudhuri wrote: > Hi Albert, > > The show-coords program converts the delta file into a coords file which > is much easier to parse. It is run automatically if you provide the > --coords flag to nucmer/promer. > > There was talk of a BioPerl MUMmer parser a while back but I'm not sure > if it got anywhere. > > You might also look at Mugsy, which uses MUMmer and outputs MAF, so may > contain some code that can be recycled - it is written in Perl I think. > > Cheers, > Roy. > > On 01/03/2012 15:45, Albert Vilella wrote: >> Hi, >> >> I am trying to understand how to transform Mummer3's output format >> into something I can pipe into another program, like MAF or similar. >> How can I parse the results so that I can then do a write_aln into MAF >> o similar? 
>> >> Details: >> >> If I run nucmer v.3.23 with the options below, I get an out.delta like this: >> >> ~/MUMmer3.23/nucmer -maxgap $g -l $l $ref $qry >> >> ------------------ >> Leishmania_major.LM2.12.dna.toplevel.fa >> LtarParrotTarIIGenomic_TriTrypDB-4.0.fasta >> NUCMER >>> LmjF.34 ULAVAL|LtaPseq521 1866748 641 >> 959335 959806 169 640 91 91 0 >> 20 >> 17 >> -3 >> -2 >> -183 >> 5 >> 0 >>> LmjF.12 ULAVAL|LtaPseq501 675346 1438 >> 322990 324081 1436 342 178 178 0 >> -45 >> -1 >> -1 >> -1 >> >> This doesn't look like any of the formats in t/AlignIO/mummer.t to me. >> >> I can also run: >> >> ~/MUMmer3.23/show-aligns out.delta $region1 $region2 >> >> Which gives me something that looks like a blast or exonerate output, like so: >> >> ------ >> Leishmania_major.LM2.12.dna.toplevel.fa >> LtarParrotTarIIGenomic_TriTrypDB-4.0.fasta >> >> ============================================================ >> -- Alignments between LmjF.34 and ULAVAL|LtaPseq521 >> >> -- BEGIN alignment [ +1 959335 - 959806 | +1 169 - 640 ] >> >> >> 959335 cacacgcctcgtagaggtctccttgctttcgcgcggtgc.c.tcacttg >> 169 cacacgcctcgtagagatc.ccctgccttcgcgcgg.gctcttcacttg >> ^ ^ ^ ^ ^ ^ ^ >> >> 959382 cgcatgcggtagtagaagagaatgctgtgggcccacccagcgtagttgc >> 216 cgcatgcggtagtagaagagaatgctgtgtgcccacccagcgtagttgc >> ^ >> >> 959431 caaacagcttccggaaggcctcctgaatgacgttatgatgccgctcgta >> 265 caaacagtttccagaaggcatcctggataacattatgatgccgttcgta >> ^ ^ ^ ^ ^ ^ ^ >> >> 959480 caagggtgggacaggcgtttttcgtgaggcgcgcagcggggctgctgca >> 314 caggggcggcacaggtgttttccgtaaggcacgtgaagaggtcgttgca >> ^ ^ ^ ^ ^ ^ ^ ^^^^ ^ ^^ ^ >> >> 959529 gagcttccaccttcctctatcgccttta.cggtcgctggcgacacgcct >> 363 gagcctccgtttcccttcaccgcccgcagcgat.gatgatgtcactcct >> ^ ^^^ ^ ^^ ^ ^^^ ^ ^ ^ ^ ^^ ^ ^ >> >> 959577 ttcttaaccttgagaacctccgcctgcttcctccactccagcagcagat >> 411 ttcttcaccttgagagcctccgcctggttcttccactccaggagaagat >> ^ ^ ^ ^ ^ ^ >> >> 959626 tatcccgtgagcgggcttcctcttcgggcaacggacaccctggacgaga >> 460 cagtgggtgcgcagacttcttcttcgcgcagtagagaccctgagcgaga >> ^ ^^^^ ^ ^ 
^ ^ ^ ^^^ ^ ^^ >> >> 959675 gcgcttacgacccaccgccgtcgcggcgcttggtgcggcaaggtactcc >> 509 acgctttcgacccgccgatgtcacggtgcttgcggtggcaagatactcc >> ^ ^ ^ ^^ ^ ^ ^^ ^ ^ >> >> 959724 accgcaacttgcgccatgtgcgtgtccacggggacaatgtgggtgcggt >> 558 accgaaacctgcgccatgtgtgtgtccacggggacgatgtgggtgcggt >> ^ ^ ^ ^ >> >> 959773 tgagcgcgaagagcgccacgcagtcagcaacttt >> 607 tgagagcaaagagcgccacgcaatccgccacttt >> ^ ^ ^ ^ ^ >> >> >> -- END alignment [ +1 959335 - 959806 | +1 169 - 640 ] >> >> ============================================================ >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From harpactocrates at googlemail.com Thu Mar 1 12:28:28 2012 From: harpactocrates at googlemail.com (Pablo marin-garcia) Date: Thu, 1 Mar 2012 18:28:28 +0100 Subject: [Bioperl-l] fastq splitter In-Reply-To: References: <2E3E60E8-370A-4495-8A59-1B62C59B2AF8@hudsonalpha.org> <4F4EA7A0.9050002@gmail.com> <3E9B001D-89E5-42B6-835D-96D8CE362AE3@illinois.edu> Message-ID: On Thu, Mar 1, 2012 at 4:03 PM, Peter Cock wrote: > On Thu, Mar 1, 2012 at 2:41 PM, Pablo marin-garcia > wrote: >> On Wed, Feb 29, 2012 at 4:32 PM, Peter Cock wrote: >>> >>> I understand that Sanger are looking at moving their pipelines from BAM to >>> CRAM later this year, but CRAM is still quite new and in flux. >>> >> >> my concern is that being CRAM based in delta compression (comparison >> against reference), I ?am not sure how much compression it would >> achieve with unaligned bams. > > This can be done with an appropriate dummy reference, for instance > from a mini-assembly of the unmapped reads. > >> The other thing that CRAM does is to >> remove a lot of extra tags and metadata (even from the header >> reference info), and here the strong point of bam against FASTQ is the >> availability of structured metadata. CRAM is still in development in >> this area so we will see where they go. 
>
> Did you miss Ewan's reply about CRAM 0.7 which is due soon?
> http://lists.open-bio.org/pipermail/bioperl-l/2012-March/036295.html
>

Yes, I missed it.

> Might this be better continued on the cram-dev list
> http://listserver.ebi.ac.uk/mailman/listinfo/cram-dev
> or on this SEQanswers thread?
> http://seqanswers.com/forums/showthread.php?t=18050
>
> Peter

--
- Pablo Marin-Garcia

From cjfields at illinois.edu Thu Mar 1 12:38:08 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 1 Mar 2012 17:38:08 +0000
Subject: [Bioperl-l] fastq splitter
In-Reply-To:
References: <2E3E60E8-370A-4495-8A59-1B62C59B2AF8@hudsonalpha.org> <4F4EA7A0.9050002@gmail.com> <3E9B001D-89E5-42B6-835D-96D8CE362AE3@illinois.edu>
Message-ID: <85851106-67F3-4F1D-A103-6C6EFBF48BBC@illinois.edu>

On Mar 1, 2012, at 4:54 AM, Ewan Birney wrote:
>>>>
>>>> Unaligned BAM makes the most sense. I've also been talking with the
>>>> HDF5 folks here sporadically, they're still keen on promoting BioHDF
>>>> (it is pretty fast), though that has cooled considerably.
>>>>
>>>> Anyone working directly with CRAM in their pipelines?
>>>>
>>>> chris
>>>
>>> I understand that Sanger are looking at moving their pipelines from BAM to
>>> CRAM later this year, but CRAM is still quite new and in flux.
>>>
>>> Peter
>>
>> Yeah, I wasn't sure how the community outside of Sanger is approaching this.
>>
>
> A number of people are looking at this in different contexts. With the forthcoming
> 0.7 release, where arbitary tags are stored (in compressed form), and the already
> distributed optimised lossless compression in 0.6 it makes the first adoption of
> CRAM (being bascially a compressed BAM with no major loss of information - read
> names have to go, but that's it) smoother for people to adopt.
>
>
> In our hands this gives a 2-3 fold compression over BAMs (depending on what you
> have in the BAMs) with alot of this now being how many tags and how well those
> tags compress.
The important thing though is that we have CRAMs without/less > tags being 2 or 3 fold more than this (totally of 5 to 10 fold on current BAMs) > but still lossless on bases plus quality. In the future - with lossy behaviour > on qualities - this can go as low as 0.2 bits/base (bits/base - meaning the number > of bits needed for a storage of a base including the quality model used is our > preferred way of thinking about this). > > > Check out the GR paper from last year: > > http://ukpmc.ac.uk/articles/PMC3083090 > > > > And check out three blog posts on this: > > http://genomeinformatician.blogspot.com/2011/05/compressing-dna-part-1.html > > http://genomeinformatician.blogspot.com/2011/05/engineering-around-reference-based.html > > http://genomeinformatician.blogspot.com/2011/05/compressing-dna-future-plan.html > > > > And note the CRAM development list: > > http://www.ebi.ac.uk/ena/about/cram_toolkit Yep, already on it. :) Thanks for the blog pointer Ewan, didn't see those before. We've been discussing options for storing data locally and may be centering on CRAM, though locally we have the HDF5 group as well (as I mentioned before), who have been promoting BioHDF for a bit now. Not sure of the status on that as of yet. chris From p.j.a.cock at googlemail.com Fri Mar 2 08:54:09 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 2 Mar 2012 13:54:09 +0000 Subject: [Bioperl-l] Update: call for Google Summer of Code project ideas In-Reply-To: <4F4BAE5B.40309@gmail.com> References: <4F4BAE5B.40309@gmail.com> Message-ID: On Mon, Feb 27, 2012 at 4:24 PM, Robert Buels wrote: > Hi all, > > As kindly pointed out by Reece Hart, the previous email I sent out calling > for Google Summer of Code project ideas, had the wrong due date for project > ideas in it. > > I actually want them to all be in place by Friday, March 2, which is this > coming Friday. 
>
> == Instructions for Wiki Editing ==
>
> For each of the OBF projects that wants to do GSoC again this year, please:
>
> a.) Update the list of project ideas on your project's GSoC page (BioPython,
> BioPerl, BioRuby, etc). Add new ones, remove ones that have already been
> done or no longer relevant, etc.
>
> b.) Update the list of project ideas on the main OBF GSoC page
> (http://www.open-bio.org/wiki/Google_Summer_of_Code) to match.
>
> c.) Let me know via email that you have done so and it's ready for Google to
> peruse.
>
> == end instructions ==
>
> Again, please have the updates done by this Friday (March 2). The number and
> quality of the project ideas are part of the evaluation process for whether
> OBF is accepted as a Summer of Code organization again this year, so let's
> come up with some good ones. :-)
>
> Rob

Hi all,

I had a quick look at the BioPerl page (partly for inspiration for Biopython projects), and it doesn't seem to have been updated since the 2011 project ideas:

http://bioperl.org/wiki/Google_Summer_of_Code

Some of those 2011 ideas became actual projects, didn't they? If so, can any of them be continued as a new project for 2012?

Over on the BioSQL mailing list I suggested a combined BioSQL+BioPerl project idea to support the SQLite schema. Does anyone fancy mentoring that?

Peter

From cjfields at illinois.edu Fri Mar 2 11:09:23 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 2 Mar 2012 16:09:23 +0000
Subject: [Bioperl-l] Update: call for Google Summer of Code project ideas
In-Reply-To:
References: <4F4BAE5B.40309@gmail.com>
Message-ID: <3F440BF7-4542-44E7-947D-A77180EC54F9@illinois.edu>

On Mar 2, 2012, at 7:54 AM, Peter Cock wrote:
> On Mon, Feb 27, 2012 at 4:24 PM, Robert Buels wrote:
>> Hi all,
>>
>> As kindly pointed out by Reece Hart, the previous email I sent out calling
>> for Google Summer of Code project ideas, had the wrong due date for project
>> ideas in it.
>> >> I actually want them to all be in place by Friday, March 2, which is this >> coming Friday. >> >> == Instructions for Wiki Editing == >> >> For each of the OBF projects that wants to do GSoC again this year, please: >> >> a.) Update the list of project ideas on your project's GSoC page (BioPython, >> BioPerl, BioRuby, etc). Add new ones, remove ones that have already been >> done or no longer relevant, etc. >> >> b.) Update the list of project ideas on the main OBF GSoC page >> (http://www.open-bio.org/wiki/Google_Summer_of_Code) to match. >> >> c.) Let me know via email that you have done so and it's ready for Google to >> peruse. >> >> == end instructions == >> >> Again, please have the updates done by this Friday (March 2). The number and >> quality of the project ideas are part of the evaluation process for whether >> OBF is accepted as a Summer of Code organization again this year, so let's >> come up with some good ones. :-) >> >> Rob > > Hi all, > > I had a quick look at the BioPerl page (partly for inspiration for Biopython > projects), and it doesn't seem to have been updated since the 2011 project > ideas: > > http://bioperl.org/wiki/Google_Summer_of_Code > > Some of those 2011 ideas became actual projects didn't they? If so, can > any of them be continued as a new project for 2012? > > Over on the BioSQL mailing list I suggested a combined BioSQL+BioPerl > project idea to support the SQLite schema. Does anyone fancy mentoring > that? > > Peter Things have been rather quiet lately, I believe b/c everyone is busy with other $obligations, I myself have been ridiculously busy lately but try to pipe up when I can. A few of the projects on that page are still legitimate, though, and I think support of SQLite is a good project. I would really like to see a somewhat Bio*-agnostic ORM for BioSQL, though. I know there was an effort for a DBIx::Class-based one a few years back, not sure how far that progressed. 
chris From wrp at virginia.edu Fri Mar 2 13:01:09 2012 From: wrp at virginia.edu (William Pearson) Date: Fri, 2 Mar 2012 13:01:09 -0500 Subject: [Bioperl-l] Parser for PSIBLAST -out_pssm ASN.1 PSSM Message-ID: <882B7402-552F-4898-812D-B17FF42F2258@virginia.edu> Can BioPerl (or Perl) parse the PSSM produced by BLAST+ psiblast with the -out_pssm option? FASTA can parse the binary ASN.1, but not the ASCII text version. Bill Pearson From hlapp at drycafe.net Fri Mar 2 14:14:53 2012 From: hlapp at drycafe.net (Hilmar Lapp) Date: Fri, 2 Mar 2012 14:14:53 -0500 Subject: [Bioperl-l] Update: call for Google Summer of Code project ideas In-Reply-To: <3F440BF7-4542-44E7-947D-A77180EC54F9@illinois.edu> References: <4F4BAE5B.40309@gmail.com> <3F440BF7-4542-44E7-947D-A77180EC54F9@illinois.edu> Message-ID: On Mar 2, 2012, at 11:09 AM, Fields, Christopher J wrote: > I would really like to see a somewhat Bio*-agnostic ORM for BioSQL, though. I know there was an effort for a DBIx::Class-based one a few years back, not sure how far that progressed. Yes, I'd love that too. But I won't have time to formulate anything before some time next week. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From carandraug+dev at gmail.com Sat Mar 3 19:33:30 2012 From: carandraug+dev at gmail.com (Carnë Draug) Date: Sun, 4 Mar 2012 00:33:30 +0000 Subject: [Bioperl-l] Specification of features tables (on translation mixed with replace qualifiers) Message-ID: Hi, I have been reading the specifications for feature tables by DDBJ/EMBL/GenBank so I can get all my plasmids saved properly, but I don't know how to deal with a specific situation. I have many plasmids that I mutated by site-directed mutagenesis.
It seems that in these cases, I should keep the sequence of the entry intact (compared to the original), and simply use the misc_difference feature with a replace tag/qualifier (from the specifications document: "the misc_difference feature key should be used to describe variability that arises as a result of genetic manipulation (e.g. site directed mutagenesis)"). What I wonder is whether the translation tag in the CDS feature should be the translation of the mutated sequence (taking the replace tag into account), or the translation of the actual entry sequence? The specification does not say; it defines the qualifier as: "automatically generated one-letter abbreviated amino acid sequence derived from either the universal genetic code or the table as specified in /transl_table and as determined by exceptions in the /transl_except and /codon qualifiers". By the way, I was testing this with bioperl and Bio::SeqIO and here are some things I found, some of which may be bugs: * when using the translate method, it translates the sequence of the entry, ignoring replace tags * when saving the entry to a file, it does not check if the /translation value of the CDS feature is correct Also, is there a module that I can use to validate a sequence file? Thanks in advance, Carnë From cjfields at illinois.edu Sat Mar 3 21:48:34 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Sun, 4 Mar 2012 02:48:34 +0000 Subject: [Bioperl-l] Parser for PSIBLAST -out_pssm ASN.1 PSSM In-Reply-To: <882B7402-552F-4898-812D-B17FF42F2258@virginia.edu> References: <882B7402-552F-4898-812D-B17FF42F2258@virginia.edu> Message-ID: <03A89CF4-7A03-4F26-9FC2-3960C994DAB1@illinois.edu> We do have a Bio::Matrix::PSM::IO::psiblast module in bioperl-live (and in past CPAN releases, of course) that seems to fit. Of course that is from blastpgp, not BLAST+, so this assumes the format hasn't changed.
NAME Bio::Matrix::PSM::IO - PSM parser SYNOPSIS See Bio::Matrix::PSM::IO for documentation DESCRIPTION Parser for ASCII matrices from PSI-BLAST (blastpgp program in BLAST distribution). chris On Mar 2, 2012, at 12:01 PM, William Pearson wrote: > Can BioPerl (or Perl) parse the PSSM produced by BLAST+ psiblast with the -out_pssm command? > > FASTA parse the binary ASN.1, but not the ascii text version. > > Bill Pearson > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From florent.angly at gmail.com Mon Mar 5 02:39:08 2012 From: florent.angly at gmail.com (Florent Angly) Date: Mon, 05 Mar 2012 17:39:08 +1000 Subject: [Bioperl-l] Compliance of Bio::Seq add_SeqFeature() method Message-ID: <4F546D9C.7080808@gmail.com> Hi all, I have just been burned by a problem with the Bio::Seq add_SeqFeature() method. Bio::Seq is a class which implements Bio::SeqI, which itself implements Bio::FeatureHolderI, which defines an add_SeqFeature() method as: > Usage : $feat->add_SeqFeature($subfeat); > $feat->add_SeqFeature($subfeat,'EXPAND') > Function: adds a SeqFeature into the subSeqFeature array. > with no 'EXPAND' qualifier, subfeat will be tested > as to whether it lies inside the parent, and throw > an exception if not. > > If EXPAND is used, the parent's start/end/strand will > be adjusted so that it grows to accommodate the new > subFeature > Example : > Returns : nothing > Args : a Bio::SeqFeatureI object In comparison, the add_SeqFeature method implemented by Bio::Seq is: > Title : add_SeqFeature > Usage : $seq->add_SeqFeature($feat); > $seq->add_SeqFeature(@feat); > Function: Adds the given feature object (or each of an array of feature > objects to the feature array of this > sequence. The object passed is required to implement the > Bio::SeqFeatureI interface.
> Returns : 1 on success > Args : A Bio::SeqFeatureI implementing object, or an array of such > objects. As you can see, there is a discrepancy. While the Bio::Seq method takes an array of features, Bio::FeatureHolderI states that it should take a single feature and the optional 'EXPAND' scalar. It would not be very hard to modify Bio::Seq so that it complies with Bio::FeatureHolderI. One would have to make sure that the Bio::Seq method takes the 'EXPAND' option and add a deprecation message for any call with more than one feature. First, am I missing something here, or does Bio::Seq need to comply with Bio::FeatureHolderI? Then, if Bio::Seq needs to be changed, I wanted to have some feedback from other wise Bioperl-ers to see if the course of action I described is appropriate. Best, Florent From hlapp at drycafe.net Sun Mar 4 14:27:34 2012 From: hlapp at drycafe.net (Hilmar Lapp) Date: Sun, 4 Mar 2012 14:27:34 -0500 Subject: [Bioperl-l] Compliance of Bio::Seq add_SeqFeature() method In-Reply-To: <4F546D9C.7080808@gmail.com> References: <4F546D9C.7080808@gmail.com> Message-ID: <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> It does need to comply - the interface is the contract. That being said, any implementation can go *beyond* the contract in what it supports - it just can't fall short of it. So an implementation can always implement more than the contract requires, but never less or it is out of compliance. It seems from the documentation you quote that Bio::Seq does support a single feature to be added, so that part is fine. However, there is no mention of the EXPAND option, so if that's indeed not supported (can't look at the code right now) then it is not compliant, and that should be fixed. -hilmar Sent with a tap. On Mar 5, 2012, at 2:39 AM, Florent Angly wrote: > Hi all, > > I have just been burned by a problem with the Bio::Seq add_SeqFeature() method.
Bio::Seq is a class which implements Bio::SeqI, which itselt implements Bio::FeatureHolderI, which defines an add_SeqFeature() method as: >> Usage : $feat->add_SeqFeature($subfeat); >> $feat->add_SeqFeature($subfeat,'EXPAND') >> Function: adds a SeqFeature into the subSeqFeature array. >> with no 'EXPAND' qualifer, subfeat will be tested >> as to whether it lies inside the parent, and throw >> an exception if not. >> >> If EXPAND is used, the parent''s start/end/strand will >> be adjusted so that it grows to accommodate the new >> subFeature >> Example : >> Returns : nothing >> Args : a Bio::SeqFeatureI object > > In comparison, the add_SeqFeature method implemented by Bio::Seq is: >> Title : add_SeqFeature >> Usage : $seq->add_SeqFeature($feat); >> $seq->add_SeqFeature(@feat); >> Function: Adds the given feature object (or each of an array of feature >> objects to the feature array of this >> sequence. The object passed is required to implement the >> Bio::SeqFeatureI interface. >> Returns : 1 on success >> Args : A Bio::SeqFeatureI implementing object, or an array of such objects. > > As you can see, there is a discrepancy. While the Bio::Seq method takes an array of features, Bio::FeatureHolderI states that it should take a single feature and the optional 'EXPAND' scalar. > > It would not be very hard to modify Bio::Seq so that it complies with Bio::FeatureHolderI. One would have to make sure that the Bio::Seq feature takes the 'EXPAND' option and to have a deprecation message for any call with more than one feature. > > First, I am missing something here or does Bio::Seq need to comply with Bio::FeatureHolderI? > Then, if Bio::Seq needs to be changed, I wanted to have some feedback from other wise Bioperl-ers to see if the course of action I described is adapted. 
> > Best, > > Florent > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun Mar 4 15:30:40 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Sun, 4 Mar 2012 20:30:40 +0000 Subject: [Bioperl-l] Compliance of Bio::Seq add_SeqFeature() method In-Reply-To: <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> References: <4F546D9C.7080808@gmail.com>, <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> Message-ID: <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> What do other FeatureHolderI do? What does the current set of tests expect? That would give a consensus on expected behavior, then the decision could be made. For instance, with SeqFeature and Seq the behavior is for a single feature as the arg with an optional expansion arg in some cases. Based on that, I might argue that the interface itself wasn't updated when the implementations were set up. The name of the method seems to imply a single feature is the arg (otherwise why not call it add_SeqFeatures). Regardless, it could be fixed so that all implementations accept a list of features, which is what I think Florent means. Dealing with the additional optional 'EXPAND' is kind of a pain, since it means checking the type of the last arg and DTRT, but it is possible. Chris On Mar 4, 2012, at 1:28 PM, "Hilmar Lapp" wrote: > It does need to comply - the interface is the contract. That being said, any implementation can go *beyond* the contract in what it supports - it just can't fall short of it. So an implementation can always implement more than the contract requires, but never less or it is out of compliance. > > It seems from the documentation you quote that Bio::Seq does support a single feature to be added, so that part is fine.
However, there is no mention of the EXPAND option, so if that's indeed not supported (can't look at the code right now) then it is not compliant, and that should be fixed. > > -hilmar > > Sent with a tap. > > On Mar 5, 2012, at 2:39 AM, Florent Angly wrote: > >> Hi all, >> >> I have just been burned by a problem with the Bio::Seq add_SeqFeature() method. Bio::Seq is a class which implements Bio::SeqI, which itselt implements Bio::FeatureHolderI, which defines an add_SeqFeature() method as: >>> Usage : $feat->add_SeqFeature($subfeat); >>> $feat->add_SeqFeature($subfeat,'EXPAND') >>> Function: adds a SeqFeature into the subSeqFeature array. >>> with no 'EXPAND' qualifer, subfeat will be tested >>> as to whether it lies inside the parent, and throw >>> an exception if not. >>> >>> If EXPAND is used, the parent''s start/end/strand will >>> be adjusted so that it grows to accommodate the new >>> subFeature >>> Example : >>> Returns : nothing >>> Args : a Bio::SeqFeatureI object >> >> In comparison, the add_SeqFeature method implemented by Bio::Seq is: >>> Title : add_SeqFeature >>> Usage : $seq->add_SeqFeature($feat); >>> $seq->add_SeqFeature(@feat); >>> Function: Adds the given feature object (or each of an array of feature >>> objects to the feature array of this >>> sequence. The object passed is required to implement the >>> Bio::SeqFeatureI interface. >>> Returns : 1 on success >>> Args : A Bio::SeqFeatureI implementing object, or an array of such objects. >> >> As you can see, there is a discrepancy. While the Bio::Seq method takes an array of features, Bio::FeatureHolderI states that it should take a single feature and the optional 'EXPAND' scalar. >> >> It would not be very hard to modify Bio::Seq so that it complies with Bio::FeatureHolderI. One would have to make sure that the Bio::Seq feature takes the 'EXPAND' option and to have a deprecation message for any call with more than one feature. 
>> >> First, I am missing something here or does Bio::Seq need to comply with Bio::FeatureHolderI? >> Then, if Bio::Seq needs to be changed, I wanted to have some feedback from other wise Bioperl-ers to see if the course of action I described is adapted. >> >> Best, >> >> Florent >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Sun Mar 4 16:44:41 2012 From: hlapp at drycafe.net (Hilmar Lapp) Date: Sun, 4 Mar 2012 16:44:41 -0500 Subject: [Bioperl-l] Compliance of Bio::Seq add_SeqFeature() method In-Reply-To: <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> References: <4F546D9C.7080808@gmail.com>, <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> Message-ID: On Mar 4, 2012, at 3:30 PM, Fields, Christopher J wrote: > Dealing with the additional optional 'EXPAND' Is kind of a pain I know, but if we don't want it required we should take it out of the interface. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From florent.angly at gmail.com Mon Mar 5 18:04:54 2012 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 06 Mar 2012 09:04:54 +1000 Subject: [Bioperl-l] Compliance of Bio::Seq add_SeqFeature() method In-Reply-To: <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> References: <4F546D9C.7080808@gmail.com>, <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> Message-ID: <4F554696.3090905@gmail.com> On 05/03/12 06:30, Fields, Christopher J wrote: > What do other FeatureHolderI do? 
Bio::SeqFeature::Generic does it as defined in Bio::FeatureHolderI, i.e.:
> Usage : $feat->add_SeqFeature($subfeat);
>         $feat->add_SeqFeature($subfeat,'EXPAND');

Note that it does not simply accept 'EXPAND'; it actually acts on it.

Florent

From florent.angly at gmail.com Mon Mar 5 18:18:17 2012
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 06 Mar 2012 09:18:17 +1000
Subject: [Bioperl-l] Coordinates of sub SeqFeatures
Message-ID: <4F5549B9.7090306@gmail.com>

Hi all,

Could anyone provide clarifications regarding the coordinates of sub SeqFeatures? Consider this script, which has a sequence with a feature, which itself contains a (sub) seqfeature:

> #!/usr/bin/env perl
>
> use strict;
> use warnings;
> use Bio::Seq;
> use Bio::SeqFeature::Generic;
>
> select(STDERR);
> $| = 1;
> select(STDOUT);
> $| = 1;
>
> my $seq = Bio::Seq->new(
>     -seq => 'AAAAAAAAAAAAAAAAAAAACCCCCCTTTT',
> );
>
> # A feature of the sequence spanning the 'CCCCCCTTTT' region
> my $feat = Bio::SeqFeature::Generic->new(
>     -start => 21,
>     -end => 30,
>     -strand => 1,
> );
> print "Attaching feature to sequence... ";
> $seq->add_SeqFeature($feat);
> print "done!\n\n";
>
> # A subfeature of the feature, spanning 'CCCCCC'
> my $subfeat = Bio::SeqFeature::Generic->new(
>     -start => 1,
>     -end => 6,
>     # -start => 21,
>     # -end => 26,
> );
> print "Attaching subfeature to feature... ";
> $feat->add_SeqFeature($subfeat);
> print "done!\n\n";

Here, I gave the sub SeqFeature coordinates relative to the feature it is attached to, i.e. position 1..6 of the SeqFeature. However, I get the exception "Bio::SeqFeature::Generic=HASH(0x15fa150) is not contained within parent feature, and expansion is not valid".

Now, if I provide the sub SeqFeature coordinates relative to the sequence instead of the feature it is attached to, i.e. 21..26, everything goes well. Obviously, add_SeqFeature expects coordinates to be relative to the sequence.
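The offset between the two coordinate systems can be sketched in plain Perl. This is only an illustration under stated assumptions: plus-strand features, 1-based inclusive coordinates, and a helper named to_sequence_coords that is hypothetical (it is not a BioPerl method). The values it returns are what would then be passed as -start and -end when building the subfeature:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): shift 1-based coordinates
# given relative to a parent feature onto the underlying sequence.
# Assumes the parent lies on the plus strand.
sub to_sequence_coords {
    my ($parent_start, $rel_start, $rel_end) = @_;
    my $offset = $parent_start - 1;    # parent starts at 21 => offset 20
    return ($rel_start + $offset, $rel_end + $offset);
}

# Subfeature 1..6 inside a parent feature that starts at position 21
my ($start, $end) = to_sequence_coords(21, 1, 6);
print "$start..$end\n";    # prints "21..26"
```

With the script above, this gives the 21..26 range that add_SeqFeature accepts without 'EXPAND'.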
None of the documentation I have looked at says what the coordinates are relative to. To me, the problem is that it seems counter-intuitive not to be able to provide coordinates relative to the object a feature is attached to. Can anyone clarify how the coordinate system for features and sub features is intended to work? Thanks, Florent From cjfields at illinois.edu Sun Mar 4 21:13:38 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 5 Mar 2012 02:13:38 +0000 Subject: [Bioperl-l] Compliance of Bio::Seq add_SeqFeature() method In-Reply-To: References: <4F546D9C.7080808@gmail.com>, <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> Message-ID: On Mar 4, 2012, at 3:44 PM, Hilmar Lapp wrote: > On Mar 4, 2012, at 3:30 PM, Fields, Christopher J wrote: > >> Dealing with the additional optional 'EXPAND' Is kind of a pain > > I know, but if we don't want it required we should take it out of the interface. > > -hilmar Actually, I misread Florent's original post, I was thinking that FeatureHolderI was the outlier here, but it is Bio::Seq. Yes, I think Bio::Seq is abusing the FeatureHolderI interface, it should just be for a single feature (it can safely ignore the 'EXPAND' option). However, use of 'EXPAND' assumes the FeatureHolderI is also a Bio::RangeI (must have a start and end to expand), something that is not mentioned in the interface as a requirement and is not guaranteed, for instance Bio::Seq is not Bio::RangeI. So I think the FeatureHolderI interface is really too specific in this case, and the RangeI-specific requirements (e.g. for 'EXPAND') should be relaxed or completely removed as Hilmar indicates. Implementations can freely go above and beyond what the interface requires. I think we do need some alternative method in FeatureHolderI for safely adding multiple features in one go, this could simply use the plural, e.g. add_SeqFeatures().
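A rough sketch of how such a plural method could delegate to the singular one. Everything here is an assumption for illustration: My::FeatureHolder is a hypothetical stand-in class (not a BioPerl module), features are plain hash references, and the optional trailing 'EXPAND' flag is detected by checking the last argument and simply passed through:

```perl
use strict;
use warnings;

package My::FeatureHolder;    # hypothetical stand-in, not a BioPerl class

sub new {
    my ($class) = @_;
    return bless { feats => [] }, $class;
}

# Singular form: exactly one feature plus an optional 'EXPAND' flag.
# This mock only stores the feature; a real holder would validate or expand.
sub add_SeqFeature {
    my ($self, $feat, $expand) = @_;
    push @{ $self->{feats} }, $feat;
    return 1;
}

# Plural form: any number of features, optionally followed by 'EXPAND'.
sub add_SeqFeatures {
    my ($self, @feats) = @_;
    my $expand;
    my $last = $feats[-1];
    if (defined $last && !ref($last) && $last eq 'EXPAND') {
        $expand = pop @feats;    # strip the flag from the feature list
    }
    $self->add_SeqFeature($_, $expand) for @feats;
    return scalar @feats;        # number of features added
}

package main;

my $holder = My::FeatureHolder->new;
my $added  = $holder->add_SeqFeatures({ id => 'f1' }, { id => 'f2' }, 'EXPAND');
print "added $added features\n";    # prints "added 2 features"
```

Keeping the singular method as the single place where a feature is actually stored means the one-feature contract stays untouched, and the list handling lives only in the plural wrapper.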
Re: Bio::Seq::add_SeqFeature as currently implemented, this behavior has been around for a long, long time. 'git blame' has Ewan adding this back in 2000, so I expect this to potentially break something even if it doesn't show up in tests. Doesn't mean we shouldn't fix it, just that we need to indicate the problem with it (maybe as easy as pointing to this thread), and suggest a possible alternative such as the way mentioned above. chris From cjfields at illinois.edu Sun Mar 4 21:15:59 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 5 Mar 2012 02:15:59 +0000 Subject: [Bioperl-l] Compliance of Bio::Seq add_SeqFeature() method In-Reply-To: <4F554696.3090905@gmail.com> References: <4F546D9C.7080808@gmail.com>, <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> <4F554696.3090905@gmail.com> Message-ID: <34A02C73-DB36-4AAF-96DB-1E68AD4080C3@illinois.edu> On Mar 5, 2012, at 5:04 PM, Florent Angly wrote: > On 05/03/12 06:30, Fields, Christopher J wrote: >> What do other FeatureHolderI do? > > Bio::SeqFeature::Generic does it as defined in Bio::FeatureHolderI, i.e.: >> Usage : $feat->add_SeqFeature($subfeat); >> $feat->add_SeqFeature($subfeat,'EXPAND'); > > Note that it does not simply accept 'EXPAND', but it actually, acts on it. > > Florent Yes, that matches up. My guess is the interface was developed originally with this in mind, but was (ab)used for Bio::Seq w/o changing the method. 
chris From florent.angly at gmail.com Mon Mar 5 23:36:42 2012 From: florent.angly at gmail.com (Florent Angly) Date: Tue, 06 Mar 2012 14:36:42 +1000 Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation In-Reply-To: <96E7CE58-0657-4194-A906-83022348F84A@illinois.edu> References: <4F49D4B6.5050301@gmail.com> <4F4E3EEF.5050506@cam.ac.uk> <4325EF60-919F-46EF-91BB-D31160F0B587@illinois.edu> <4F501063.4010109@gmail.com> <96E7CE58-0657-4194-A906-83022348F84A@illinois.edu> Message-ID: <4F55945A.7070901@gmail.com> To all interested, the AmpliconSearch module is in a decent state. If you want to test it or improve it, head to https://github.com/bioperl/bioperl-live/blob/amplicons/Bio/Tools/AmpliconSearch.pm Regards, Florent On 01/03/12 12:42, Fields, Christopher J wrote: > Florent, > > Just want to add, my previous response isn't meant as an admonishment, hope it didn't come across that way, but sometimes email makes it hard to discern the difference. I simply meant to demonstrate my opinion that I find releasing one's code is much simpler (e.g. you can decide the rules and dictate when the code is ready for release), and if we can make getting good code into user's hands easier, more flexible, and more consistent I think that is always a better path. > > chris > > On Feb 29, 2012, at 8:30 PM, Fields, Christopher J wrote: > >> There are a number of very good reasons to separate out common code and create new repos for new code. The problem about adding new code into core is it ties your code development to bioperl-live's release cycle and versioning. Also, what I (and others) would not like to see is any additional dependencies introduced, but a separate release allows you to (1) both add a dependency w/o affecting core, and (2) make it required, so no fiddling with checking for the module prior to running tests on it. >> >> As an example, I can easily see something like Bio::SearchIO::blastxml living on it's own since it has a set of outside dependencies. 
>> >> BTW, separation of modules into separate distributions (even single modules) based on functionality above and beyond that defined in a core is very common in the perl world. Beyond the obvious example of anything non-core in perl (all installable via CPAN), Moose, Dist::Zilla, Catalyst, Dancer, etc all have separately installable dists that layer additional functionality and have a separate maintenance path. >> >> chris >> >> On Mar 1, 2012, at 6:12 PM, Florent Angly wrote: >> >>> Thanks for everybody's feedback. >>> >>> I am looking at existing modules to hold template sequence, amplicon sequence and primer information. There is the Bio::SeqFeature::Primer and Bio::Seq::PrimedSeq. At the moment the PrimedSeq object places Primer objects on the target sequence. I have been looking at refreshing these modules (they are quite old), add some sanity to them and make sure they are suitable for a generic implementation of PCR (or amplicon search, which I find a more suitable name since it is a far cry from simulating PCR cycles, etc). >>> >>> I will make a remote branch today to make it easier for interested parties to experiment and contribute. >>> >>> As you can see Chris, the amplicon search feature would use two existing bioperl-live modules and only add one, tentatively in the Bio::Tools::AmpliconSearch namespace. I am not convinced that this warrants a separate distro. >>> >>> Florent >>> >>> On 01/03/12 01:23, Fields, Christopher J wrote: >>>> Seems like it was meant to be added at some point but was never committed. Definitely not in the github history for 1.3.x, this commit corresponds to the v1.3.4 tag: >>>> >>>> https://github.com/bioperl/bioperl-live/tree/0a67fa444eb19a70876017607f70ab72be38755a >>>> >>>> and it's not there. >>>> >>>> I agree with Roy, it would be nice to somehow make this a little more generic or pluggable on how it maps primers (maybe with a default pure perl method). 
I also think this shouldn't be bound to bioperl-live considering our current plans, it would best happen in a separate repo. >>>> >>>> chris >>>> >>>> On Feb 29, 2012, at 9:06 AM, Roy Chaudhuri wrote: >>>> >>>>> The code for Bio::Tools::PCRSimulation can be downloaded as part of this archive: >>>>> http://www.salmonella.org/bioperl/primer3_v0.3.tgz >>>>> >>>>> (There's supposedly a more recent version here: >>>>> http://www.salmonella.org/bioperl/nucleotide_analyses.tgz >>>>> but that file seems to be truncated). >>>>> >>>>> I have no idea how much would be salvagable. It seems to just use index to map the primers to the sequence, I guess it would make more sense to at least give the option of something more sophisticated like Primer3, BLAST or even a short read mapper. >>>>> >>>>> Cheers, >>>>> Roy. >>>>> >>>>> >>>>> On 27/02/2012 21:18, Fields, Christopher J wrote: >>>>>> On Feb 26, 2012, at 12:44 AM, Florent Angly wrote: >>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> I am interested in the Bio::Tools::PCRSimulation module. Supposedly >>>>>>> it was added to Bioperl 0.3 and is also mentionned in the >>>>>>> Bio::PrimedSeq module. However, I cannot find in the current >>>>>>> Bioperl codebase. Any idea where it went? >>>>>> No idea; I can't find it anywhere in the code base either, and the >>>>>> github repo contains history going back to the original CVS repo. >>>>>> You can try contacting the author, possibly. >>>>>> >>>>>>> The reason I am asking is because I have some code to do silico PCR >>>>>>> using regular expressions. I wanted to modularize my code more and >>>>>>> make it into a module for Bioperl. Of course, if there is something >>>>>>> similar in Bioperl already, I need to have a look at it. If there >>>>>>> is nothing similar, what namespace do you suggest to use? >>>>>>> Bio::Tools::AmpliconExtractor? Bio::Tools::AmpliconSearch? >>>>>>> Bio::Tools::InSilicoPCR? >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> Florent >>>>>> Maybe the last (InSilicoPCR). 
>>>>>> >>>>>> chris >>>>>> >>>>>> >>>>>> _______________________________________________ Bioperl-l mailing >>>>>> list Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Mar 5 20:08:57 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 6 Mar 2012 01:08:57 +0000 Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation In-Reply-To: <4F55945A.7070901@gmail.com> References: <4F49D4B6.5050301@gmail.com> <4F4E3EEF.5050506@cam.ac.uk> <4325EF60-919F-46EF-91BB-D31160F0B587@illinois.edu> <4F501063.4010109@gmail.com> <96E7CE58-0657-4194-A906-83022348F84A@illinois.edu> <4F55945A.7070901@gmail.com> Message-ID: I'll check it out. Want me to post test results here (I have access to a few systems to test on). chris On Mar 5, 2012, at 10:36 PM, Florent Angly wrote: > To all interested, > the AmpliconSearch module is in a decent state. If you want to test it or improve it, head to https://github.com/bioperl/bioperl-live/blob/amplicons/Bio/Tools/AmpliconSearch.pm > Regards, > Florent > > > On 01/03/12 12:42, Fields, Christopher J wrote: >> Florent, >> >> Just want to add, my previous response isn't meant as an admonishment, hope it didn't come across that way, but sometimes email makes it hard to discern the difference. I simply meant to demonstrate my opinion that I find releasing one's code is much simpler (e.g. you can decide the rules and dictate when the code is ready for release), and if we can make getting good code into user's hands easier, more flexible, and more consistent I think that is always a better path. >> >> chris >> >> On Feb 29, 2012, at 8:30 PM, Fields, Christopher J wrote: >> >>> There are a number of very good reasons to separate out common code and create new repos for new code. 
The problem about adding new code into core is it ties your code development to bioperl-live's release cycle and versioning. Also, what I (and others) would not like to see is any additional dependencies introduced, but a separate release allows you to (1) both add a dependency w/o affecting core, and (2) make it required, so no fiddling with checking for the module prior to running tests on it. >>> >>> As an example, I can easily see something like Bio::SearchIO::blastxml living on it's own since it has a set of outside dependencies. >>> >>> BTW, separation of modules into separate distributions (even single modules) based on functionality above and beyond that defined in a core is very common in the perl world. Beyond the obvious example of anything non-core in perl (all installable via CPAN), Moose, Dist::Zilla, Catalyst, Dancer, etc all have separately installable dists that layer additional functionality and have a separate maintenance path. >>> >>> chris >>> >>> On Mar 1, 2012, at 6:12 PM, Florent Angly wrote: >>> >>>> Thanks for everybody's feedback. >>>> >>>> I am looking at existing modules to hold template sequence, amplicon sequence and primer information. There is the Bio::SeqFeature::Primer and Bio::Seq::PrimedSeq. At the moment the PrimedSeq object places Primer objects on the target sequence. I have been looking at refreshing these modules (they are quite old), add some sanity to them and make sure they are suitable for a generic implementation of PCR (or amplicon search, which I find a more suitable name since it is a far cry from simulating PCR cycles, etc). >>>> >>>> I will make a remote branch today to make it easier for interested parties to experiment and contribute. >>>> >>>> As you can see Chris, the amplicon search feature would use two existing bioperl-live modules and only add one, tentatively in the Bio::Tools::AmpliconSearch namespace. I am not convinced that this warrants a separate distro. 
>>>> >>>> Florent >>>> >>>> On 01/03/12 01:23, Fields, Christopher J wrote: >>>>> Seems like it was meant to be added at some point but was never committed. Definitely not in the github history for 1.3.x, this commit corresponds to the v1.3.4 tag: >>>>> >>>>> https://github.com/bioperl/bioperl-live/tree/0a67fa444eb19a70876017607f70ab72be38755a >>>>> >>>>> and it's not there. >>>>> >>>>> I agree with Roy, it would be nice to somehow make this a little more generic or pluggable on how it maps primers (maybe with a default pure perl method). I also think this shouldn't be bound to bioperl-live considering our current plans, it would best happen in a separate repo. >>>>> >>>>> chris >>>>> >>>>> On Feb 29, 2012, at 9:06 AM, Roy Chaudhuri wrote: >>>>> >>>>>> The code for Bio::Tools::PCRSimulation can be downloaded as part of this archive: >>>>>> http://www.salmonella.org/bioperl/primer3_v0.3.tgz >>>>>> >>>>>> (There's supposedly a more recent version here: >>>>>> http://www.salmonella.org/bioperl/nucleotide_analyses.tgz >>>>>> but that file seems to be truncated). >>>>>> >>>>>> I have no idea how much would be salvagable. It seems to just use index to map the primers to the sequence, I guess it would make more sense to at least give the option of something more sophisticated like Primer3, BLAST or even a short read mapper. >>>>>> >>>>>> Cheers, >>>>>> Roy. >>>>>> >>>>>> >>>>>> On 27/02/2012 21:18, Fields, Christopher J wrote: >>>>>>> On Feb 26, 2012, at 12:44 AM, Florent Angly wrote: >>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> I am interested in the Bio::Tools::PCRSimulation module. Supposedly >>>>>>>> it was added to Bioperl 0.3 and is also mentionned in the >>>>>>>> Bio::PrimedSeq module. However, I cannot find in the current >>>>>>>> Bioperl codebase. Any idea where it went? >>>>>>> No idea; I can't find it anywhere in the code base either, and the >>>>>>> github repo contains history going back to the original CVS repo. 
>>>>>>> You can try contacting the author, possibly. >>>>>>> >>>>>>>> The reason I am asking is because I have some code to do silico PCR >>>>>>>> using regular expressions. I wanted to modularize my code more and >>>>>>>> make it into a module for Bioperl. Of course, if there is something >>>>>>>> similar in Bioperl already, I need to have a look at it. If there >>>>>>>> is nothing similar, what namespace do you suggest to use? >>>>>>>> Bio::Tools::AmpliconExtractor? Bio::Tools::AmpliconSearch? >>>>>>>> Bio::Tools::InSilicoPCR? >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Florent >>>>>>> Maybe the last (InSilicoPCR). >>>>>>> >>>>>>> chris >>>>>>> >>>>>>> >>>>>>> _______________________________________________ Bioperl-l mailing >>>>>>> list Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From florent.angly at gmail.com Tue Mar 6 20:31:18 2012 From: florent.angly at gmail.com (Florent Angly) Date: Wed, 07 Mar 2012 11:31:18 +1000 Subject: [Bioperl-l] Compliance of Bio::Seq add_SeqFeature() method In-Reply-To: References: <4F546D9C.7080808@gmail.com>, <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> Message-ID: <4F56BA66.8040509@gmail.com> On 05/03/12 12:13, Fields, Christopher J wrote: > Re: Bio::Seq::add_SeqFeature as currently implemented, this behavior > has been around for a long, long time. Ok, I figured this was something like that. 
I think it is fine to leave this behavior in, as long as it is documented, which I have just done: https://github.com/bioperl/bioperl-live/commit/5f115e23c09c0c72ec3af2436193c0a6d60aeb54 Florent From chapmanb at 50mail.com Mon Mar 5 21:09:29 2012 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 05 Mar 2012 21:09:29 -0500 Subject: [Bioperl-l] BOSC 2012 Call for Abstracts Message-ID: <87d38q4h0m.fsf@fastmail.fm> Call for Abstracts for the 13th Annual Bioinformatics Open Source Conference (BOSC 2012) A Special Interest Group (SIG) of ISMB 2012 Dates: July 13-14, 2012 Location: Long Beach, California Web site: http://www.open-bio.org/wiki/BOSC_2012 Email: bosc at open-bio.org BOSC announcements mailing list: http://lists.open-bio.org/mailman/listinfo/bosc-announce Important Dates: April 13, 2012: Deadline for submitting abstracts May 7, 2012: Notification of accepted talk abstracts emailed to authors July 11-12, 2012: Codefest 2012 programming session July 13-14, 2012: BOSC 2012 July 15-17, 2012: ISMB 2012 The Bioinformatics Open Source Conference (BOSC) is sponsored by the Open Bioinformatics Foundation (O|B|F), a non-profit group dedicated to promoting the practice and philosophy of Open Source software development within the biological research community. To be considered for acceptance, software systems representing the central topic in a presentation submitted to BOSC must be licensed with a recognized Open Source License, and be freely available for download in source code form. We invite you to submit one-page abstracts for talks and posters. 
This year's session topics are: - Cloud and Parallel Computing - Linked Data - Genome-scale Data Management - Data Visualization and Imaging - Translational Bioinformatics - Software Interoperability (possibly a joint session with BSI-SIG, the Bioinformatics Software Interoperability SIG) - Bioinformatics Open Source Project Updates - Interfacing with Industry (panel) Thanks to generous sponsorship from Eagle Genomics and an anonymous donor, we are pleased to announce a competition for three Student Travel Awards. Each winner will be awarded $250 to defray the costs of travel to BOSC 2012. For instructions on submitting your abstract, please visit http://www.open-bio.org/wiki/BOSC_2012#Submitting_Abstracts BOSC 2012 Organizing Committee: Nomi Harris (chair), Jan Aerts, Brad Chapman, Peter Cock, Chris Fields, Erwin Frise, Peter Rice From scott at scottcain.net Mon Mar 5 23:49:34 2012 From: scott at scottcain.net (Scott Cain) Date: Mon, 5 Mar 2012 23:49:34 -0500 Subject: [Bioperl-l] April GMOD meeting early registration Message-ID: Hello, This is a reminder that there is only a little over two days left to get in on early registration for the April 2012 GMOD meeting. In addition to getting a $10 discount on the registration fee, you will be entered in a drawing to get a GMOD coffee cup or T-shirt. There are some very good speakers and topics lined up for the meeting; I'm looking forward to a good one. Please see: http://gmod.org/wiki/April_2012_GMOD_Meeting for more information on the meeting, and to: http://gmod2012.eventbrite.com/ to register for the meeting. I look forward to seeing you next month. Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/)
216-392-3087 Ontario Institute for Cancer Research From florent.angly at gmail.com Wed Mar 7 00:00:50 2012 From: florent.angly at gmail.com (Florent Angly) Date: Wed, 07 Mar 2012 15:00:50 +1000 Subject: [Bioperl-l] Compliance of Bio::Seq add_SeqFeature() method In-Reply-To: References: <4F546D9C.7080808@gmail.com>, <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> Message-ID: <4F56EB82.9060603@gmail.com> On 05/03/12 12:13, Fields, Christopher J wrote: > Actually, I misread Florent's original post, I was thinking that FeatureHolderI was the outlier here, but it is Bio::Seq. Yes, I think Bio::Seq is abusing the FeatureHolderI interface, it should just be for a single feature (it can safely ignore the 'EXPAND' option). However, use of 'EXPAND' assumes the FeatureHolderI is also a Bio::RangeI (must have a start and end to expand), something that is not mentioned in the interface as a requirement and is not guaranteed, for instance Bio::Seq is not Bio::RangeI. Ok, I have: 1/ clarified in Bio::FeatureHolderI that there is no guarantee that 'EXPAND' will be honored 2/ made Bio::Seq comply to Bio::FeatureHolderI by accepting the 'EXPAND' keyword (but do nothing about it) 3/ deprecated the use of passing multiple features to add_SeqFeature() in Bio::Seq 4/ updated documentation and code that relied on passing multiple features That should take care of the issue at hand. See this commit: https://github.com/bioperl/bioperl-live/commit/a5bebe00c505fbf5279f5d717790ed36eefcc2b8 Note that there are still some modules (e.g. Bio::DB::SeqFeature::Store, Bio::DB::SeqFeature::NormalizedFeature, Bio::SeqFeature::Lite, Bio::DB::SeqFeature, Bio::Search::Tiling::MapTileUtils) that have an add_SeqFeature() method that accepts an array of features but they are not Bio::FeatureHolderI, so that's ok. Maybe they should be Bio::FeatureHolderI but that's another story. 
However, I have found that Bio::SimpleAlign is a Bio::FeatureHolderI and is not compliant. I fixed it here: https://github.com/bioperl/bioperl-live/commit/29e0449d05f37c9c748aaaff8cfe596ca7c3d380 Florent From florent.angly at gmail.com Wed Mar 7 00:08:24 2012 From: florent.angly at gmail.com (Florent Angly) Date: Wed, 07 Mar 2012 15:08:24 +1000 Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation In-Reply-To: References: <4F49D4B6.5050301@gmail.com> <4F4E3EEF.5050506@cam.ac.uk> <4325EF60-919F-46EF-91BB-D31160F0B587@illinois.edu> <4F501063.4010109@gmail.com> <96E7CE58-0657-4194-A906-83022348F84A@illinois.edu> <4F55945A.7070901@gmail.com> Message-ID: <4F56ED48.9090500@gmail.com> Yes, thanks Chris. If you want to start splitting off a Bio-Tools distribution that would include Bio::Tools::AmpliconSearch, I am happy to help. In general, I am not specifically attached to the namespace, so if you guys prefer something different, just tell me. Note that AmpliconSearch uses a few new or re-worked objects, namely Bio::SeqFeature::Primer, Bio::SeqFeature::Amplicon and Bio::SeqFeature::SubSeq. Both Primer and Amplicon inherit from SubSeq, which inherits from Bio::SeqFeature::Generic. The main goal of SubSeq is to allow either fetching a subsequence from the attached sequence or explicitly adding a sequence that represents the feature (the Generic feature class does not support setting such a sequence, just getting the subsequence). I think it would be good to add this functionality to the Generic feature class, but I did not want to force things without first asking everybody whether this seems like a good idea. Florent On 06/03/12 11:08, Fields, Christopher J wrote: > I'll check it out. Want me to post test results here (I have access to a few systems to test on). > > chris > > On Mar 5, 2012, at 10:36 PM, Florent Angly wrote: > >> To all interested, >> the AmpliconSearch module is in a decent state.
If you want to test it or improve it, head to https://github.com/bioperl/bioperl-live/blob/amplicons/Bio/Tools/AmpliconSearch.pm >> Regards, >> Florent >> >> >> On 01/03/12 12:42, Fields, Christopher J wrote: >>> Florent, >>> >>> Just want to add, my previous response isn't meant as an admonishment, hope it didn't come across that way, but sometimes email makes it hard to discern the difference. I simply meant to demonstrate my opinion that I find releasing one's code is much simpler (e.g. you can decide the rules and dictate when the code is ready for release), and if we can make getting good code into user's hands easier, more flexible, and more consistent I think that is always a better path. >>> >>> chris >>> >>> On Feb 29, 2012, at 8:30 PM, Fields, Christopher J wrote: >>> >>>> There are a number of very good reasons to separate out common code and create new repos for new code. The problem about adding new code into core is it ties your code development to bioperl-live's release cycle and versioning. Also, what I (and others) would not like to see is any additional dependencies introduced, but a separate release allows you to (1) both add a dependency w/o affecting core, and (2) make it required, so no fiddling with checking for the module prior to running tests on it. >>>> >>>> As an example, I can easily see something like Bio::SearchIO::blastxml living on it's own since it has a set of outside dependencies. >>>> >>>> BTW, separation of modules into separate distributions (even single modules) based on functionality above and beyond that defined in a core is very common in the perl world. Beyond the obvious example of anything non-core in perl (all installable via CPAN), Moose, Dist::Zilla, Catalyst, Dancer, etc all have separately installable dists that layer additional functionality and have a separate maintenance path. >>>> >>>> chris >>>> >>>> On Mar 1, 2012, at 6:12 PM, Florent Angly wrote: >>>> >>>>> Thanks for everybody's feedback. 
>>>>> >>>>> I am looking at existing modules to hold template sequence, amplicon sequence and primer information. There is the Bio::SeqFeature::Primer and Bio::Seq::PrimedSeq. At the moment the PrimedSeq object places Primer objects on the target sequence. I have been looking at refreshing these modules (they are quite old), add some sanity to them and make sure they are suitable for a generic implementation of PCR (or amplicon search, which I find a more suitable name since it is a far cry from simulating PCR cycles, etc). >>>>> >>>>> I will make a remote branch today to make it easier for interested parties to experiment and contribute. >>>>> >>>>> As you can see Chris, the amplicon search feature would use two existing bioperl-live modules and only add one, tentatively in the Bio::Tools::AmpliconSearch namespace. I am not convinced that this warrants a separate distro. >>>>> >>>>> Florent >>>>> >>>>> On 01/03/12 01:23, Fields, Christopher J wrote: >>>>>> Seems like it was meant to be added at some point but was never committed. Definitely not in the github history for 1.3.x, this commit corresponds to the v1.3.4 tag: >>>>>> >>>>>> https://github.com/bioperl/bioperl-live/tree/0a67fa444eb19a70876017607f70ab72be38755a >>>>>> >>>>>> and it's not there. >>>>>> >>>>>> I agree with Roy, it would be nice to somehow make this a little more generic or pluggable on how it maps primers (maybe with a default pure perl method). I also think this shouldn't be bound to bioperl-live considering our current plans, it would best happen in a separate repo. >>>>>> >>>>>> chris >>>>>> >>>>>> On Feb 29, 2012, at 9:06 AM, Roy Chaudhuri wrote: >>>>>> >>>>>>> The code for Bio::Tools::PCRSimulation can be downloaded as part of this archive: >>>>>>> http://www.salmonella.org/bioperl/primer3_v0.3.tgz >>>>>>> >>>>>>> (There's supposedly a more recent version here: >>>>>>> http://www.salmonella.org/bioperl/nucleotide_analyses.tgz >>>>>>> but that file seems to be truncated). 
>>>>>>> >>>>>>> I have no idea how much would be salvagable. It seems to just use index to map the primers to the sequence, I guess it would make more sense to at least give the option of something more sophisticated like Primer3, BLAST or even a short read mapper. >>>>>>> >>>>>>> Cheers, >>>>>>> Roy. >>>>>>> >>>>>>> >>>>>>> On 27/02/2012 21:18, Fields, Christopher J wrote: >>>>>>>> On Feb 26, 2012, at 12:44 AM, Florent Angly wrote: >>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I am interested in the Bio::Tools::PCRSimulation module. Supposedly >>>>>>>>> it was added to Bioperl 0.3 and is also mentionned in the >>>>>>>>> Bio::PrimedSeq module. However, I cannot find in the current >>>>>>>>> Bioperl codebase. Any idea where it went? >>>>>>>> No idea; I can't find it anywhere in the code base either, and the >>>>>>>> github repo contains history going back to the original CVS repo. >>>>>>>> You can try contacting the author, possibly. >>>>>>>> >>>>>>>>> The reason I am asking is because I have some code to do silico PCR >>>>>>>>> using regular expressions. I wanted to modularize my code more and >>>>>>>>> make it into a module for Bioperl. Of course, if there is something >>>>>>>>> similar in Bioperl already, I need to have a look at it. If there >>>>>>>>> is nothing similar, what namespace do you suggest to use? >>>>>>>>> Bio::Tools::AmpliconExtractor? Bio::Tools::AmpliconSearch? >>>>>>>>> Bio::Tools::InSilicoPCR? >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Florent >>>>>>>> Maybe the last (InSilicoPCR). 
>>>>>>>> >>>>>>>> chris >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ Bioperl-l mailing >>>>>>>> list Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Tue Mar 6 05:17:22 2012 From: David.Messina at sbc.su.se (Dave Messina) Date: Tue, 6 Mar 2012 11:17:22 +0100 Subject: [Bioperl-l] Compliance of Bio::Seq add_SeqFeature() method In-Reply-To: <4F56EB82.9060603@gmail.com> References: <4F546D9C.7080808@gmail.com> <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> <4F56EB82.9060603@gmail.com> Message-ID: Very nice! Thanks, Florent! Dave On Wed, Mar 7, 2012 at 06:00, Florent Angly wrote: > On 05/03/12 12:13, Fields, Christopher J wrote: > >> Actually, I misread Florent's original post, I was thinking that >> FeatureHolderI was the outlier here, but it is Bio::Seq. Yes, I think >> Bio::Seq is abusing the FeatureHolderI interface, it should just be for a >> single feature (it can safely ignore the 'EXPAND' option). However, use of >> 'EXPAND' assumes the FeatureHolderI is also a Bio::RangeI (must have a >> start and end to expand), something that is not mentioned in the interface >> as a requirement and is not guaranteed, for instance Bio::Seq is not >> Bio::RangeI. >> > Ok, I have: > 1/ clarified in Bio::FeatureHolderI that there is no guarantee that > 'EXPAND' will be honored > 2/ made Bio::Seq comply to Bio::FeatureHolderI by accepting the 'EXPAND' > keyword (but do nothing about it) > 3/ deprecated the use of passing multiple features to add_SeqFeature() in > Bio::Seq > 4/ updated documentation and code that relied on passing multiple features > > That should take care of the issue at hand. 
See this commit: > https://github.com/bioperl/bioperl-live/commit/a5bebe00c505fbf5279f5d717790ed36eefcc2b8 > > Note that there are still some modules (e.g. Bio::DB::SeqFeature::Store, > Bio::DB::SeqFeature::NormalizedFeature, Bio::SeqFeature::Lite, > Bio::DB::SeqFeature, Bio::Search::Tiling::MapTileUtils) that have an > add_SeqFeature() method that accepts an array of features but they are not > Bio::FeatureHolderI, so that's ok. Maybe they should be Bio::FeatureHolderI > but that's another story. > > However, I have found that Bio::SimpleAlign is a Bio::FeatureHolderI and > is not compliant. I fixed it here: https://github.com/bioperl/bioperl-live/commit/29e0449d05f37c9c748aaaff8cfe596ca7c3d380 > > > Florent > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Tue Mar 6 08:45:29 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 6 Mar 2012 13:45:29 +0000 Subject: [Bioperl-l] Compliance of Bio::Seq add_SeqFeature() method In-Reply-To: References: <4F546D9C.7080808@gmail.com> <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> <4F56EB82.9060603@gmail.com> Message-ID: <2BE2DBF2-788F-4944-B4BC-A3D109AC0CD8@illinois.edu> Yeah, I think that should work. We should probably clarify the reasoning a bit in FeatureHolderI if you haven't already updated it. Might be interesting to see what breaks (if anything?) chris On Mar 6, 2012, at 4:17 AM, Dave Messina wrote: > Very nice! Thanks, Florent! > > Dave > > > > On Wed, Mar 7, 2012 at 06:00, Florent Angly wrote: > On 05/03/12 12:13, Fields, Christopher J wrote: > Actually, I misread Florent's original post, I was thinking that FeatureHolderI was the outlier here, but it is Bio::Seq.
Yes, I think Bio::Seq is abusing the FeatureHolderI interface, it should just be for a single feature (it can safely ignore the 'EXPAND' option). However, use of 'EXPAND' assumes the FeatureHolderI is also a Bio::RangeI (must have a start and end to expand), something that is not mentioned in the interface as a requirement and is not guaranteed, for instance Bio::Seq is not Bio::RangeI. > Ok, I have: > 1/ clarified in Bio::FeatureHolderI that there is no guarantee that 'EXPAND' will be honored > 2/ made Bio::Seq comply to Bio::FeatureHolderI by accepting the 'EXPAND' keyword (but do nothing about it) > 3/ deprecated the use of passing multiple features to add_SeqFeature() in Bio::Seq > 4/ updated documentation and code that relied on passing multiple features > > That should take care of the issue at hand. See this commit: https://github.com/bioperl/bioperl-live/commit/a5bebe00c505fbf5279f5d717790ed36eefcc2b8 > > Note that there are still some modules (e.g. Bio::DB::SeqFeature::Store, Bio::DB::SeqFeature::NormalizedFeature, Bio::SeqFeature::Lite, Bio::DB::SeqFeature, Bio::Search::Tiling::MapTileUtils) that have an add_SeqFeature() method that accepts an array of features but they are not Bio::FeatureHolderI, so that's ok. Maybe they should be Bio::FeatureHolderI but that's another story. > > However, I have found that Bio::SimpleAlign is a Bio::FeatureHolderI and is not compliant. 
I fixed it here: https://github.com/bioperl/bioperl-live/commit/29e0449d05f37c9c748aaaff8cfe596ca7c3d380 > > > Florent > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jayoung at fhcrc.org Tue Mar 6 13:55:26 2012 From: jayoung at fhcrc.org (Janet Young) Date: Tue, 6 Mar 2012 18:55:26 +0000 (UTC) Subject: [Bioperl-l] Saving Codeml Output file References: <9645AF32-5EC3-41FA-9A32-45B6B92E31FD@illinois.edu><4DF56976.8080704@upvnet.upv.es><9866C4A4-AC36-4A25-B38F-3006A7BB0F11@illinois.edu> <1314636433.4e5bc291a40c6@webmail.upv.es> <1A4207F8295607498283FE9E93B775B407CCB2D3@EX02.asurite.ad.asu.edu> <1314659986.4e5c1e9268078@webmail.upv.es> <1314716331.4e5cfaab4958e@webmail.upv.es> Message-ID: Lorenzo Carretero Paulet upvnet.upv.es> writes: > > Thanks Jason, > Ok, I see. That's what I was triying at the beggining. This runs OK in my > scripts for > branch-specific models. However, when I try branch-site models (NSsites > 0) > and try to parse the results using > my $model_result= $paml_result->get_NSSite_results > I start to have problems. > > That's why I was trying to save the mlc output file and parse it, instead of > parsing directly the Bio::Tools::Phylo::PAML object. > Hi Lorenzo, I just ran into a similar issue (trying to save the output files for debugging purposes) and found your email thread. I managed to retrieve the output files by adding this code (after the "$codeml_factory->run" step). my $tempdir = $codeml_factory->tempdir; system("cp $tempdir/* ."); Maybe that'll help me debug some issues I'm having with parsing a minority of my paml runs - we'll see. It might be beyond my skills. 
Janet From open-bio at wvr7.me.uk Tue Mar 6 14:31:04 2012 From: open-bio at wvr7.me.uk (Giles Weaver) Date: Tue, 06 Mar 2012 19:31:04 +0000 Subject: [Bioperl-l] Job opportunity: Head of Bioinformatics at Institute for Animal Health (Surrey, UK) Message-ID: Dear All. Please pass the following on to anyone who may be interested. Note the closing date is the 16th March (next Friday!). For a pretty version of the advert without mangled formatting please see http://www.jobs.ac.uk/job/ADZ114/head-of-bioinformatics/. Thanks, Giles HEAD OF BIOINFORMATICS DRIVE AND SUPPORT QUANTITATIVE RESEARCH INTO THE VIRAL DISEASES OF ANIMALS £42,769-£47,521 Ref: IRC43544 BASED: INSTITUTE FOR ANIMAL HEALTH, PIRBRIGHT LABORATORY, SURREY Leading the bioinformatics team, you will provide support to IAH scientists involved in quantitative biology, but will also have the opportunity to pursue your own research. Areas of current interest include modelling of virus evolution and host immune responses using next-generation sequencing data; _in silico_ analysis of host genetics and genomics data; and learning and predicting networks of biomolecular interactions from post-genomic data sets. In this high-profile role, you'll be expected to seek funds for new projects and continue your excellent track record of publication. Building collaborative research links with other members of IAH is encouraged. Holding a PhD or equivalent in a relevant branch of the biosciences, you will have experience in a recognised R&D environment. The ability to develop and manage relational databases is essential, so we would expect proficiency in MySQL (or similar), languages such as Perl or Python, and familiarity with R, Bioconductor or another statistical program. Experience of writing grant applications and managing staff would be helpful. The Institute for Animal Health (IAH) is an institute of the Biotechnology and Biological Sciences Research Council (BBSRC).
We work to enhance the UK's capability to contain, control, and eliminate viral diseases in animals through highly innovative fundamental and applied bioscience. Informal enquiries about the post can be made to Simon Gubbins, Head of Mathematical Biology (simon.gubbins at iah.ac.uk [1]) APPLICATIONS ARE HANDLED BY THE RCUK SHARED SERVICES CENTRE; TO APPLY PLEASE VISIT OUR JOB BOARD AT HTTPS://EXT.SSC.RCUK.AC.UK [2] AND COMPLETE AN ONLINE APPLICATION FORM. APPLICANTS WHO WOULD LIKE TO RECEIVE THIS ADVERT IN AN ALTERNATIVE FORMAT (E.G. LARGE PRINT, BRAILLE, AUDIO OR HARD COPY), OR WHO ARE UNABLE TO APPLY ONLINE SHOULD CONTACT US BY TELEPHONE ON 01793 867003, PLEASE QUOTE REFERENCE NUMBER IRC43544. FOR MORE INFORMATION ABOUT THE IAH GO TO CLOSING DATE: 16TH MARCH 2012. Links: ------ [1] mailto:simon.gubbins at iah.ac.uk [2] https://ext.ssc.rcuk.ac.uk/ From jason.stajich at gmail.com Tue Mar 6 20:15:54 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Wed, 7 Mar 2012 09:15:54 +0800 Subject: [Bioperl-l] Saving Codeml Output file In-Reply-To: References: <9645AF32-5EC3-41FA-9A32-45B6B92E31FD@illinois.edu><4DF56976.8080704@upvnet.upv.es><9866C4A4-AC36-4A25-B38F-3006A7BB0F11@illinois.edu> <1314636433.4e5bc291a40c6@webmail.upv.es> <1A4207F8295607498283FE9E93B775B407CCB2D3@EX02.asurite.ad.asu.edu> <1314659986.4e5c1e9268078@webmail.upv.es> <1314716331.4e5cfaab4958e@webmail.upv.es> Message-ID: <325B6DA3-7732-4F33-8BD2-F89503DFAA42@gmail.com> The File::Copy module's cp() works too, and using it instead of a system call will be more OS independent. If you don't want the temp files cleaned up, you can set save_tempfiles to true, e.g. $obj->save_tempfiles(1), before you call run(). You can also have the program report the tempdir with print $obj->tempdir; See Bio::Tools::Run::WrapperBase for the other inherited methods for the Run classes.
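For anyone who wants to try the File::Copy route, here is a minimal sketch using only core Perl modules. The helper name save_run_files and the placeholder file names are made up for illustration, and a File::Temp directory stands in for whatever $codeml_factory->tempdir would return, so treat it as a starting point rather than tested BioPerl code.

```perl
use strict;
use warnings;
use File::Copy qw(cp);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of BioPerl): copy every plain file
# from $src_dir into $dest_dir and return the names that were copied.
sub save_run_files {
    my ($src_dir, $dest_dir) = @_;
    opendir(my $dh, $src_dir) or die "Cannot open $src_dir: $!";
    my @names = sort grep { -f File::Spec->catfile($src_dir, $_) } readdir($dh);
    closedir($dh);
    for my $name (@names) {
        cp(File::Spec->catfile($src_dir, $name),
           File::Spec->catfile($dest_dir, $name))
            or die "Copy of $name failed: $!";
    }
    return @names;
}

# Stand-in for the run directory; in a real script this would be
# something like: my $tempdir = $codeml_factory->tempdir;
my $tempdir = tempdir(CLEANUP => 1);
my $outdir  = tempdir(CLEANUP => 1);

# Create two placeholder files so the sketch has something to copy.
for my $name (qw(mlc rst)) {
    open(my $fh, '>', File::Spec->catfile($tempdir, $name))
        or die "Cannot write $name: $!";
    print {$fh} "placeholder for $name\n";
    close($fh);
}

my @saved = save_run_files($tempdir, $outdir);
print "Saved: @saved\n";
```

Building the paths with File::Spec->catfile and copying with cp() avoids depending on a Unix shell, which is the portability point made above.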
Dave Messina has debugged some of PAML4 recently; the problem is that there are constant changes in the output format of the mlc and other report files, making it difficult to ensure this always works. That is one reason we should try to get HyPhy support working, as that will probably allow for better standardized running and output. Jason On Mar 7, 2012, at 2:55 AM, Janet Young wrote: > Lorenzo Carretero Paulet upvnet.upv.es> writes: > >> >> Thanks Jason, >> Ok, I see. That's what I was triying at the beggining. This runs OK in my >> scripts for >> branch-specific models. However, when I try branch-site models (NSsites > 0) >> and try to parse the results using >> my $model_result= $paml_result->get_NSSite_results >> I start to have problems. >> >> That's why I was trying to save the mlc output file and parse it, instead of >> parsing directly the Bio::Tools::Phylo::PAML object. >> > > > Hi Lorenzo, > > I just ran into a similar issue (trying to save the output files > for debugging purposes) and found your email thread. I managed > to retrieve the output files by adding this code (after the > "$codeml_factory->run" step). > > my $tempdir = $codeml_factory->tempdir; > system("cp $tempdir/* ."); > > Maybe that'll help me debug some issues I'm having with parsing a minority of > my paml runs - we'll see. It might be beyond my skills.
> > Janet > > > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From hlapp at drycafe.net Tue Mar 6 23:51:49 2012 From: hlapp at drycafe.net (Hilmar Lapp) Date: Tue, 6 Mar 2012 23:51:49 -0500 Subject: [Bioperl-l] Compliance of Bio::Seq add_SeqFeature() method In-Reply-To: <4F56EB82.9060603@gmail.com> References: <4F546D9C.7080808@gmail.com>, <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> <4F56EB82.9060603@gmail.com> Message-ID: Accepting an array is not in violation so long as the method also accepts a single feature (i.e., a single object ref), and so long as passing an array isn't going to choke if it is followed by 'EXPAND'. I wouldn't deprecate this otherwise. -hilmar On Mar 7, 2012, at 12:00 AM, Florent Angly wrote: > On 05/03/12 12:13, Fields, Christopher J wrote: >> Actually, I misread Florent's original post, I was thinking that FeatureHolderI was the outlier here, but it is Bio::Seq. Yes, I think Bio::Seq is abusing the FeatureHolderI interface, it should just be for a single feature (it can safely ignore the 'EXPAND' option). However, use of 'EXPAND' assumes the FeatureHolderI is also a Bio::RangeI (must have a start and end to expand), something that is not mentioned in the interface as a requirement and is not guaranteed, for instance Bio::Seq is not Bio::RangeI. > Ok, I have: > 1/ clarified in Bio::FeatureHolderI that there is no guarantee that 'EXPAND' will be honored > 2/ made Bio::Seq comply to Bio::FeatureHolderI by accepting the 'EXPAND' keyword (but do nothing about it) > 3/ deprecated the use of passing multiple features to add_SeqFeature() in Bio::Seq > 4/ updated documentation and code that relied on passing multiple features > > That should take care of the issue at hand. 
See this commit: https://github.com/bioperl/bioperl-live/commit/a5bebe00c505fbf5279f5d717790ed36eefcc2b8 > > Note that there are still some modules (e.g. Bio::DB::SeqFeature::Store, Bio::DB::SeqFeature::NormalizedFeature, Bio::SeqFeature::Lite, Bio::DB::SeqFeature, Bio::Search::Tiling::MapTileUtils) that have an add_SeqFeature() method that accepts an array of features but they are not Bio::FeatureHolderI, so that's ok. Maybe they should be Bio::FeatureHolderI but that's another story. > > However, I have found that Bio::SimpleAlign is a Bio::FeatureHolderI and is not compliant. I fixed it here: https://github.com/bioperl/bioperl-live/commit/29e0449d05f37c9c748aaaff8cfe596ca7c3d380 > > Florent -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From cjfields at illinois.edu Wed Mar 7 00:12:00 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 7 Mar 2012 05:12:00 +0000 Subject: [Bioperl-l] Compliance of Bio::Seq add_SeqFeature() method In-Reply-To: References: <4F546D9C.7080808@gmail.com>, <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> <4F56EB82.9060603@gmail.com> Message-ID: <1665C159-F222-49C3-817D-6CE4AC852CE3@illinois.edu> Beyond the grammatical incorrectness of passing multiple features to a method named add_SeqFeature(), I'm fairly neutral on it, actually, as long as the behavior is (1) consistent and (2) documented (seems the interface itself would need some clarification with that in mind). chris On Mar 6, 2012, at 10:51 PM, Hilmar Lapp wrote: > Accepting an array is not in violation so long as the method also accepts a single feature (i.e., a single object ref), and so long as passing an array isn't going to choke if it is followed by 'EXPAND'. I wouldn't deprecate this otherwise. 
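To make the calling convention described here concrete, below is a rough sketch of a method that takes one feature or several, optionally followed by 'EXPAND', and simply ignores the flag. The My::FeatureHolder class and the hashref "features" are invented for illustration; this is not the Bio::Seq or Bio::FeatureHolderI code.

```perl
use strict;
use warnings;

# Hypothetical holder class, for illustration only.
package My::FeatureHolder;

sub new {
    my ($class) = @_;
    return bless({ features => [] }, $class);
}

# Accepts one feature or a list of features, optionally followed by
# the string 'EXPAND'. The flag is recognised and then ignored.
sub add_SeqFeature {
    my ($self, @args) = @_;
    if (@args && defined $args[-1] && !ref $args[-1] && $args[-1] eq 'EXPAND') {
        pop @args;    # drop the flag so it is not stored as a feature
    }
    for my $feat (@args) {
        die "Not a feature object\n" unless ref $feat;
        push @{ $self->{features} }, $feat;
    }
    return scalar @{ $self->{features} };
}

package main;

# Plain hashrefs stand in for real feature objects.
my ($f1, $f2, $f3, $f4) = map { +{ name => "feat$_" } } 1 .. 4;

my $holder = My::FeatureHolder->new;
$holder->add_SeqFeature($f1);                  # single feature
$holder->add_SeqFeature($f2, 'EXPAND');        # single feature plus flag
my $count = $holder->add_SeqFeature($f3, $f4, 'EXPAND');  # list plus flag
print "Features held: $count\n";
```

Only a trailing non-reference 'EXPAND' is treated as the flag, so a single feature, a list, and either form followed by 'EXPAND' all go through the same code path.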
> > -hilmar > > On Mar 7, 2012, at 12:00 AM, Florent Angly wrote: > >> On 05/03/12 12:13, Fields, Christopher J wrote: >>> Actually, I misread Florent's original post, I was thinking that FeatureHolderI was the outlier here, but it is Bio::Seq. Yes, I think Bio::Seq is abusing the FeatureHolderI interface, it should just be for a single feature (it can safely ignore the 'EXPAND' option). However, use of 'EXPAND' assumes the FeatureHolderI is also a Bio::RangeI (must have a start and end to expand), something that is not mentioned in the interface as a requirement and is not guaranteed, for instance Bio::Seq is not Bio::RangeI. >> Ok, I have: >> 1/ clarified in Bio::FeatureHolderI that there is no guarantee that 'EXPAND' will be honored >> 2/ made Bio::Seq comply to Bio::FeatureHolderI by accepting the 'EXPAND' keyword (but do nothing about it) >> 3/ deprecated the use of passing multiple features to add_SeqFeature() in Bio::Seq >> 4/ updated documentation and code that relied on passing multiple features >> >> That should take care of the issue at hand. See this commit: https://github.com/bioperl/bioperl-live/commit/a5bebe00c505fbf5279f5d717790ed36eefcc2b8 >> >> Note that there are still some modules (e.g. Bio::DB::SeqFeature::Store, Bio::DB::SeqFeature::NormalizedFeature, Bio::SeqFeature::Lite, Bio::DB::SeqFeature, Bio::Search::Tiling::MapTileUtils) that have an add_SeqFeature() method that accepts an array of features but they are not Bio::FeatureHolderI, so that's ok. Maybe they should be Bio::FeatureHolderI but that's another story. >> >> However, I have found that Bio::SimpleAlign is a Bio::FeatureHolderI and is not compliant. 
I fixed it here: https://github.com/bioperl/bioperl-live/commit/29e0449d05f37c9c748aaaff8cfe596ca7c3d380 >> >> Florent > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : > =========================================================== > > > > From manchunjohn-ma at uiowa.edu Wed Mar 7 17:07:50 2012 From: manchunjohn-ma at uiowa.edu (Ma, Man Chun John) Date: Wed, 7 Mar 2012 22:07:50 +0000 Subject: [Bioperl-l] How complete is the bioperl-pedigree? Message-ID: <344D48F6FA61134A9B17AE445882A1950CA072A6@HC-MAILBOXC1-N5.healthcare.uiowa.edu> Hi all, I tried to use bio-pedigree to output some genotype files to PED files for Haploview. While the script itself was easy to write, I have to wonder--how complete is bioperl-pedigree? What I have found up to this point included: 1. There's no Pedigree::Genotype classes when it should (if it's based on PopGen); 2. Nearly all methods in Pedigree::GroupI were never implemented, even there're other methods that would call them (PedIO classes). 3. Probably because of (2), adding a Pedigree::Group object into Pedigree::Pedigree object does not import the Group's markers. 4. I used Pedigree::PedIO::linkage, and there are even syntax errors in it (Line 281: there are one more sprintf FORMAT than LIST). PopGen modules in core did not appear to be able to output to pedigree files like this. Are there any alternatives that I can use? PED files are complicated enough that even I know how to code for it, I don't really feel like coding for that. Thanks for you guys! Cheers, John MC Ma Graduate Assistant Kwitek Lab Department of Pharmacology 3125E MERF 375 Newton Road Iowa City IA 52242 ________________________________ Notice: This UI Health Care e-mail (including attachments) is covered by the Electronic Communications Privacy Act, 18 U.S.C. 2510-2521, is confidential and may be legally privileged. 
If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution, or copying of this communication is strictly prohibited. Please reply to the sender that you have received the message in error, then delete it. Thank you.
________________________________

From awitney at sgul.ac.uk Thu Mar 8 11:39:34 2012
From: awitney at sgul.ac.uk (Adam Witney)
Date: Thu, 8 Mar 2012 16:39:34 +0000
Subject: [Bioperl-l] Bio::Tools::Glimmer and genes wrapping around the origin
Message-ID: <4E43EB8C-F8C9-41D7-809F-9AE19E3DD0AC@sgul.ac.uk>

Hi,

I have been using Bio::Tools::Glimmer and have come across a problem: it does not handle genes that wrap around the origin. I think I have boiled it down to this test case of what happens internally in Bio::Tools::Glimmer:

##############################################################
#! /usr/local/bin/perl -w

use strict;
use warnings;

use Bio::Factory::FTLocationFactory;
use Bio::SeqFeature::Generic;

my $location_string = 'join(117..1,135690..135187)';

my $location_factory = Bio::Factory::FTLocationFactory->new();
my $location_object = $location_factory->from_string($location_string);

print "Location: ".$location_object->to_FTstring."\n";

my $gene = Bio::SeqFeature::Generic->new(
    '-seq_id'   => 'Testing',
    '-location' => $location_object,
    '-strand'   => -1
);

print "Location: ".$location_object->to_FTstring."\n";

##############################################################

$ perl ../FTLocationTest.pl
Location: complement(join(135187..135690,1..117))
Location: complement(join(1..117,135187..135690))

This happens because setting '-strand' in Bio::SeqFeature::Generic calls the strand method of $location_object (a Bio::Location::Split), which then causes the problem (although I can't quite work out where...!).

Is this intended behaviour?
Thanks

Adam

From fossandonc at hotmail.com Thu Mar 8 15:37:05 2012
From: fossandonc at hotmail.com (Francisco J. Ossandón)
Date: Thu, 8 Mar 2012 17:37:05 -0300
Subject: [Bioperl-l] Bio::Tools::Glimmer and genes wrapping around the origin
In-Reply-To: <4E43EB8C-F8C9-41D7-809F-9AE19E3DD0AC@sgul.ac.uk>
References: <4E43EB8C-F8C9-41D7-809F-9AE19E3DD0AC@sgul.ac.uk>
Message-ID:

Long ago, Bio::SeqIO had an issue with genes that were split at the origin. When reading a GenBank file, BioPerl automatically sorted the segment coordinates of each gene it read, without considering that in a circular genome the sequence can go from the 1st nucleotide directly to the last one. An extra "-nosort" argument was therefore necessary every time to stop BioPerl from giving the wrong sequence:

my $nt_seq_obj = $feat->spliced_seq(-nosort => 1);

This is the same bug. In that case the code was changed so that "-nosort" is applied based on the "is_circular" status of the genome, see here:
https://redmine.open-bio.org/issues/2579

Since your code doesn't have the "is_circular" information (because it doesn't come from a file), I guess that the autosorting is kicking in. I think it would be better if all the autosorting of the sublocations array inside Bio::Location::Split were optional instead of automatic, because of these cases.

Cheers,

Francisco J. Ossandon

From cjfields at illinois.edu Thu Mar 8 16:38:07 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Thu, 8 Mar 2012 21:38:07 +0000
Subject: [Bioperl-l] Bio::Tools::Glimmer and genes wrapping around the origin
In-Reply-To:
References: <4E43EB8C-F8C9-41D7-809F-9AE19E3DD0AC@sgul.ac.uk>
Message-ID: <84BCE2A9-ED16-4610-B8D2-BCE12A7DB91B@illinois.edu>

I agree, no sorting should be implied (a 'join' order should be based on order of addition alone, and sort should be optional). IIRC there were backwards-compat problems switching this due to reliance on old behavior, but it might be worth trying to see if anything breaks test-wise (and why it breaks).
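To illustrate why an implied sort is a problem for a feature that crosses the origin, here is a minimal pure-Perl sketch. It does not use BioPerl at all: the 20-base toy genome, the coordinates and the helper splice_segments() are invented for illustration, and it only mimics the effect of sorting sub-locations by start coordinate, not the actual Bio::Location::Split code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy circular genome of 20 bases (hypothetical); positions are 1-based.
my $genome = 'ACGTACGTACGTACGTTTGA';

# A forward-strand feature that wraps around the origin: join(17..20,1..4).
my @segments = ( [ 17, 20 ], [ 1, 4 ] );

# Concatenate the segments in exactly the order they are given.
sub splice_segments {
    my ( $seq, @segs ) = @_;
    return join '',
        map { substr( $seq, $_->[0] - 1, $_->[1] - $_->[0] + 1 ) } @segs;
}

# Order of addition: the end of the genome first, then the start.
my $unsorted = splice_segments( $genome, @segments );

# Sorting by start coordinate swaps the two halves of the feature.
my @by_start = sort { $a->[0] <=> $b->[0] } @segments;
my $sorted   = splice_segments( $genome, @by_start );

print "order of addition: $unsorted\n";
print "sorted by start:   $sorted\n";
```

With the order of addition preserved, the spliced sequence reads across the origin as intended; after sorting, the two pieces are joined the wrong way round, which is the kind of result the "-nosort" option was meant to avoid.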
chris

From awitney at sgul.ac.uk Fri Mar 9 06:02:06 2012
From: awitney at sgul.ac.uk (Adam Witney)
Date: Fri, 9 Mar 2012 11:02:06 +0000
Subject: [Bioperl-l] Bio::Tools::Glimmer and genes wrapping around the origin
In-Reply-To: <84BCE2A9-ED16-4610-B8D2-BCE12A7DB91B@illinois.edu>
References: <4E43EB8C-F8C9-41D7-809F-9AE19E3DD0AC@sgul.ac.uk> <84BCE2A9-ED16-4610-B8D2-BCE12A7DB91B@illinois.edu>
Message-ID: <8936622A-F8CA-47B8-BFD4-DFC981503BD8@sgul.ac.uk>

Thanks for your replies.

After a little more digging, there seem to be two things.
In the test code I posted, the extra strand attribute in Bio::SeqFeature::Generic (and the subsequent call to $location_object->strand) changes the internal guide_strand, so that this line in Bio::Location::Split->to_FTstring changes the order of the sub-locations:

my @locs = ($stype eq 'join' && (!$guide && $strand == -1)) ?
    reverse $self->sub_Location() : $self->sub_Location() ;

The second problem is probably due to the sorting order within Bio::SeqFeature::Generic: when 'start' and 'end' are called, it doesn't do the right thing in this case. But I am not quite sure how to fix this.

Thanks again

Adam

From arnaud.mounier at dijon.inra.fr Fri Mar 9 07:42:16 2012
From: arnaud.mounier at dijon.inra.fr (Arnaud Mounier)
Date: Fri, 09 Mar 2012 13:42:16 +0100
Subject: [Bioperl-l] How to make a Bio::DB::SeqI from a Bio::SeqIO
Message-ID: <4F59FAA8.60202@dijon.inra.fr>

Hi everyone,

Let me start by introducing myself: I'm a biological data mining engineer. By training I'm a computer scientist, and I work in a biology lab in France, so I'm something of a rookie in biological computing.

My first task in this lab is to write a BioPerl script for querying TAIR. I'm almost done, but the final step is beyond me.

I have two annotation files, downloaded directly from TAIR or obtained from the curator: ATH_GO_GOSLIM.txt and gene_annotation.tair.

Handling the first file is the big issue for me (as I said, rookie ;) ). I read it with a Bio::SeqIO object in the table format, like this:

$TAIR_annotation_collection = Bio::SeqIO->new(
    -file             => $file,
    -format           => 'table',
    -delim            => "\t",
    -display_id       => 1,
    -accession_number => 0,
    -annotation_map   => @mytags
);

where mytags looks like:

@mytags=qw(locusName tairAccession objectName relationType goTerm goID tairKeywordId aspect goSlimTerm evidenceCode evidenceDescription evidenceWith tairPublicationID annotator dateAnnotated);

The names of the 15 tags come from ATH_GO_README.txt. The field -accession_number is correctly read from the file.

So I have two questions:

- Is an array the correct Perl type for the annotation_map field? I can't find a complete description in my documentation.
- Once my Bio::SeqIO::Table handles ATH_GO_GOSLIM.txt correctly, I want to turn it into a Bio::DB::SeqI object. More generally, I need a BioPerl DB built directly from the Bio::SeqIO::Table object, with the accession number as the index (I can't build a BioSQL database). I can't find a suitable way to do this. Do you have any suggestions or links?

Thanks for your help,

best regards,

Arnaud.

--
"It occurs to me that our survival may depend upon our talking to one another."
Dan Simmons.

Arnaud Mounier
INRA - UMR 1347 Agroecologie
CNRS - ERL 6300
IPM (Plant-Microorganism Interaction)
17, rue Sully - BP 86510 - F-21065 Dijon Cedex - France
Work phone : +33 380 693 167 - Fax : +33 380 693 753

From jovel_juan at hotmail.com Fri Mar 9 12:24:37 2012
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Fri, 9 Mar 2012 17:24:37 +0000
Subject: [Bioperl-l] Sorry in advance, not exactly a BIOPERL question, but sure you know about it...
In-Reply-To: <4F554696.3090905@gmail.com>
References: <4F546D9C.7080808@gmail.com>, , <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net>, <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu>, <4F554696.3090905@gmail.com>
Message-ID:

Hello All!

It is not clear to me what, generally speaking, the advantage is of filtering out reads when we are interested in de novo assembly of specific taxonomic groups (i.e. bacteria, viruses, fungi, etc.). More specifically, my questions are:

1. In a metagenomics library, if I am interested in de novo assembly of virus genomes, should I remove bacterial and human reads (that's what I do), and leave in phages and known viral sequences?
2. Or should I remove everything that is known and leave only "unknown" reads for assembly?

What are the advantages and pitfalls of each scenario? Thinking in terms of de Bruijn graphs...

Any comment would be highly appreciated and my apologies again.
Cheers,
JUAN

From p.j.a.cock at googlemail.com Fri Mar 9 12:33:50 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 9 Mar 2012 17:33:50 +0000
Subject: [Bioperl-l] Sorry in advance, not exactly a BIOPERL question, but sure you know about it...
In-Reply-To:
References: <4F546D9C.7080808@gmail.com> <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> <4F554696.3090905@gmail.com>
Message-ID:

On Fri, Mar 9, 2012 at 5:24 PM, Juan Jovel wrote:
>
> Hello All!
> No clear to me what generally speaking is the advantage of filtering out reads
> when we are interested in de novo assembly of specific taxonomic groups
> (i.e. bacteria, viruses, fungi, etc). More specifically, my questions are:
> 1. In a metagenomics library, if I am interested in de novo assembly of virus
> genomes, should I remove bacterial and human reads (that's what I do), and
> leave in phages and known viral sequences.

That might work - but you may remove virus reads mapping onto integrated prophage inside the bacterial etc references you use. Be careful.

Peter

From yang.liu0508 at gmail.com Fri Mar 9 14:25:50 2012
From: yang.liu0508 at gmail.com (yang liu)
Date: Fri, 9 Mar 2012 14:25:50 -0500
Subject: [Bioperl-l] modify sequence name
Message-ID:

Dear colleagues,

When I do Sanger sequencing, I get hundreds of sequences, for several genes, named by DNA number. I need to add the taxon name to each sequence manually. I wonder if there is a way to change the names automatically?

I have two .txt files.

file 1, with sequences named by DNA number:
>2863
AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
ATGCAATTAAAGAAACTGATGTATTAGCATTATTTC

>2864
AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
ATGCAATTAAAGAAACTGATGTATTAGCATTATTTCGTATTACTCCACAACCAGGTGTAGAT
........

file 2, with DNA numbers and taxon names, separated by tabs:
2863 Gelidium
2864 Poa
........
I would like the final file to look like this:
>Gelidium-2863
AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
ATGCAATTAAAGAAACTGATGTATTAGCATTATTTC

>Poa-2864
AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT
ATGCAATTAAAGAAACTGATGTATTAGCATTATTTCGTATTACTCCACAACCAGGTGTAGAT

Any ideas? Any help would be appreciated.

Yang.

From cjfields at illinois.edu Fri Mar 9 15:42:38 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Fri, 9 Mar 2012 20:42:38 +0000
Subject: [Bioperl-l] Bio::Tools::Glimmer and genes wrapping around the origin
In-Reply-To: <8936622A-F8CA-47B8-BFD4-DFC981503BD8@sgul.ac.uk>
References: <4E43EB8C-F8C9-41D7-809F-9AE19E3DD0AC@sgul.ac.uk> <84BCE2A9-ED16-4610-B8D2-BCE12A7DB91B@illinois.edu> <8936622A-F8CA-47B8-BFD4-DFC981503BD8@sgul.ac.uk>
Message-ID: <23D84053-4132-47B0-9A3C-08D67D540AE4@illinois.edu>

The best way to address this is to propose a set of tests that demonstrates the problem, but I believe you have something that should work:

my $location_string = 'join(117..1,135690..135187)';

We probably should cover both aspects, a join that spans the origin on both forward and reverse strands, something like:

join(117..1,135690..135187)
complement(join(117..1,135690..135187))

chris

From thomas.sharpton at gmail.com Fri Mar 9 16:49:04 2012
From: thomas.sharpton at gmail.com (Thomas Sharpton)
Date: Fri, 9 Mar 2012 13:49:04 -0800
Subject: [Bioperl-l] Sorry in advance, not exactly a BIOPERL question, but sure you know about it...
In-Reply-To:
References: <4F546D9C.7080808@gmail.com> <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> <4F554696.3090905@gmail.com>
Message-ID:

Hi Juan,

If I understand you correctly, you want to assemble viral genomes from metagenomic reads. While assembly of metagenomic data can be straightforward in some situations (e.g., low complexity communities), it is generally difficult and can often result in chimeras. By mapping sequences to reference genomes (i.e., fragment recruitment), you can effectively reduce the complexity of the community in silico and subsequently reduce the possibility of chimeric errors.

That said, reference genomes frequently represent a relatively small subset of the total diversity in a community, so you might have to adopt liberal mapping parameters if you want to minimize the amount of bacterial, archaeal and eukaryotic DNA in your metagenome. This could, of course, result in the spurious filtering of viral reads that happen to share some similarity with a reference genome. Personally, I would prefer to lose some of the viral reads and produce incomplete assemblies if I was confident that it would decrease the chance of chimeric assemblies. Then again, I personally try to avoid assembly from metagenomic data when possible, so I may be lending biased advice.
The DeRisi lab has done some great work on the subject of viral genome assembly from metagenomic data. I recommend you take a look at PRICE, which you can download from the link below:

http://derisilab.ucsf.edu/

I'm not sure I specifically answered your question. I do hope this helps and would be happy to talk more, if you like. But I'm certainly no metagenomics assembly expert. And we might move this conversation off the list given that it isn't quite on topic.

Good luck,
Tom

From huangyifeicmb at gmail.com Fri Mar 9 16:50:09 2012
From: huangyifeicmb at gmail.com (Yifei Huang)
Date: Fri, 9 Mar 2012 16:50:09 -0500
Subject: [Bioperl-l] modify sequence name
In-Reply-To:
References:
Message-ID:

Hi Yang,

It is fairly easy to do that in Perl. You could write a Perl script like this:

Step 1: read file 2 line by line and use the function 'split' to separate the sequence IDs and taxon names. Then construct a hash table in which the keys are sequence IDs and the values are taxon names.

Step 2: read file 1 line by line. For each line starting with '>', use a regular expression to extract its sequence ID and look up the corresponding taxon name in the hash table. Then reformat the sequence ID and print the new ID (with the leading '>'). For each line that does not start with '>', just print it out directly.

If you are not very familiar with Perl, I suggest you learn it. Beginning Perl for Bioinformatics is a good book for biologists.

Best,

Yifei
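The two steps described above could be sketched roughly as follows. This is only an illustration, not a tested production script: the sub names read_taxon_table() and rename_fasta() are made up here, the input is assumed to be tab-separated exactly as in the question, and the demo runs on shortened in-memory copies of the two files rather than on real files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1: build a hash mapping DNA number => taxon name from the
# tab-separated table (file 2 in the question).
sub read_taxon_table {
    my ($fh) = @_;
    my %taxon_for;
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;          # strip Unix or Windows line endings
        next unless $line =~ /\S/;     # skip blank lines
        my ( $id, $taxon ) = split /\t/, $line;
        $taxon_for{$id} = $taxon;
    }
    return \%taxon_for;
}

# Step 2: rewrite '>ID' header lines as '>Taxon-ID'; print all other
# lines (the sequence itself) unchanged.
sub rename_fasta {
    my ( $in, $out, $taxon_for ) = @_;
    while ( my $line = <$in> ) {
        if ( $line =~ /^>(\S+)/ && exists $taxon_for->{$1} ) {
            my $id = $1;
            print {$out} '>', $taxon_for->{$id}, "-$id\n";
        }
        else {
            print {$out} $line;    # sequence line, or header without a taxon
        }
    }
    return;
}

# Demo on in-memory stand-ins for the two files (sequences shortened).
my $table = "2863\tGelidium\n2864\tPoa\n";
my $fasta = ">2863\nAGGATTAAAA\n>2864\nAGGATTAAAAATC\n";

open my $table_fh, '<', \$table or die "Cannot open table: $!";
open my $fasta_fh, '<', \$fasta or die "Cannot open FASTA: $!";
my $renamed = '';
open my $out_fh, '>', \$renamed or die "Cannot open output: $!";

rename_fasta( $fasta_fh, $out_fh, read_taxon_table($table_fh) );
close $out_fh;

print $renamed;
```

For real data, the in-memory strings would be replaced by the two file names in the open() calls, and $out_fh by a handle on the output file. Headers whose DNA number is missing from the table are left untouched, so they are easy to spot afterwards.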
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Yifei Huang Department of Biology McMaster University From p.j.a.cock at googlemail.com Fri Mar 9 18:00:17 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 9 Mar 2012 23:00:17 +0000 Subject: [Bioperl-l] Sorry in advance, not exactly a BIOPERL question, but sure you know about it... In-Reply-To: References: <4F546D9C.7080808@gmail.com> <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> <4F554696.3090905@gmail.com> Message-ID: On Fri, Mar 9, 2012 at 9:49 PM, Thomas Sharpton wrote: > Hi Juan, > > ... And we might move this conversation off the list given that > it isn't quite on topic. > > Good luck, > Tom I should have said in my reply the forum on SEQanswers.com would be a good place to ask for advice. Peter From florent.angly at gmail.com Sat Mar 10 21:47:11 2012 From: florent.angly at gmail.com (Florent Angly) Date: Sun, 11 Mar 2012 12:47:11 +1000 Subject: [Bioperl-l] Compliance of Bio::Seq add_SeqFeature() method In-Reply-To: References: <4F546D9C.7080808@gmail.com>, <4247C556-1F23-4816-A2AC-2FF71C6AFE77@drycafe.net> <1CEF9B59-F3B7-4971-82C4-DFE72ED0B86F@illinois.edu> <4F56EB82.9060603@gmail.com> Message-ID: <4F5C122F.7080505@gmail.com> I realize that Hilmar. However, to me, it makes it clearer to add a single feature and it leaves room to implement a add_SeqFeatures() method in FeatureHolderI to add multiple features at once, based on the implementation-specific add_SeqFeature() method. I also realize that deprecation should be used with caution. Feel free to revert and let add_SeqFeature() accept an array of features if you think that the downsides (deprecating arrays) outweights the advantages. 
Florent On 07/03/12 14:51, Hilmar Lapp wrote: > Accepting an array is not in violation so long as the method also accepts a single feature (i.e., a single object ref), and so long as passing an array isn't going to choke if it is followed by 'EXPAND'. I wouldn't deprecate this otherwise. > > -hilmar > > On Mar 7, 2012, at 12:00 AM, Florent Angly wrote: > >> On 05/03/12 12:13, Fields, Christopher J wrote: >>> Actually, I misread Florent's original post, I was thinking that FeatureHolderI was the outlier here, but it is Bio::Seq. Yes, I think Bio::Seq is abusing the FeatureHolderI interface, it should just be for a single feature (it can safely ignore the 'EXPAND' option). However, use of 'EXPAND' assumes the FeatureHolderI is also a Bio::RangeI (must have a start and end to expand), something that is not mentioned in the interface as a requirement and is not guaranteed, for instance Bio::Seq is not Bio::RangeI. >> Ok, I have: >> 1/ clarified in Bio::FeatureHolderI that there is no guarantee that 'EXPAND' will be honored >> 2/ made Bio::Seq comply to Bio::FeatureHolderI by accepting the 'EXPAND' keyword (but do nothing about it) >> 3/ deprecated the use of passing multiple features to add_SeqFeature() in Bio::Seq >> 4/ updated documentation and code that relied on passing multiple features >> >> That should take care of the issue at hand. See this commit: https://github.com/bioperl/bioperl-live/commit/a5bebe00c505fbf5279f5d717790ed36eefcc2b8 >> >> Note that there are still some modules (e.g. Bio::DB::SeqFeature::Store, Bio::DB::SeqFeature::NormalizedFeature, Bio::SeqFeature::Lite, Bio::DB::SeqFeature, Bio::Search::Tiling::MapTileUtils) that have an add_SeqFeature() method that accepts an array of features but they are not Bio::FeatureHolderI, so that's ok. Maybe they should be Bio::FeatureHolderI but that's another story. >> >> However, I have found that Bio::SimpleAlign is a Bio::FeatureHolderI and is not compliant. 
I fixed it here: https://github.com/bioperl/bioperl-live/commit/29e0449d05f37c9c748aaaff8cfe596ca7c3d380
>>
>> Florent

From fossandonc at hotmail.com Sat Mar 10 09:26:08 2012
From: fossandonc at hotmail.com (Francisco J. Ossandón)
Date: Sat, 10 Mar 2012 11:26:08 -0300
Subject: [Bioperl-l] Bio::Tools::Glimmer and genes wrapping around the origin
In-Reply-To: <23D84053-4132-47B0-9A3C-08D67D540AE4@illinois.edu>
References: <4E43EB8C-F8C9-41D7-809F-9AE19E3DD0AC@sgul.ac.uk> <84BCE2A9-ED16-4610-B8D2-BCE12A7DB91B@illinois.edu> <8936622A-F8CA-47B8-BFD4-DFC981503BD8@sgul.ac.uk> <23D84053-4132-47B0-9A3C-08D67D540AE4@illinois.edu>
Message-ID:

I can provide a couple of real-world examples on both strands, and I can search for more if needed:

From NC_000911.1, Synechocystis sp. PCC 6803 chromosome (http://www.ncbi.nlm.nih.gov/nuccore/16329170):
* NP_439899.1, solanesyl diphosphate synthase = join(3573271..3573470,1..772)

From NC_000868.1, Pyrococcus abyssi GE5 chromosome (http://www.ncbi.nlm.nih.gov/nuccore/14518450):
* NP_125692.1, 50S ribosomal protein L1P = complement(join(1764520..1765118,1..61))

Cheers,

Francisco J. Ossandon

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Fields, Christopher J
Sent: Friday, March 9, 2012 17:43
To: Adam Witney
CC: ; Francisco J. Ossandón
Subject: Re: [Bioperl-l] Bio::Tools::Glimmer and genes wrapping around the origin

The best way to address this is to propose a set of tests that demonstrates the problem, but I believe you have something that should work:

my $location_string = 'join(117..1,135690..135187)';

We probably should cover both aspects, a join that spans the origin on both forward and reverse strands, something like:

join(117..1,135690..135187)
complement(join(117..1,135690..135187))

chris

On Mar 9, 2012, at 5:02 AM, Adam Witney wrote:

>
> Thanks for your replies.
>
> After a little more digging there seem to be two things. In the test code I posted, the extra strand attribute in Bio::SeqFeature::Generic (and subsequent call to location_object->strand) changes the internal guide_strand such that this line in Bio::Location::Split->to_FTstring changes the order of subLocations:
>
> my @locs = ($stype eq 'join' && (!$guide && $strand == -1)) ?
>     reverse $self->sub_Location() : $self->sub_Location() ;
>
> The second problem is probably due to the sorting order within Bio::SeqFeature::Generic: when calling 'start' and 'end', it doesn't do the right thing in this case. But I am not quite sure how to fix this.
>
> Thanks again
>
> Adam
>
> On 8 Mar 2012, at 21:38, Fields, Christopher J wrote:
>
>> I agree, no sorting should be implied (a 'join' order should be based on order of addition alone, and sort should be optional). IIRC there were backwards-compat problems switching this due to reliance on old behavior, but it might be worth trying to see if anything breaks test-wise (and why it breaks).
>>
>> chris
>>
>> On Mar 8, 2012, at 2:37 PM, Francisco J. Ossandón wrote:
>>
>>> Long ago, Bio::SeqIO had an issue with genes that were split at the origin.
>>> This was because Bioperl, when reading a Genbank file, automatically
>>> sorted the segment coordinates of the genes that it was reading (not
>>> considering the possibility in a circular genome, where the sequence
>>> could go from the 1st nucleotide directly to the last one), so an
>>> extra "-nosort" argument was necessary every time to avoid Bioperl giving the wrong sequence:
>>>
>>> my $nt_seq_obj = $feat->spliced_seq(-nosort => 1);
>>>
>>> This is the same bug. In that case the code was changed so the "-nosort"
>>> was applied based on the status of "is_circular" of the genome, see here:
>>> https://redmine.open-bio.org/issues/2579
>>>
>>> Since your code doesn't have the "is_circular" information (because it
>>> doesn't come from a file), I guess that the autosorting is kicking in.
>>> I think it would be better if all the "autosorting" of the
>>> sublocations array inside "Bio::Location::Split" were optional
>>> instead of automatic, because of these cases.
>>>
>>> Cheers,
>>>
>>> Francisco J. Ossandon
>>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Adam Witney
>>> Sent: Thursday, March 8, 2012 13:40
>>> To: bioperl-l at bioperl.org
>>> Subject: [Bioperl-l] Bio::Tools::Glimmer and genes wrapping around
>>> the origin
>>>
>>> Hi,
>>>
>>> I have been using Bio::Tools::Glimmer and have come across a problem
>>> with it not handling genes that wrap around across the origin. I
>>> think I have boiled it down to this test case of what happens
>>> internally with Bio::Tools::Glimmer
>>>
>>> ##############################################################
>>> #!/usr/local/bin/perl -w
>>>
>>> use strict;
>>> use warnings;
>>>
>>> use Bio::Factory::FTLocationFactory;
>>> use Bio::SeqFeature::Generic;
>>>
>>> my $location_string = 'join(117..1,135690..135187)';
>>>
>>> my $location_factory = Bio::Factory::FTLocationFactory->new();
>>> my $location_object =
>>>     $location_factory->from_string($location_string);
>>>
>>> print "Location: ".$location_object->to_FTstring."\n";
>>>
>>> my $gene = Bio::SeqFeature::Generic->new(
>>>     '-seq_id'   => 'Testing',
>>>     '-location' => $location_object,
>>>     '-strand'   => -1
>>> );
>>>
>>> print "Location: ".$location_object->to_FTstring."\n";
>>>
>>> ##############################################################
>>>
>>> $ perl ../FTLocationTest.pl
>>> Location: complement(join(135187..135690,1..117))
>>> Location: complement(join(1..117,135187..135690))
>>>
>>> This happens because by setting the '-strand' in
>>> Bio::SeqFeature::Generic, this calls the strand method in
>>> $location_object (Bio::Location::Split) which then causes the problem (although I can't quite work out where...)!
>>>
>>> Is this intended behaviour?
>>> >>> Thanks >>> >>> Adam >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Sat Mar 10 20:46:46 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Sat, 10 Mar 2012 17:46:46 -0800 Subject: [Bioperl-l] modify sequence name In-Reply-To: References: Message-ID: Since this is a bioperl list, I would suggest a more bioperl solution that doesn't require you to do the parsing or splitting, just read the sequences in with Bio::SeqIO and manipulate the id which you get/set with seq->display_id methods. Did you look at the SeqIO HOWTO on the bioperl website? Jason On Mar 9, 2012, at 1:50 PM, Yifei Huang wrote: > Hi Yang, > > It is fairly easy to do that in perl. You may write a perl script like this: > > Step 1: read file 2 line by line and use the function 'split' to separate > sequence Ids and taxon names. Then construct a hash table in which keys are > sequence ids and values are taxon names. > > Step 2: read file 1 line by line. For each line with initial '>', use > regular expression to extract its sequence id and find the corresponding > taxon name from the hash table. Then reformat the sequence id and print new > id out (with initial '>'). For each line without initial '>', just print it > out directly. > > If you are not very familiar with perl, I suggest you to learn it by > yourself. Beginning Perl for Bioinformatics is a good book for biologists. 
> > Best, > > Yifei > > On Fri, Mar 9, 2012 at 2:25 PM, yang liu wrote: > >> Dear colleagues, >> >> >> >> When I do Sanger sequencing, I get hundreds of sequences named by DNA >> Numbers, and for several genes. I need to add taxon name manually for each >> sequence. I wonder is there a way to change the names automatically? >> >> >> I have two .txt files. >> >> file 1, with seqeucens named by DNA Number: >>> 2863 >> AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT >> ATGCAATTAAAGAAACTGATGTATTAGCATTATTTC >> >>> 2864 >> AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT >> ATGCAATTAAAGAAACTGATGTATTAGCATTATTTCGTATTACTCCACAACCAGGTGTAGAT >> ........ >> >> >> file 2, with DNA Number and taxa names, seperated by tabs >> 2863 Gelidium >> 2864 Poa >> ........ >> >> I hope the final file to be like this, >>> Gelidium-2863 >> AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT >> ATGCAATTAAAGAAACTGATGTATTAGCATTATTTC >> >>> Poa-2864 >> AGGATTAAAAATCAACGCTATGAATCTGGTGTAATTCCATATGCTAAAATGGGCTATTGGGATCCTAATT >> ATGCAATTAAAGAAACTGATGTATTAGCATTATTTCGTATTACTCCACAACCAGGTGTAGAT >> Any ideas? Anything help would be appreciated. >> >> Yang. 
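Editor's note: the Bio::SeqIO route suggested in this reply might look roughly like the sketch below. It assumes BioPerl is installed; the sub names `read_taxon_table` and `rename_ids_with_seqio` and the three command-line arguments are hypothetical. It reads the tab-separated table into a hash, then gets and sets each identifier through `display_id` and lets Bio::SeqIO handle the FASTA parsing and writing. Note that the FASTA writer may re-wrap sequence lines to its own line width.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Read "DNA number<TAB>taxon" lines into a hash reference.
# (Hypothetical helper; the table layout follows the question.)
sub read_taxon_table {
    my ($table_file) = @_;
    my %taxon_for;
    open my $fh, '<', $table_file or die "Cannot open $table_file: $!";
    while ( my $line = <$fh> ) {
        $line =~ s/\s+$//;    # drop newline and trailing blanks
        my ( $id, $taxon ) = split /\t/, $line, 2;
        next unless defined $id && defined $taxon && length $taxon;
        $taxon_for{$id} = $taxon;
    }
    close $fh;
    return \%taxon_for;
}

# Copy a FASTA file, renaming "ID" to "Taxon-ID" via display_id.
# Records whose ID is not in the table are written unchanged.
sub rename_ids_with_seqio {
    my ( $fasta_file, $taxon_for, $out_file ) = @_;
    my $in  = Bio::SeqIO->new( -file => $fasta_file,  -format => 'fasta' );
    my $out = Bio::SeqIO->new( -file => ">$out_file", -format => 'fasta' );
    while ( my $seq = $in->next_seq ) {
        my $id = $seq->display_id;
        if ( defined $id && exists $taxon_for->{$id} ) {
            $seq->display_id( $taxon_for->{$id} . '-' . $id );
        }
        $out->write_seq($seq);
    }
    $in->close;
    $out->close;
    return;
}

# Assumed usage: perl rename_ids.pl file1.txt file2.txt renamed.fasta
if ( @ARGV == 3 ) {
    my ( $fasta_file, $table_file, $out_file ) = @ARGV;
    rename_ids_with_seqio( $fasta_file, read_taxon_table($table_file), $out_file );
}
```

Compared with hand-parsing the headers, this leaves the format handling to BioPerl, at the cost of requiring the toolkit to be installed.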
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > Yifei Huang > Department of Biology > McMaster University > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From youryanyun at gmail.com Mon Mar 12 01:33:47 2012 From: youryanyun at gmail.com (yun YAN) Date: Mon, 12 Mar 2012 13:33:47 +0800 Subject: [Bioperl-l] Are there arguments for REGION of ACCESSION in Bio::DB Message-ID: One's goal is to get both exon/intron region of gene of interest from remote database(NCBI), with the help of Bio::DB::GenBank. "get_seq_by_acc" will work for most cases, but it seems that it cannot be used for exon/intron parsing. Let's say gene SMN1, http://www.ncbi.nlm.nih.gov/nuccore/NC_000005.9?report=genbank&from=70220768&to=70248839 . The exon/inron information can only be available in genome assembly part, and the accession number ( NC_000005) is actually the genome contig, not gene. To define my gene SMN1, an additional argument "REGION" is needed (REGION: 70220768..70248839). If I use simply "get_seq_by_acc", it will not return the gene, but return the genome assembly results. Thus any ideas about how to retrieve the gene (not mRNA) containing both exon/intron? Are there any additional arguments in get_by_acc('XXXX') REGION( 1234..6789), perhaps? I want to use command-line as much as possible. I used to copy out the page (indeed they are arranged in strict genbank format) and paste as genbank file , and afterwards I use Bio::DB::GenBank LOCALLY. The first step is done actually by my hand, by graphic interface which is not convenient. 
Thanks From roy.chaudhuri at gmail.com Mon Mar 12 08:38:08 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Mon, 12 Mar 2012 12:38:08 +0000 Subject: [Bioperl-l] Are there arguments for REGION of ACCESSION in Bio::DB In-Reply-To: References: Message-ID: <4F5DEE30.8040800@gmail.com> I think this is what you want: http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::GenBank_when_you_have_genomic_coordinates_to_get_a_Seq_object On 12/03/2012 05:33, yun YAN wrote: > One's goal is to get both exon/intron region of gene of interest from > remote database(NCBI), with the help of Bio::DB::GenBank. "get_seq_by_acc" > will work for most cases, but it seems that it cannot be used for > exon/intron parsing. > > Let's say gene SMN1, > http://www.ncbi.nlm.nih.gov/nuccore/NC_000005.9?report=genbank&from=70220768&to=70248839 > . > The exon/inron information can only be available in genome assembly part, > and the accession number ( > NC_000005) is > actually the genome contig, not gene. To define my gene SMN1, an additional > argument "REGION" is needed (REGION: 70220768..70248839). If I use simply > "get_seq_by_acc", it will not return the gene, but return the genome > assembly results. > > Thus any ideas about how to retrieve the gene (not mRNA) containing both > exon/intron? Are there any additional arguments in get_by_acc('XXXX') > REGION( 1234..6789), perhaps? > > I want to use command-line as much as possible. I used to copy out the page > (indeed they are arranged in strict genbank format) and paste as genbank > file , and afterwards I use Bio::DB::GenBank LOCALLY. The first step is > done actually by my hand, by graphic interface which is not convenient. 
> > Thanks > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From leondott at gmail.com Mon Mar 12 11:06:20 2012 From: leondott at gmail.com (Leon Chen) Date: Mon, 12 Mar 2012 23:06:20 +0800 Subject: [Bioperl-l] Failed Test Installing BioPerl-1.6.901 Message-ID: Hi, I have met some problems in installing BioPerl-1.6.901.Test Summary Report as Follows: ========================================================== Test Summary Report ------------------- t/SeqIO/SeqIO.t (Wstat: 256 Tests: 45 Failed: 1) Failed test: 45 Non-zero exit status: 1 Files=349, Tests=22689, 143 wallclock secs ( 3.47 usr 0.77 sys + 129.74 cusr 7.47 csys = 141.45 CPU) Result: FAIL Failed 1/349 test programs. 1/22689 subtests failed. CJFIELDS/BioPerl-1.6.901.tar.gz ./Build test -- NOT OK //hint// to see the cpan-testers results for installing this module, try: reports CJFIELDS/BioPerl-1.6.901.tar.gz Running Build install make test had returned bad status, won't install without force Failed during this command: CJFIELDS/BioPerl-1.6.901.tar.gz : make_test NO ========================================================== How to solve this Problem?Test In Ubuntu 10.04 LTS. Thanks for help Yours From wkretzsch at gmail.com Mon Mar 12 12:35:37 2012 From: wkretzsch at gmail.com (Warren W. Kretzschmar) Date: Mon, 12 Mar 2012 16:35:37 +0000 Subject: [Bioperl-l] Failed Test Installing BioPerl-1.6.901 In-Reply-To: References: Message-ID: Hey Leon, You'll need to get more information on the test that failed. Here is how it usually works on UNIX: make testdb TEST_FILE=t/SeqIO/SeqIO.t to enter the debugger, then hit 'c' and enter to run through the program. Let us know what you see. Cheers, Winni -- In God we trust, all others bring data. 
- William Edwards Deming

On Mon, Mar 12, 2012 at 3:06 PM, Leon Chen wrote:
> Hi,
> I have met some problems in installing BioPerl-1.6.901.Test Summary
> Report as Follows:
>
> ==========================================================
> Test Summary Report
> -------------------
> t/SeqIO/SeqIO.t                            (Wstat: 256 Tests: 45 Failed: 1)
>   Failed test:  45
>   Non-zero exit status: 1
> Files=349, Tests=22689, 143 wallclock secs ( 3.47 usr  0.77 sys + 129.74
> cusr  7.47 csys = 141.45 CPU)
> Result: FAIL
> Failed 1/349 test programs. 1/22689 subtests failed.
>   CJFIELDS/BioPerl-1.6.901.tar.gz
>   ./Build test -- NOT OK
> //hint// to see the cpan-testers results for installing this module, try:
>   reports CJFIELDS/BioPerl-1.6.901.tar.gz
> Running Build install
>   make test had returned bad status, won't install without force
> Failed during this command:
>  CJFIELDS/BioPerl-1.6.901.tar.gz              : make_test NO
> ==========================================================
>
> How to solve this Problem?Test In Ubuntu 10.04 LTS.
> Thanks for help
>
> Yours

From cjfields at illinois.edu Mon Mar 12 12:43:22 2012
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 12 Mar 2012 11:43:22 -0500
Subject: [Bioperl-l] Failed Test Installing BioPerl-1.6.901
In-Reply-To:
References:
Message-ID: <4F5E27AA.2020301@illinois.edu>

Probably would need to use './Build testdb --test-files=t/SeqIO/SeqIO.t' (ironic caveat, untested :)

chris

On 03/12/2012 11:35 AM, Warren W. Kretzschmar wrote:
> Hey Leon,
> You'll need to get more information on the test that failed. Here is
> how it usually works on UNIX:
>
> make testdb TEST_FILE=t/SeqIO/SeqIO.t
>
> to enter the debugger, then hit 'c' and enter to run through the
> program. Let us know what you see.
> > Cheers, > Winni > -- > In God we trust, all others bring data. - William Edwards Deming > > > > On Mon, Mar 12, 2012 at 3:06 PM, Leon Chen wrote: >> Hi, >> I have met some problems in installing BioPerl-1.6.901.Test Summary >> Report as Follows: >> >> ========================================================== >> Test Summary Report >> ------------------- >> t/SeqIO/SeqIO.t (Wstat: 256 Tests: 45 Failed: 1) >> Failed test: 45 >> Non-zero exit status: 1 >> Files=349, Tests=22689, 143 wallclock secs ( 3.47 usr 0.77 sys + 129.74 >> cusr 7.47 csys = 141.45 CPU) >> Result: FAIL >> Failed 1/349 test programs. 1/22689 subtests failed. >> CJFIELDS/BioPerl-1.6.901.tar.gz >> ./Build test -- NOT OK >> //hint// to see the cpan-testers results for installing this module, try: >> reports CJFIELDS/BioPerl-1.6.901.tar.gz >> Running Build install >> make test had returned bad status, won't install without force >> Failed during this command: >> CJFIELDS/BioPerl-1.6.901.tar.gz : make_test NO >> ========================================================== >> >> How to solve this Problem?Test In Ubuntu 10.04 LTS. >> Thanks for help >> >> Yours >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Mon Mar 12 14:24:33 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 12 Mar 2012 18:24:33 +0000 Subject: [Bioperl-l] Failed Test Installing BioPerl-1.6.901 In-Reply-To: References: Message-ID: This unfortunately doesn't give us much to work on. Can you give the verbose test output? 
Something like:

prove -lrv t/SeqIO/SeqIO.t

or

./Build test --test-files t/SeqIO/SeqIO.t --verbose

chris

On Mar 12, 2012, at 10:06 AM, Leon Chen wrote:

> Hi,
> I have met some problems in installing BioPerl-1.6.901.Test Summary
> Report as Follows:
>
> ==========================================================
> Test Summary Report
> -------------------
> t/SeqIO/SeqIO.t                            (Wstat: 256 Tests: 45 Failed: 1)
>   Failed test:  45
>   Non-zero exit status: 1
> Files=349, Tests=22689, 143 wallclock secs ( 3.47 usr  0.77 sys + 129.74
> cusr  7.47 csys = 141.45 CPU)
> Result: FAIL
> Failed 1/349 test programs. 1/22689 subtests failed.
>   CJFIELDS/BioPerl-1.6.901.tar.gz
>   ./Build test -- NOT OK
> //hint// to see the cpan-testers results for installing this module, try:
>   reports CJFIELDS/BioPerl-1.6.901.tar.gz
> Running Build install
>   make test had returned bad status, won't install without force
> Failed during this command:
>  CJFIELDS/BioPerl-1.6.901.tar.gz              : make_test NO
> ==========================================================
>
> How to solve this Problem?Test In Ubuntu 10.04 LTS.
> Thanks for help
>
> Yours

From maj at fortinbras.us Mon Mar 12 15:39:02 2012
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 12 Mar 2012 15:39:02 -0400
Subject: [Bioperl-l] bioperl
In-Reply-To:
References:
Message-ID: <4C963DF47C4344BAA214E05FC640CB23@NewLife>

Hi Mitra,

This is a Windows shell error, not a bioperl error. It suggests that a filename is being referenced that does not comply with the Windows spec. For example, you can generate this error from the command line by doing

C: >echo blurg > fds?fdsf.txt

since '?' is not a valid filename char.

MAJ

From: Mitra s
Sent: Monday, March 12, 2012 3:00 PM
To: maj at fortinbras.us
Subject: bioperl

Dear Dr Jensen:
May I have a question about bioperl?
i recently installed bioperl toolkit via PPM. After writing the programs(that im sure the codes are correct), I confronted with this error: The filename, directory name, or volume label syntax is incorrect. I tried a lot to find what does this error mean but unfortunately could not find anything. I would be so grateful if you kindly direct me. I know you are an expert and hope you dont mind answer to a very beginner bioperl programmer. thanks again, sincerely, Mitra From ammar.husami at gmail.com Mon Mar 12 15:54:19 2012 From: ammar.husami at gmail.com (husamia) Date: Mon, 12 Mar 2012 12:54:19 -0700 (PDT) Subject: [Bioperl-l] downloading genbank records for NT_ accessions with SNPs Features Added by NCBI Message-ID: <05dca577-0f03-473e-910c-63f040ea107b@y10g2000vbn.googlegroups.com> Dear community, I am new user of bioperl and I am need of help. I searched but I could not find my answer in the help documents of bioperl. Is it possible to download NT_ accession.version records using bioperl including SNPs features added by NCBI? Please read the senario as this is specific and not sure if its implemented for example here is senario. Here is how to download the records from the website go to the URL for this record http://www.ncbi.nlm.nih.gov/nuccore/NT_024524.14?save=on&report=genbank&from=1741602&to=1747114 on the right hand side under customize view menu select checkbox "Features added by NCBI 1236766 SNPs" then click ok "Update View" The records is retrieved. Can bioperl perform the same task? Thank you From youryanyun at gmail.com Tue Mar 13 04:30:43 2012 From: youryanyun at gmail.com (yun YAN) Date: Tue, 13 Mar 2012 16:30:43 +0800 Subject: [Bioperl-l] Are there arguments for REGION of ACCESSION in Bio::DB In-Reply-To: <4F5DEE30.8040800@gmail.com> References: <4F5DEE30.8040800@gmail.com> Message-ID: Dear Roy, Great thanks for your reply. And I try it as soon as I receive your mail. 
However, it reports an error:

MSG: acc NM_000344 does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:368
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /usr/local/share/perl/5.10.1/Bio/DB/WebDBSeqI.pm:195
STACK: test_gene_bank_with_sublocation.pl:26

I've repeatedly checked my codes, and still cannot figure out where is the bug. At first I think maybe it does not support genome assembly (NC_000005), thus I try SMN1 gene directly ( NM_000344). Neither of them works. Even the simplest codes still report the error: "acc NM_000344 does not exist", while the accession number does exists, http://www.ncbi.nlm.nih.gov/nuccore/NM_000344.3.

My test code is (almost exactly copied from HOWTO tutorial) :

use strict;
use warnings;
use Bio::DB::GenBank;

my $gb = Bio::DB::GenBank->new (-format => 'genbank',
                                -seq_start => 1,
                                -seq_stop => 2000,
                                -strand =>1,);
my $seq_obj = $gb->get_Seq_by_acc('NM_000344');
print $seq_obj; #just for test

Currently my perl is 5.10.1, and BioPerl stays in 1.6.1. All codes run on Ubuntu 10.04 LTS. I've checked Bio::DB::GenBank module of 1.6.1 version, and it supports -seq_start and -seq_stop function.

Any ideas? Hope I don't make some low-level mistakes. Look forward to your reply. Thanks.

On Mon, Mar 12, 2012 at 8:38 PM, Roy Chaudhuri wrote:

> I think this is what you want:
> http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::GenBank_when_you_have_genomic_coordinates_to_get_a_Seq_object
>
>
> On 12/03/2012 05:33, yun YAN wrote:
>
>> One's goal is to get both exon/intron region of gene of interest from
>> remote database(NCBI), with the help of Bio::DB::GenBank. "get_seq_by_acc"
>> will work for most cases, but it seems that it cannot be used for
>> exon/intron parsing.
>>
>> Let's say gene SMN1,
>> http://www.ncbi.nlm.nih.gov/nuccore/NC_000005.9?report=genbank&from=70220768&to=70248839
>> .
>> The exon/intron information can only be available in genome assembly part,
>> and the accession number (NC_000005) is
>> actually the genome contig, not gene. To define my gene SMN1, an
>> additional
>> argument "REGION" is needed (REGION: 70220768..70248839). If I use simply
>> "get_seq_by_acc", it will not return the gene, but return the genome
>> assembly results.
>>
>> Thus any ideas about how to retrieve the gene (not mRNA) containing both
>> exon/intron? Are there any additional arguments in get_by_acc('XXXX')
>> REGION( 1234..6789), perhaps?
>>
>> I want to use command-line as much as possible. I used to copy out the
>> page
>> (indeed they are arranged in strict genbank format) and paste as genbank
>> file , and afterwards I use Bio::DB::GenBank LOCALLY. The first step is
>> done actually by my hand, by graphic interface which is not convenient.
>>
>> Thanks
>>
>
>

From maj at fortinbras.us Tue Mar 13 08:37:28 2012
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 13 Mar 2012 08:37:28 -0400
Subject: [Bioperl-l] bioperl
In-Reply-To:
References: , <4C963DF47C4344BAA214E05FC640CB23@NewLife>
Message-ID: <27394D731A0D4542A84555CEF901B34B@NewLife>

Ok Mitra, you'll want to send the code that caused the error, otherwise we won't be able to help very much. Note I have cc'd the bioperl list; please reply all, so that the many friendly and helpful experts on this list can also assist.

thanks
MAJ

From: Mitra s
Sent: Monday, March 12, 2012 11:59 PM
To: maj at fortinbras.us
Subject: RE: bioperl

Dear Dr Jensen:
Thank you very much. I got the meaning of the error, although I still do not know how to solve this error.
Sincerely,
Mitra

--------------------------------------------------------------------------------
From: maj at fortinbras.us
To: matin627 at hotmail.com
CC: bioperl-l at bioperl.org
Subject: Re: bioperl
Date: Mon, 12 Mar 2012 15:39:02 -0400

Hi Mitra,

This is a Windows shell error, not a bioperl error. It suggests that a filename is being referenced that does not comply with the Windows spec. For example, you can generate this error from the command line by doing

C: >echo blurg > fds?fdsf.txt

since '?' is not a valid filename char.

MAJ

From: Mitra s
Sent: Monday, March 12, 2012 3:00 PM
To: maj at fortinbras.us
Subject: bioperl

Dear Dr Jensen:
May I have a question about bioperl? I recently installed the bioperl toolkit via PPM. After writing the programs (I'm sure the code is correct), I was confronted with this error:

The filename, directory name, or volume label syntax is incorrect.

I tried a lot to find what this error means but unfortunately could not find anything. I would be so grateful if you could kindly direct me. I know you are an expert and hope you don't mind answering a very beginner bioperl programmer.

thanks again,
sincerely,
Mitra

From fossandonc at hotmail.com Tue Mar 13 12:21:33 2012
From: fossandonc at hotmail.com (Francisco J. Ossandón)
Date: Tue, 13 Mar 2012 13:21:33 -0300
Subject: [Bioperl-l] Bioperl CPAN installation issues
Message-ID:

Hello,

Today I was updating my Perl modules using the CPAN client, through the "upgrade" command (I'm using Strawberry Perl, not ActiveState), and something weird popped up.

I have installed Bioperl version 1.006901, which is the same one in CPAN, but the client doesn't recognize the installed Bioperl version and reports an "undef" version instead, so it reinstalls the whole thing again if told so.
Also, while Perl was checking the versions of the installed modules to compare them to CPAN latest versions, it throws an error with a lot of code saying that it could not eval "Bio\Ontology\SimpleGOEngine\GraphAdaptor.pm".

Please see the output below and check the "Bio::Align::AlignI" and "Bio\Ontology\SimpleGOEngine\GraphAdaptor.pm" outputs.

#####
cpan> upgrade
Database was generated on Mon, 12 Mar 2012 18:06:06 GMT

Package namespace         installed    latest  in CPAN file
DBD::mysql                    4.018     4.020  CAPTTOFU/DBD-mysql-4.020.tar.gz
IO::Socket::SSL                1.39      1.59  SULLR/IO-Socket-SSL-1.59.tar.gz
Bio::Align::AlignI            undef  1.006901  CJFIELDS/BioPerl-1.6.901.tar.gz
Could not eval '
            package ExtUtils::MakeMaker::_version;
            no strict;
            BEGIN { eval {
                # Ensure any version() routine which might have leaked
                # into this package has been deleted. Interferes with
                # version->import()
                undef *version;
                require version;
                "version"->import;
            } }

            local $Graph::VERSION;
            $Graph::VERSION=undef;
            do {
                ( defined $Graph::VERSION && $Graph::VERSION >= 0.5 ) ?
            };
            $Graph::VERSION;
        ' in C:\strawberry\perl\site\lib\Bio\Ontology\SimpleGOEngine\GraphAdaptor.pm: syntax error at (eval 429) line 17, at EOF
Could not eval '
            package ExtUtils::MakeMaker::_version;
            no strict;
            BEGIN { eval {
                # Ensure any version() routine which might have leaked
                # into this package has been deleted. Interferes with
                # version->import()
                undef *version;
                require version;
                "version"->import;
            } }

            local $Graph::VERSION;
            $Graph::VERSION=undef;
            do {
                ( defined $Graph::VERSION && $Graph::VERSION >= 0.5 ) ?
            };
            $Graph::VERSION;
        ' in C:\strawberry\perl\site\lib\Bio\Ontology\SimpleGOEngine\GraphAdaptor.pm: syntax error at (eval 430) line 17, at EOF
Error::Simple                 undef         0  SHLOMIF/Error-0.17017.tar.gz
#####

Cheers,

Francisco J.
Ossandon From cjfields at illinois.edu Tue Mar 13 12:39:59 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 13 Mar 2012 16:39:59 +0000 Subject: [Bioperl-l] Bioperl CPAN installation issues In-Reply-To: References: Message-ID: <3CEFE0DD-8DE4-4029-9F1E-BB50FA9CF185@illinois.edu> That is a bit odd, but it has been reported before. I haven't been able to dedicate any time to tracing it down, so any help is appreciated: https://redmine.open-bio.org/issues/3041 chris From roy.chaudhuri at gmail.com Tue Mar 13 12:41:36 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 13 Mar 2012 16:41:36 +0000 Subject: [Bioperl-l] Are there arguments for REGION of ACCESSION in Bio::DB In-Reply-To: References: <4F5DEE30.8040800@gmail.com> Message-ID: <4F5F78C0.9030901@gmail.com> Hi, I get the same error as you, although I should also note that I'm not familiar with this module, so I may be missing a problem with the HowTo code. Also (like you) I have an old version of BioPerl installed, so perhaps you could try upgrading your BioPerl to the most recent version (1.6.901) from CPAN or bioperl-live from GitHub?
There have probably been modifications to Bio::DB::GenBank since 1.6.1. One thing I noticed - the accession numbers you quote are from RefSeq, not GenBank (the NCBI make the two difficult to distinguish in Entrez, but RefSeq accessions contain an underscore). I tried replacing Bio::DB::GenBank with Bio::DB::RefSeq and that seemed to work - according to the docs the RefSeq module downloads from the EBI rather than the NCBI. Cheers, Roy. On 13/03/2012 08:30, yun YAN wrote: > Dear Roy, > Great thanks for your reply. And I try it as soon as I receive your > mail. However, it reports an error: > > MSG: acc NM_000344 does not exist > STACK: Error::throw > STACK: Bio::Root::Root::throw > /usr/local/share/perl/5.10.1/Bio/Root/Root.pm:368 > STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc > /usr/local/share/perl/5.10.1/Bio/DB/WebDBSeqI.pm:195 > STACK: test_gene_bank_with_sublocation.pl:26 > > > I've repeatedly checked my codes, and still cannot figure out where is > the bug. At first I think maybe it does not support genome assembly > (NC_000005), thus I try SMN1 gene directly ( NM_000344). Neither of them > works. Even the simplest codes still report the error: "acc NM_000344 > does not exist", while the accession number does exists, > http://www.ncbi.nlm.nih.gov/nuccore/NM_000344.3. > My test code is (almost exactly copied from HOWTO tutorial) : > > use strict; > use warnings; > use Bio::DB::GenBank; > my $gb = Bio::DB::GenBank->new (-format => 'genbank', -seq_start => > 1, -seq_stop => 2000, -strand =>1,); > my $seq_obj = $gb->get_Seq_by_acc('NM_000344'); > print $seq_obj; #just for test > > Currently my perl is 5.10.1, and BioPerl stays in 1.6.1. All codes run > on Ubuntu 10.04 LTS. I've checked Bio::DB::GenBank module of 1.6.1 > version, and it supports -seq_start and -seq_stop function. > Any ideas? Hope I don't make some low-level mistakes. Look forward to > your reply. > Thanks. 
> > On Mon, Mar 12, 2012 at 8:38 PM, Roy Chaudhuri wrote: > > I think this is what you want: > http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences#Using_Bio::DB::GenBank_when_you_have_genomic_coordinates_to_get_a_Seq_object > > On 12/03/2012 05:33, yun YAN wrote: > > One's goal is to get both exon/intron region of gene of interest from remote database (NCBI), with the help of Bio::DB::GenBank. "get_seq_by_acc" will work for most cases, but it seems that it cannot be used for exon/intron parsing. > > Let's say gene SMN1, http://www.ncbi.nlm.nih.gov/nuccore/NC_000005.9?report=genbank&from=70220768&to=70248839 . The exon/intron information can only be available in genome assembly part, and the accession number (NC_000005) is actually the genome contig, not gene. To define my gene SMN1, an additional argument "REGION" is needed (REGION: 70220768..70248839). If I use simply "get_seq_by_acc", it will not return the gene, but return the genome assembly results. > > Thus any ideas about how to retrieve the gene (not mRNA) containing both exon/intron? Are there any additional arguments in get_by_acc('XXXX') REGION( 1234..6789), perhaps? > > I want to use command-line as much as possible. I used to copy out the page (indeed they are arranged in strict genbank format) and paste as genbank file, and afterwards I use Bio::DB::GenBank LOCALLY. The first step is done actually by my hand, by graphic interface which is not convenient.
> > Thanks > _________________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From cjfields at illinois.edu Tue Mar 13 13:46:00 2012 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 13 Mar 2012 12:46:00 -0500 Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation In-Reply-To: <4F56ED48.9090500@gmail.com> References: <4F49D4B6.5050301@gmail.com> <4F4E3EEF.5050506@cam.ac.uk> <4325EF60-919F-46EF-91BB-D31160F0B587@illinois.edu> <4F501063.4010109@gmail.com> <96E7CE58-0657-4194-A906-83022348F84A@illinois.edu> <4F55945A.7070901@gmail.com> <4F56ED48.9090500@gmail.com> Message-ID: <4F5F87D8.9000301@illinois.edu> Florent, Getting a test fail on the amplicon branch, seems to be missing a file. The only test consistently failing is t/Tools/AmpliconSearch.t (I'm seeing NCBI-related test issues that are also occurring on master). Using 'prove -lr t/Tools/AmpliconSearch.t': [cjfields at pyrimidine bioperl-live (amplicons=)]$ prove -lr t/Tools/AmpliconSearch.t t/Tools/AmpliconSearch.t .. 1/174 ------------- EXCEPTION ------------- MSG: Could not open t/data/forward_primer.fa: No such file or directory STACK Bio::Root::IO::_initialize_io Bio/Root/IO.pm:351 STACK Bio::SeqIO::_initialize Bio/SeqIO.pm:477 STACK Bio::SeqIO::fasta::_initialize Bio/SeqIO/fasta.pm:93 STACK Bio::SeqIO::new Bio/SeqIO.pm:358 STACK Bio::SeqIO::new Bio/SeqIO.pm:399 STACK Bio::Tools::AmpliconSearch::primer_file Bio/Tools/AmpliconSearch.pm:292 STACK Bio::Tools::AmpliconSearch::new Bio/Tools/AmpliconSearch.pm:151 STACK toplevel t/Tools/AmpliconSearch.t:126 ------------------------------------- # Looks like you planned 174 tests but ran 53. # Looks like your test exited with 2 just after 53. t/Tools/AmpliconSearch.t ..
Dubious, test returned 2 (wstat 512, 0x200) Failed 121/174 subtests Test Summary Report ------------------- t/Tools/AmpliconSearch.t (Wstat: 512 Tests: 53 Failed: 0) Non-zero exit status: 2 Parse errors: Bad plan. You planned 174 tests but ran 53. Files=1, Tests=53, 0 wallclock secs ( 0.03 usr 0.01 sys + 0.18 cusr 0.01 csys = 0.23 CPU) Result: FAIL chris On 03/06/2012 11:08 PM, Florent Angly wrote: > > Yes, thanks Chris. If you want to start splitting a Bio-Tools > distribution that would include Bio::Tools::AmpliconSearch, I am happy > to help. > > In general, I am not specifically attached to the namespace, so if you > guys prefer something different, just tell me. > > Note that AmpliconSearch uses a couple of new or re-worked objects, > namely Bio::SeqFeature::Primer, Bio::SeqFeature::Amplicon and > Bio::SeqFeature::SubSeq. Both Primer and Amplicons inherit from SubSeq, > which inherits from Bio::SeqFeature::Generic. The main goal of SubSeq is > to allow fetching a subsequence from the attached sequence or to > explicitly adding a sequence that represents the feature (the Generic > feature class does not support setting such a sequence, just getting the > subsequence). I think it would be good to add this functionality to the > Generic feature class, but I did not want to force things without asking > everybody first if this seems like a good idea. > > Florent > > > On 06/03/12 11:08, Fields, Christopher J wrote: >> I'll check it out. Want me to post test results here (I have access to >> a few systems to test on). >> >> chris >> >> On Mar 5, 2012, at 10:36 PM, Florent Angly wrote: >> >>> To all interested, >>> the AmpliconSearch module is in a decent state. 
If you want to test >>> it or improve it, head to >>> https://github.com/bioperl/bioperl-live/blob/amplicons/Bio/Tools/AmpliconSearch.pm >>> >>> Regards, >>> Florent >>> >>> >>> On 01/03/12 12:42, Fields, Christopher J wrote: >>>> Florent, >>>> >>>> Just want to add, my previous response isn't meant as an >>>> admonishment, hope it didn't come across that way, but sometimes >>>> email makes it hard to discern the difference. I simply meant to >>>> demonstrate my opinion that I find releasing one's code is much >>>> simpler (e.g. you can decide the rules and dictate when the code is >>>> ready for release), and if we can make getting good code into user's >>>> hands easier, more flexible, and more consistent I think that is >>>> always a better path. >>>> >>>> chris >>>> >>>> On Feb 29, 2012, at 8:30 PM, Fields, Christopher J wrote: >>>> >>>>> There are a number of very good reasons to separate out common code >>>>> and create new repos for new code. The problem about adding new >>>>> code into core is it ties your code development to bioperl-live's >>>>> release cycle and versioning. Also, what I (and others) would not >>>>> like to see is any additional dependencies introduced, but a >>>>> separate release allows you to (1) both add a dependency w/o >>>>> affecting core, and (2) make it required, so no fiddling with >>>>> checking for the module prior to running tests on it. >>>>> >>>>> As an example, I can easily see something like >>>>> Bio::SearchIO::blastxml living on it's own since it has a set of >>>>> outside dependencies. >>>>> >>>>> BTW, separation of modules into separate distributions (even single >>>>> modules) based on functionality above and beyond that defined in a >>>>> core is very common in the perl world. 
Beyond the obvious example >>>>> of anything non-core in perl (all installable via CPAN), Moose, >>>>> Dist::Zilla, Catalyst, Dancer, etc all have separately installable >>>>> dists that layer additional functionality and have a separate >>>>> maintenance path. >>>>> >>>>> chris From fossandonc at hotmail.com Tue Mar 13 14:13:58 2012 From: fossandonc at hotmail.com (Francisco J. Ossandón) Date: Tue, 13 Mar 2012 15:13:58 -0300 Subject: [Bioperl-l] Bioperl CPAN installation issues In-Reply-To: <3CEFE0DD-8DE4-4029-9F1E-BB50FA9CF185@illinois.edu> References: <3CEFE0DD-8DE4-4029-9F1E-BB50FA9CF185@illinois.edu> Message-ID: This has happened to me since long ago in previous installations, but I had forgot about it until today... Looking around in Google led me to this page: http://mail.pm.org/pipermail/pdx-pm-list/2010-April/005803.html It says: ### MakeMaker (MM) has to use heuristics to find the line where $VERSION is defined for each .pm file and eval that line. In the case of Bio::Ontology::SimpleGOEngine::GraphAdaptor it does not define a $VERSION for itself, which is fine, but it does mention it later on. sub new { my( $class ) = @_; $class = ref $class || $class; my $self= ( defined $Graph::VERSION && $Graph::VERSION >= 0.5 ) ? bless ( {}, $class ) : bless ( {}, 'Bio::Ontology::SimpleGOEngine::GraphAdaptor02' ); $self->{_graph}=new Graph::Directed; $self->{_vertex_attributes}={}; $self->{_edge_attributes}={}; return $self; } MM->parse_version picked up on that, tried to eval it, and it blew up.
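The behaviour described in the quoted explanation can be reproduced in a small sketch. Everything here is an assumption made for illustration: find_version_line is a hypothetical, simplified stand-in for a "grab the first line that looks like a $VERSION assignment" heuristic, not the actual ExtUtils::MakeMaker code, and the module text is reduced to four lines. It only shows why a line containing "$Graph::VERSION >= 0.5" can be mistaken for an assignment and why evaluating that fragment by itself fails.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for a "find the $VERSION line" heuristic.
# This is NOT ExtUtils::MakeMaker's real code: it simply returns the first
# non-comment line that mentions some ...VERSION variable followed by an "=".
sub find_version_line {
    my @lines = @_;
    for my $line (@lines) {
        next if $line =~ /^\s*#/;    # skip comment lines
        return $line if $line =~ /[\$*][\w\:\']*\bVERSION\b.*=/;
    }
    return;
}

# Reduced excerpt of the constructor quoted above (no "our $VERSION" anywhere).
my @module = (
    'package Bio::Ontology::SimpleGOEngine::GraphAdaptor;',
    'my $self=',
    '( defined $Graph::VERSION && $Graph::VERSION >= 0.5 ) ?',
    'bless ( {}, $class ) :',
);

# The ">=" contains an "=", so the comparison line is taken for an assignment.
my $hit = find_version_line(@module);
print "picked: $hit\n";

# Evaluated on its own, the fragment ends in a dangling "?" and cannot compile.
my $ok = eval "no strict; do { $hit }; 1";
print $ok ? "eval ok\n" : "eval failed: $@";
```

A stricter pattern that refuses an "=" preceded by "<", ">", "=" or "!" would skip this line; giving the module its own explicit $VERSION, or dropping the comparison altogether, sidesteps the heuristic entirely.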
### So apparently, the "( defined $Graph::VERSION && $Graph::VERSION >= 0.5 ) ?" line is read when the CPAN client is trying to find a $VERSION that it can use as an upgrade reference, and it gets confused at this point. Since "Bio::Ontology::SimpleGOEngine::GraphAdaptor" depends on "Bio::Root::Root", maybe adding a "use Bio::Root::Version;" to the Root.pm module would let the $VERSION global variable reach the GraphAdaptor module and all the other modules that depend on it? I noticed that "RootI.pm" has a Bio::Root::Version dependency but Root.pm doesn't have it (at least explicitly). I still need more experience using global variables, but it could be that the reason Bioperl modules show "undef" after installation is that CPAN doesn't reach $VERSION properly for all the modules of the bundle. Cheers, Francisco J. Ossandon From cjfields at illinois.edu Tue Mar 13 15:39:07 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 13 Mar 2012 19:39:07 +0000 Subject: [Bioperl-l] Bioperl CPAN installation issues In-Reply-To: References: <3CEFE0DD-8DE4-4029-9F1E-BB50FA9CF185@illinois.edu> Message-ID: <89FF7EFC-DCA4-4889-902F-42AE5CA792BA@illinois.edu> The versioning mechanism for BioPerl is a little screwy, in that it relies on inheriting the VERSION rather than actually defining it per module. We plan on changing this as we split out modules (I'm actually working on this now). My fix for this, and probably the simplest solution? Require a modern version of Graph, and remove the offending lines. Committed in 0930270783. chris From florent.angly at gmail.com Wed Mar 14 18:25:55 2012 From: florent.angly at gmail.com (Florent Angly) Date: Thu, 15 Mar 2012 08:25:55 +1000 Subject: [Bioperl-l] Fate of Bio::Tools::PCRSimulation In-Reply-To: <4F5F87D8.9000301@illinois.edu> References: <4F49D4B6.5050301@gmail.com> <4F4E3EEF.5050506@cam.ac.uk> <4325EF60-919F-46EF-91BB-D31160F0B587@illinois.edu> <4F501063.4010109@gmail.com> <96E7CE58-0657-4194-A906-83022348F84A@illinois.edu> <4F55945A.7070901@gmail.com> <4F56ED48.9090500@gmail.com> <4F5F87D8.9000301@illinois.edu> Message-ID: <4F611AF3.8060207@gmail.com> Thanks for the notification Chris! I thought I had committed this file. I must have been daydreaming.
It should be all good now. Best, Florent On 14/03/12 03:46, Chris Fields wrote: > Florent, > > Getting a test fail on the amplicon branch, seems to be missing a file. > > The only test consistently failing is t/Tools/AmpliconSearch.t (I'm > seeing NCBI-related test issues that are also occurring on master). > Using 'prove -lr t/Tools/AmpliconSearch.t': > > [cjfields at pyrimidine bioperl-live (amplicons=)]$ prove -lr > t/Tools/AmpliconSearch.t > t/Tools/AmpliconSearch.t .. 1/174 > ------------- EXCEPTION ------------- > MSG: Could not open t/data/forward_primer.fa: No such file or directory > STACK Bio::Root::IO::_initialize_io Bio/Root/IO.pm:351 > STACK Bio::SeqIO::_initialize Bio/SeqIO.pm:477 > STACK Bio::SeqIO::fasta::_initialize Bio/SeqIO/fasta.pm:93 > STACK Bio::SeqIO::new Bio/SeqIO.pm:358 > STACK Bio::SeqIO::new Bio/SeqIO.pm:399 > STACK Bio::Tools::AmpliconSearch::primer_file > Bio/Tools/AmpliconSearch.pm:292 > STACK Bio::Tools::AmpliconSearch::new Bio/Tools/AmpliconSearch.pm:151 > STACK toplevel t/Tools/AmpliconSearch.t:126 > ------------------------------------- > > # Looks like you planned 174 tests but ran 53. > # Looks like your test exited with 2 just after 53. > t/Tools/AmpliconSearch.t .. Dubious, test returned 2 (wstat 512, 0x200) > Failed 121/174 subtests > > Test Summary Report > ------------------- > t/Tools/AmpliconSearch.t (Wstat: 512 Tests: 53 Failed: 0) > Non-zero exit status: 2 > Parse errors: Bad plan. You planned 174 tests but ran 53. > Files=1, Tests=53, 0 wallclock secs ( 0.03 usr 0.01 sys + 0.18 cusr > 0.01 csys = 0.23 CPU) > Result: FAIL > > > chris > > On 03/06/2012 11:08 PM, Florent Angly wrote: >> >> Yes, thanks Chris. If you want to start splitting a Bio-Tools >> distribution that would include Bio::Tools::AmpliconSearch, I am happy >> to help. >> >> In general, I am not specifically attached to the namespace, so if you >> guys prefer something different, just tell me. 
>> >> Note that AmpliconSearch uses a couple of new or re-worked objects, >> namely Bio::SeqFeature::Primer, Bio::SeqFeature::Amplicon and >> Bio::SeqFeature::SubSeq. Both Primer and Amplicons inherit from SubSeq, >> which inherits from Bio::SeqFeature::Generic. The main goal of SubSeq is >> to allow fetching a subsequence from the attached sequence or to >> explicitly adding a sequence that represents the feature (the Generic >> feature class does not support setting such a sequence, just getting the >> subsequence). I think it would be good to add this functionality to the >> Generic feature class, but I did not want to force things without asking >> everybody first if this seems like a good idea. >> >> Florent >> >> >> On 06/03/12 11:08, Fields, Christopher J wrote: >>> I'll check it out. Want me to post test results here (I have access to >>> a few systems to test on). >>> >>> chris >>> >>> On Mar 5, 2012, at 10:36 PM, Florent Angly wrote: >>> >>>> To all interested, >>>> the AmpliconSearch module is in a decent state. If you want to test >>>> it or improve it, head to >>>> https://github.com/bioperl/bioperl-live/blob/amplicons/Bio/Tools/AmpliconSearch.pm >>>> >>>> >>>> Regards, >>>> Florent >>>> >>>> >>>> On 01/03/12 12:42, Fields, Christopher J wrote: >>>>> Florent, >>>>> >>>>> Just want to add, my previous response isn't meant as an >>>>> admonishment, hope it didn't come across that way, but sometimes >>>>> email makes it hard to discern the difference. I simply meant to >>>>> demonstrate my opinion that I find releasing one's code is much >>>>> simpler (e.g. you can decide the rules and dictate when the code is >>>>> ready for release), and if we can make getting good code into user's >>>>> hands easier, more flexible, and more consistent I think that is >>>>> always a better path. 
>>>>> >>>>> chris >>>>> >>>>> On Feb 29, 2012, at 8:30 PM, Fields, Christopher J wrote: >>>>> >>>>>> There are a number of very good reasons to separate out common code >>>>>> and create new repos for new code. The problem about adding new >>>>>> code into core is it ties your code development to bioperl-live's >>>>>> release cycle and versioning. Also, what I (and others) would not >>>>>> like to see is any additional dependencies introduced, but a >>>>>> separate release allows you to (1) both add a dependency w/o >>>>>> affecting core, and (2) make it required, so no fiddling with >>>>>> checking for the module prior to running tests on it. >>>>>> >>>>>> As an example, I can easily see something like >>>>>> Bio::SearchIO::blastxml living on it's own since it has a set of >>>>>> outside dependencies. >>>>>> >>>>>> BTW, separation of modules into separate distributions (even single >>>>>> modules) based on functionality above and beyond that defined in a >>>>>> core is very common in the perl world. Beyond the obvious example >>>>>> of anything non-core in perl (all installable via CPAN), Moose, >>>>>> Dist::Zilla, Catalyst, Dancer, etc all have separately installable >>>>>> dists that layer additional functionality and have a separate >>>>>> maintenance path. >>>>>> >>>>>> chris >>>>>> >>>>>> On Mar 1, 2012, at 6:12 PM, Florent Angly wrote: >>>>>> >>>>>>> Thanks for everybody's feedback. >>>>>>> >>>>>>> I am looking at existing modules to hold template sequence, >>>>>>> amplicon sequence and primer information. There is the >>>>>>> Bio::SeqFeature::Primer and Bio::Seq::PrimedSeq. At the moment the >>>>>>> PrimedSeq object places Primer objects on the target sequence. 
From lskatz at gmail.com  Tue Mar 13 18:06:48 2012
From: lskatz at gmail.com (Lee Katz)
Date: Tue, 13 Mar 2012 18:06:48 -0400
Subject: [Bioperl-l] Bio::SearchIO::Writer::TextResultWriter is buggy
Message-ID:

Hi,

I am separating a blast output file into individual results, so that I
can multithread the reading of the results. I cannot pass a result object
through Perl threads because it contains code, which is not sharable via
threads::shared (sharing is used internally in Thread::Queue)--therefore I
must pass a sub-file. My strategy is to read the whole file into
Bio::SearchIO and then write the result objects to a file, so that a thread
can read the file. The thread would thus read one file at a time
containing one query and all its results.

Reading the original file works, but then outputting the blast file is
buggy. The last line of the HSP is empty and has bad coordinates. I have
an example, with an error when trying to read it again with SearchIO, and
its fasta file below.

Any help debugging? Maybe I just need to update BioPerl, since I installed
it several months ago, maybe a year ago? Thanks.


MSG: In sequence lcl|R009125 residue count gives end value 341.
Overriding value [340] with value 341 for Bio::LocatableSeq::end().
ANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFF-----SLSDVMLL-----YRYVIINDFE-------INEGKYF----FAVVIVFFKIIGFPLFFCVLSAVLPTLVQTKFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIFSQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYIEQEGHFETKSRRRELHIEILSEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLIETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFVDFEHLDEVLRLIVWLEQVEN1
---------------------------------------------------


>lcl|R009125 (gi:13449103) spa40 (pWR501_p164) - Type III secretion protein [Shigella flexneri str. M90T (serotype 5a) plasmid pWR501]
 Length = 342

 Score = 79.3 bits (194), Expect = 2e-15
 Identities = 87/360 (24%), Positives = 175/360 (48%), Gaps = 35/360 (9%)

Query: 4   GDKTEQASSQKLDKARKQGQIARSKEFSSAIMLMV----CIGYFYANADSLSGHLMQLFE 59
           +KTE+ + +KL A K+GQ + K+ ++ ++++V I +F SLS ++
Sbjct: 2   ANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFF-----SLSDVMLL--- 53

Query: 60  VSFRFTAESQSDHDHILHLITQSLYLMIKVFAPLIIF-QFIASAIATCLLGGF------- 111
           +R+ + + I + Y FA +I+F + I + C+L
Sbjct: 54  --YRYVIINDFE-------INEGKYF----FAVVIVFFKIIGFPLFFCVLSAVLPTLVQT 100

Query: 112 HFNLSLLAPK--FSKINPLSGIKRIFSKQTLVEFLKNVAKISLIFALLYYMISTNFHMIG 169
           F L+ A K FS +NP+ G+K+IFS +T+ EF K++ + ++ Y+ + +I
Sbjct: 101 KFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIF 160

Query: 170 SLVRASFQTTIHFSLQYVLELLGMLILIAILFGVIDIPYQKMTFGTQMKMTkqevkqehk 229
           S V +S + +++ + +IL ++D + + + M M KQE+K+E+
Sbjct: 161 SQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYI 220

Query: 230 eqeGRPEIKSRIRQIQMQNARRSASQTVPTADVVLMNPTHFAVALKYDLTKAEAPFVVAK 289
           EQEG E KSR R++ ++ + + +V+MNPTH A+ + ++ A APF+
Sbjct: 221 EQEGHFETKSRRRELHIEILSEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLI 280

Query: 290 GKNEVAFYIRTLAEQHQVEVLVVPEITRSIYHTTQLNQMIPNQLFLAVAQILKYVQQLKS 349
           N+ A +R A + + + ++ R +Y T + + V +++ +++Q+++
Sbjct: 281 ETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFVDFEHLDEVLRLIVWLEQVEN 340

Query: 350 349

Sbjct: 341 340

And the whole fasta entry is:
>lcl|R009125 (gi:13449103) spa40 (pWR501_p164) - Type III secretion protein [Shigella flexneri str. M90T (serotype 5a) plasmid pWR501]
MANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFFSLSDVMLLYRYVIINDFEINEGKYFFAVVIVFFKI
IGFPLFFCVLSAVLPTLVQTKFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIF
SQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYIEQEGHFETKSRRRELHIEIL
SEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLIETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFV
DFEHLDEVLRLIVWLEQVENTH

--
Lee Katz, Ph.D.

From sari.khalil at gmail.com  Tue Mar 13 20:47:55 2012
From: sari.khalil at gmail.com (Sari Khaleel)
Date: Tue, 13 Mar 2012 20:47:55 -0400
Subject: [Bioperl-l] Fwd: Problem with Bio::Tree::Draw::Cladogram
References: <8394E70D-8D8E-4397-A29A-C79AE392EEB4@gmail.com>
Message-ID: <47B250DC-0ED0-47A5-B711-1F26C2FA44B8@gmail.com>

Hello,

My name is Sari Khaleel, I'm a master's student at UD and I've been
playing with this module. My problem is that the cladogram shows the
node's taxid instead of the node's name, as in the attached picture
below. I went through the code and it seems that the print() function
prints the node's id (its taxid) instead of its name. Is there any way
around that?

Here's a simplified version of what I'm trying to do .. it's a simple
script that tries to build a tree and print from a list of taxids:

    use Bio::DB::Taxonomy;
    use Bio::Tree::Tree;
    use Bio::Tree::Draw::Cladogram;

    my $db = Bio::DB::Taxonomy->new(-source => 'entrez'); # use NCBI Entrez over HTTP
    my @taxids = qw(296483 398577 269482 331272 331271 266265);
    my @names = qw (a b c d e f);

    # Get taxons from entrez
    my %taxid2taxon;
    foreach my $taxid (@taxids){
        $taxid2taxon{$taxid} = $db->get_taxon(-taxonid => $taxid);
    }

    my $tree; my $c =0;
    foreach my $taxid (@taxids){
        my $node = $taxid2taxon{$taxid};
        $node->name('supplied', $names[$c]); # @names, not the undeclared @name

        if (! $tree){
            $tree = Bio::Tree::Tree->new(-verbose => $db->verbose, -node => $node);
        }
        else{
            $tree->merge_lineage($node);
        }
        $c++;
    }

    # Print the tree as a cladogram
    my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree => $tree);
    $obj1->print(-file => "cladogram.eps");

    # DONE

Sari

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cladogram.eps
Type: image/eps
Size: 1419 bytes
Desc: not available

From lskatz at gmail.com  Wed Mar 14 10:35:59 2012
From: lskatz at gmail.com (Lee Katz)
Date: Wed, 14 Mar 2012 10:35:59 -0400
Subject: [Bioperl-l] Bio::SearchIO::Writer::TextResultWriter is buggy
In-Reply-To:
References:
Message-ID:

I just want to clarify: I have an already existing blast output. Is there
a non-buggy way to split it? It is in human-readable text form (-m 0).

--
Lee Katz, Ph.D.

From roy.chaudhuri at gmail.com  Wed Mar 14 11:18:48 2012
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 14 Mar 2012 15:18:48 +0000
Subject: [Bioperl-l] Fwd: Problem with Bio::Tree::Draw::Cladogram
In-Reply-To: <47B250DC-0ED0-47A5-B711-1F26C2FA44B8@gmail.com>
References: <8394E70D-8D8E-4397-A29A-C79AE392EEB4@gmail.com> <47B250DC-0ED0-47A5-B711-1F26C2FA44B8@gmail.com>
Message-ID: <4F60B6D8.1050009@gmail.com>

Hi Sari,

I think the problem is that Bio::Taxon considers id to be the taxon id,
but it inherits from Bio::Tree::NodeI, which defines id as the
human-readable name. Since you are already adding names, as a workaround
you should be able to also add them to the id slot using:

    $node->id($names[$c]);

They should then print out with your tree.

Cheers,
Roy.

On 14/03/2012 00:47, Sari Khaleel wrote:
> Hello, My name is Sari Khaleel, I'm a master's student at UD and I've
> been playing with this module. My problem is that the cladogram shows
> the node's taxid instead of the node's name, as in the attached
> picture below. I went through the code and it seems that the print()
> function prints the node's id (its taxid) instead of its name. Is
> there any way around that?
>
> Here's a simplified version of what I'm trying to do ..
> it's a simple script that tries to build a tree and print from a list
> of taxids:
>
>     use Bio::DB::Taxonomy;
>     use Bio::Tree::Tree;
>     use Bio::Tree::Draw::Cladogram;
>
>     my $db = Bio::DB::Taxonomy->new(-source => 'entrez'); # use NCBI Entrez over HTTP
>     my @taxids = qw(296483 398577 269482 331272 331271 266265);
>     my @names = qw (a b c d e f);
>
>     # Get taxons from entrez
>     my %taxid2taxon;
>     foreach my $taxid (@taxids){
>         $taxid2taxon{$taxid} = $db->get_taxon(-taxonid => $taxid);
>     }
>
>     my $tree; my $c =0;
>     foreach my $taxid (@taxids){
>         my $node = $taxid2taxon{$taxid};
>         $node->name('supplied', $names[$c]);
>
>         if (! $tree){
>             $tree = Bio::Tree::Tree->new(-verbose => $db->verbose, -node => $node);
>         }
>         else{
>             $tree->merge_lineage($node);
>         }
>         $c++;
>     }
>
>     # Print the tree as a cladogram
>     my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree => $tree);
>     $obj1->print(-file => "cladogram.eps");
>
>     # DONE
>
> Sari

From cjfields at illinois.edu  Wed Mar 14 11:21:40 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 14 Mar 2012 15:21:40 +0000
Subject: [Bioperl-l] Bio::SearchIO::Writer::TextResultWriter is buggy
In-Reply-To:
References:
Message-ID:

Can you pass a file handle? SearchIO accepts them using '-fh', but I have
no idea if this would work. See:

http://bytes.com/topic/perl/answers/497453-howto-share-filehandles-between-threads

The problem is whether this will be at the specific file point you need or
whether the file pointer will be reset to the beginning. According to the
last answer there, if the variable is sharing scope it should work, but I
haven't tried it out myself to be honest. If you try that out let us know,
I would be interested to see if it works.

Re: splitting the file, this is a little tricky as plain text output has changed.
Older versions of BLAST simply concatenate output files, so the header for
the file is repeated (lines up to 'Query'). Later versions leave off the
repeated header. You could simply split the file up based on the query
using a regex; I believe the result object should still be generated, but
the header contains information that you may want, such as BLAST version,
etc.

chris

On Mar 14, 2012, at 9:35 AM, Lee Katz wrote:

> I just want to clarify: I have an already existing blast output. Is there
> a non-buggy way to split it? It is in human-readable text form (-m 0).

From lskatz at gmail.com  Wed Mar 14 12:51:03 2012
From: lskatz at gmail.com (Lee Katz)
Date: Wed, 14 Mar 2012 12:51:03 -0400
Subject: [Bioperl-l] Bio::SearchIO::Writer::TextResultWriter is buggy
In-Reply-To:
References:
Message-ID:

I haven't gotten too far, but the following code might be appropriate for
splitting a human-readable blast. The output files should be appropriate
for passing to threads. I'll give an update tomorrow when I have more time.

    logmsg "Splitting the blast output into chunks";
    # Start a new chunk at every 'Query=' line; xx00.bls receives the file header
    system("csplit -s -f '$$settings{tempdir}/xx' -b '%02d.bls' $$settings{blastfile} /Query=/ '{*}'");
    die "Problem with splitting the blast output into chunks (exit status $?)" if $?;

    # Skip the header chunk (xx00.bls) and prepend it to every other chunk
    for my $chunk (glob("$$settings{tempdir}/xx*.bls")){
        next if($chunk=~/xx00\.bls$/);
        system("cat $$settings{tempdir}/xx00.bls $chunk > $$settings{tempdir}/chunktmp.bls");
        die "Error concatenating the header and $chunk" if $?;
        system("cp $$settings{tempdir}/chunktmp.bls $chunk");
        die "Error moving temp blast file" if $?;
    }

On Wed, Mar 14, 2012 at 11:21 AM, Fields, Christopher J <cjfields at illinois.edu> wrote:

> Can you pass a file handle?
> SearchIO accepts them using '-fh', but I have no idea if this would
> work. See:
>
> http://bytes.com/topic/perl/answers/497453-howto-share-filehandles-between-threads
>
> The problem is whether this will be at the specific file point you need or
> whether the file pointer will be reset to the beginning. According to the
> last answer there, if the variable is sharing scope it should work, but I
> haven't tried it out myself to be honest. If you try that out let us know,
> I would be interested to see if it works.
>
> Re: splitting the file, this is a little tricky as plain text output has
> changed. Older versions of BLAST simply concatenate output files, so the
> header for the file is repeated (lines up to 'Query'). Later versions
> leave off the repeated header. You could simply split the file up based on
> the query using a regex; I believe the result object should still be
> generated, but the header contains information that you may want, such as
> BLAST version, etc.
>
> chris

--
Lee Katz, Ph.D.

From bsdsig2012 at gmail.com  Thu Mar 15 10:18:36 2012
From: bsdsig2012 at gmail.com (BSD SIG)
Date: Thu, 15 Mar 2012 10:18:36 -0400
Subject: [Bioperl-l] [Call for Submissions] Biological Systems Design 2012 - BSD-SIG @ ISMB 2012 - July 13, 2012, Long Beach, CA
Message-ID:

---------------------------------------------------------------------------
Apologies for cross-posting
---------------------------------------------------------------------------

CALL FOR SUBMISSIONS

SPECIAL INTEREST GROUP ON BIOLOGICAL SYSTEMS DESIGN (BSD-SIG 2012)
Intelligent Systems for Molecular Biology (ISMB 2012)
July 13, 2012 - Long Beach, CA USA

Homepage: http://bsd2012.bme.jhu.edu/
Submission site: https://www.easychair.org/conferences/?conf=bsdsig2012
Email: bsd2012 at jhu.edu

---------------------------------------------------------------------------
Important Dates:

Paper submission: April 20, 2012
Author notification: May 4, 2012
Workshop: July 13, 2012
---------------------------------------------------------------------------

**General Information**

The complexity of the genomic structure and our limited understanding of
biological processes require new computational methods to
investigate the huge number of possible designs for circuits, pathways,
and entire genomes, with the ideal being the ability to model, simulate
and redesign a biological system in silico prior to fabrication, similar
to CAD/CAM for physical devices. Synthetic Biology aims to establish a
standard and effective biological design flow, where biological systems
are designed and verified computationally, before in vitro synthesis and
in vivo experiments. Each phase of this process has multiple challenges
ranging from managing high-throughput laboratory operations to developing
new software and defining accurate and interoperable computational models.
The Special Interest Group in Biological Systems Design (BSD-SIG 2012)
aims to provide a broad view of the current state-of-the-art for
scientists from biology, chemistry, computer science, mathematics and
engineering.

---------------------------------------------------------------------------

**Keynote Speakers**

- Jef D. Boeke, Johns Hopkins University
- Christodoulos A. Floudas, Princeton
- Dan Gusfield, UC Davis
- Nathan J. Hillson, Joint BioEnergy Institute

**Invited Speakers**

- Jake Beal, BBN Technologies
- Michal Galdzicki, University of Washington School of Medicine
- Sarah Richardson, Joint Genome Institute

---------------------------------------------------------------------------

**Sessions**

BSD-SIG is structured in four sessions:

* Genome Design
* Protein Design
* Computer Aided Design Tools
* Data management & standards

**Genome Design**

The availability of high-fidelity techniques for the synthesis of long
DNA strands constitutes the starting point for effective pathway
engineering. The aim of this session is to present state-of-the-art
methods for genome design, focusing on, but not limited to, the following
topics: oligonucleotide design, probe and watermark design,
high-throughput techniques, and theoretical aspects of DNA design.
**Protein Design** An important purpose of synthetic DNA is to express non-native or human-designed proteins. Protein expression and design introduce additional complexities. This session provides a forum to discuss the recent advances in this field, with particular emphasis on the design of therapeutic peptides and proteins. **Computer Aided Design Tools** The design of biological systems is often characterized by ad hoc, human-centric procedures, which limit applications to small-scale problems. While Computer-Aided-Design (CAD) tools are standard in many engineering fields, CAD capabilities for synthetic biology are at a very early stage. This session gives a broad view of some emerging approaches in Biological Design Automation (BDA), with the aim of finding and discussing new areas where CAD tools can improve and accelerate the synthesis of living matter. **Data management & standards** The enormous amount of data generated by high-throughput techniques and synthesis processes requires the introduction of new and specific representation schemes, along with efficient and open standards for interfacing different data sources. New systems are also required to collect information and performing on-line data analysis. The aim of this session is twofold: first, exploring data structures and representations for synthetic biology; second, promoting and discussing use-case scenarios for the Synthetic Biology Open Language (SBOL). --------------------------------------------------------------------------------------------------- **Submissions** We encourage submissions in the form of oral and poster presentation; authors must submit a 1 page abstract specifying the track and the form of the contribution. Two blind reviewers will review each submission, and suggest the most appropriate form for presentation. All the accepted abstracts will be published on the hands-out materials of BSD-SIG. 
Contributions must be submitted through EasyChair:
https://www.easychair.org/conferences/?conf=bsdsig2012

---------------------------------------------------------------------------------------------------

We look forward to seeing you in Long Beach:

Joel Bader, Doug Densmore, Swapnil Bathia and Giovanni Stracquadanio

From jovel_juan at hotmail.com Thu Mar 15 11:13:03 2012
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Thu, 15 Mar 2012 15:13:03 +0000
Subject: [Bioperl-l] Problem when retrieving protein DESCRIPTION with Bio::DB::GenBank
In-Reply-To:
References: , , ,
Message-ID:

Dear All,

I have run assemblies and blastx on a few dozen libraries and then extracted the tabular report from BLAST. As you know, that table provides the 'gi' for each protein, but not the description. I have written the small script (shown below) to retrieve the description of each hit and append it to the tabular BLAST report.

From typical 'gi's (gi|13022215|V|gb|AAK11700.1|AF345523_1), I extract the 'gi|13022215|' and use it to feed Bio::DB::GenBank. It works in most instances; however, it fails when a gi, like gi|6178084|, matches more than one entry. I have also tried using the 'version' (|AF345523_1), but similar problems are encountered in some instances.

I guess there are easier or more efficient solutions to this simple task, but unfortunately this is as far as my BioPerl skills go.

Any help will be appreciated.
The referred script follows:

#!/usr/bin/perl -w
use Bio::DB::GenBank;

chomp($dir = $ARGV[0]);
opendir(DIR, $dir) or die "$!";
@files = grep {/\.blast$/} readdir DIR;
close DIR;
@files = sort(@files);

foreach $file(@files){
    open(IN, "$dir/$file") or die "$!";
    print "$file \n";
    ($out = $file) =~ s/\.blast$/\.xls/;
    open(OUT, ">$dir/$out");
    while($line = <IN>){
        chomp($line);
        @temp = split(/\t/, $line);
        $left = index($temp[1], 'gi|');
        $right = index($temp[1], '|V|');
        $id = substr($temp[1], $left + 2, ($right - $left) - 1);
        my $db_obj = Bio::DB::GenBank->new;
        my $seq_obj = $db_obj->get_Seq_by_acc($id);
        print OUT $line, "\t", $seq_obj->desc, "\n";
        print $seq_obj->desc, "\n";
    }
    close IN;
    close OUT;
}
exit;

From lskatz at gmail.com Thu Mar 15 17:26:30 2012
From: lskatz at gmail.com (Lee Katz)
Date: Thu, 15 Mar 2012 17:26:30 -0400
Subject: [Bioperl-l] Bio::SearchIO::Writer::TextResultWriter is buggy
In-Reply-To:
References:
Message-ID:

Splitting the blast output file, enqueuing the file, and then adding the header works!

In the Main thread:

logmsg "Splitting the blast output into chunks";
system("csplit -s -f '$$settings{tempdir}/xx' -b '%02d.bls' $$settings{blastfile} /Query=/ '{*}'");
die "Problem with splitting the blast output into chunks: $!" if $?;
# remove xx00 from the chunk array and add it to each chunk
$$settings{blastHeader}=`cat $$settings{tempdir}/xx00.bls`;
die "Could not get blast header from file because $!"
if $?; unlink("$$settings{tempdir}/xx00.bls"); # remove this file before it's caught by glob() my @chunk=glob("$$settings{tempdir}/xx*.bls"); In the threads: while(defined(my $resultFile=$Q->dequeue)){ my $completeBlastContents=`(echo '$$settings{blastHeader}' && cat $resultFile)`; open(BLASTIN, "<", \$completeBlastContents) or die "Could not open string for reading: $!"; my $searchIn=Bio::SearchIO->new(-fh=>\*BLASTIN,-format=>"blast"); On Wed, Mar 14, 2012 at 12:51 PM, Lee Katz wrote: > I haven't gotten too far, but the following code might be appropriate for > splitting a human-readable blast. The output files should be appropriate > for passing to threads. I'll give an update tomorrow when I have more time. > > logmsg "Splitting the blast output into chunks"; > system("csplit -s -f '$$settings{tempdir}/xx' -b '%02d.bls' > $$settings{blastfile} /Query=/ '{*}'"); > die "Problem with splitting the blast output into chunks: $!" if $?; > # remove xx00 from the chunk array and add it to each chunk > for my $chunk(glob("$$settings{tempdir}/xx*.bls")){ > next if($chunk=~/xx00.bls/); > my $newChunk=`cat $$settings{tempdir}/xx00.bls $chunk`; > system("cat $$settings{tempdir}/xx00.bls $chunk > > $$settings{tempdir}/chunktmp.bls"); > system("cp $$settings{tempdir}/chunktmp.bls $chunk"); > die "Error moving temp blast file" if $?; > } > > > > On Wed, Mar 14, 2012 at 11:21 AM, Fields, Christopher J < > cjfields at illinois.edu> wrote: > >> Can you pass a file handle? SearchIO accepts them using '-fh', but I >> have no idea if this would work. See: >> >> >> http://bytes.com/topic/perl/answers/497453-howto-share-filehandles-between-threads >> >> The problem is whether this will be at the specific file point you need >> or whether the file pointer will be reset to the beginning. According to >> the last answer there, if the variable is sharing scope it should work, but >> I haven't tried it out myself to be honest. 
If you try that out let us >> know, I would be interested to see if it works. >> >> Re: splitting the file, this is a little tricky as plain text output has >> changed. Older versions of BLAST simply concatenate output files, so the >> header for the file is repeated (lines up to 'Query'). Latter version >> leave off the repeated header. You could simply split the file up based on >> the query using a regex, I believe the result object should still be >> generated, but the header contains information that you may want, such as >> BLAST version, etc. >> >> chris >> >> On Mar 14, 2012, at 9:35 AM, Lee Katz wrote: >> >> > I just want to clarify: I have an already existing blast output. Is >> there >> > a non-buggy way to split it? It is in human-readable text form (-m 0). >> > >> > On Tue, Mar 13, 2012 at 6:06 PM, Lee Katz wrote: >> > >> >> Hi, I am separating a blast output file into individual results, so >> that I >> >> can multithread the reading of the results. I cannot pass a result >> object >> >> through Perl threads because it contains code, which is not sharable >> via >> >> threads::share (sharing is used internally in >> Thread::Queue)--therefore I >> >> must pass a sub-file. My strategy is to read the whole file into >> >> Bio::SearchIO and then write the result objects to a file, so that a >> thread >> >> can read the file. The thread would thus read one file at a time >> >> containing one query and all its results. >> >> >> >> Reading the original file works, but then outputting the blast file is >> >> buggy. The last line of the HSP is empty and has bad coordinates. I >> have >> >> an example, with an error when trying to read it again with SearchIO, >> and >> >> its fasta file below. >> >> >> >> Any help debugging? Maybe I just need to update BioPerl since I >> installed >> >> it around several months ago, maybe a year ago? Thanks. >> >> >> >> >> >> MSG: In sequence lcl|R009125 residue count gives end value 341. 
>> >> Overriding value [340] with value 341 for Bio::LocatableSeq::end(). >> >> >> >> >> ANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFF-----SLSDVMLL-----YRYVIINDFE-------INEGKYF----FAVVIVFFKIIGFPLFFCVLSAVLPTLVQTKFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIFSQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYIEQEGHFETKSRRRELHIEILSEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLIETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFVDFEHLDEVLRLIVWLEQVEN1 >> >> --------------------------------------------------- >> >> >> >> >> >> >> >>> lcl|R009125 (gi:13449103) spa40 (pWR501_p164) - Type III secretion >> >> protein [Shigella flexneri str. M90T (serotype 5a) plasmid pWR501] >> >> Length = 342 >> >> >> >> Score = 79.3 bits (194), Expect = 2e-15 >> >> Identities = 87/360 (24%), Positives = 175/360 (48%), Gaps = 35/360 >> (9%) >> >> >> >> >> >> Query: 4 >> GDKTEQASSQKLDKARKQGQIARSKEFSSAIMLMV----CIGYFYANADSLSGHLMQLFE 59 >> >> +KTE+ + +KL A K+GQ + K+ ++ ++++V I +F SLS ++ >> >> Sbjct: 2 >> ANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFF-----SLSDVMLL--- 53 >> >> >> >> Query: 60 >> VSFRFTAESQSDHDHILHLITQSLYLMIKVFAPLIIF-QFIASAIATCLLGGF------- 111 >> >> +R+ + + I + Y FA +I+F + I + C+L >> >> Sbjct: 54 >> --YRYVIINDFE-------INEGKYF----FAVVIVFFKIIGFPLFFCVLSAVLPTLVQT 100 >> >> >> >> Query: 112 >> HFNLSLLAPK--FSKINPLSGIKRIFSKQTLVEFLKNVAKISLIFALLYYMISTNFHMIG 169 >> >> F L+ A K FS +NP+ G+K+IFS +T+ EF K++ + ++ Y+ + +I >> >> Sbjct: 101 >> KFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIF 160 >> >> >> >> Query: 170 >> SLVRASFQTTIHFSLQYVLELLGMLILIAILFGVIDIPYQKMTFGTQMKMTkqevkqehk 229 >> >> S V +S + +++ + +IL ++D + + + M M KQE+K+E+ >> >> Sbjct: 161 >> SQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYI 220 >> >> >> >> Query: 230 >> eqeGRPEIKSRIRQIQMQNARRSASQTVPTADVVLMNPTHFAVALKYDLTKAEAPFVVAK 289 >> >> EQEG E KSR R++ ++ + + +V+MNPTH A+ + ++ A APF+ >> >> Sbjct: 221 >> EQEGHFETKSRRRELHIEILSEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLI 280 >> >> >> >> Query: 290 >> 
GKNEVAFYIRTLAEQHQVEVLVVPEITRSIYHTTQLNQMIPNQLFLAVAQILKYVQQLKS 349 >> >> N+ A +R A + + + ++ R +Y T + + V +++ +++Q+++ >> >> Sbjct: 281 >> ETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFVDFEHLDEVLRLIVWLEQVEN 340 >> >> >> >> Query: 350 349 >> >> >> >> Sbjct: 341 340 >> >> >> >> And the whole fasta entry is: >> >>> lcl|R009125 (gi:13449103) spa40 (pWR501_p164) - Type III secretion >> >> protein [Shigella flexneri str. M90T (serotype 5a) plasmid pWR501] >> >> >> >> >> MANKTEKPTPKKLKDAAKKGQSFKFKDLTTVVIILVGTFTIISFFSLSDVMLLYRYVIINDFEINEGKYFFAVVIVFFKI >> >> >> >> >> IGFPLFFCVLSAVLPTLVQTKFVLATKAIKIDFSVLNPVKGLKKIFSIKTIKEFFKSILLLIILALTTYFFWINDRKIIF >> >> >> >> >> SQVFSSVDGLYLIWGRLFKDIILFFLAFSILVIILDFVIEFILYMKDMMMDKQEIKREYIEQEGHFETKSRRRELHIEIL >> >> >> >> >> SEQTKSDIRNSKLVVMNPTHIAIGIYFNPEIAPAPFISLIETNQCALAVRKYANEVGIPTVRDVKLARKLYKTHTKYSFV >> >> DFEHLDEVLRLIVWLEQVENTH >> >> >> >> >> >> >> >> -- >> >> Lee Katz, Ph.D. >> >> >> > >> > >> > >> > -- >> > Lee Katz, Ph.D. >> > >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > -- > Lee Katz, Ph.D. > -- Lee Katz, Ph.D. From dumps at gmx.de Fri Mar 16 13:56:53 2012 From: dumps at gmx.de (dumps) Date: Fri, 16 Mar 2012 10:56:53 -0700 (PDT) Subject: [Bioperl-l] Failed Test Installing BioPerl-1.6.901 In-Reply-To: References: Message-ID: <33518864.post@talk.nabble.com> Hi, I've experienced the same problem as Leon, yet on a SUSE 12.1 system "./Build test --test-files t/SeqIO/SeqIO.t --verbose" produced the following log t/SeqIO/SeqIO.t .. 
1..45 ok 1 - use Bio::SeqIO; ok 2 ok 3 - ID for format gcg ok 4 ok 5 ok 6 ok 7 ok 8 - ID for format fasta ok 9 ok 10 ok 11 ok 12 - accession.version ok 13 ok 14 ok 15 ok 16 ok 17 ok 18 ok 19 - ID for format pir ok 20 ok 21 ok 22 ok 23 ok 24 ok 25 - ID for format tab ok 26 ok 27 ok 28 ok 29 ok 30 ok 31 - ID for format ace ok 32 ok 33 ok 34 ok 35 ok 36 - use Algorithm::Diff; ok 37 - use IO::ScalarArray; ok 38 - use IO::String; ok 39 ok 40 ok 41 not ok 42 - Must pass a file or file handle # TODO file/fh-based tests should be in Bio::Root::IO, see issue #3204 # Failed (TODO) test 'Must pass a file or file handle' # at t/SeqIO/SeqIO.t line 120. # expecting: Regexp ((?-xism:No file, fh, or string argument provided)) # found: # ------------- EXCEPTION ------------- # MSG: Could not guess format from file/fh # STACK Bio::SeqIO::new Bio/SeqIO.pm:389 # STACK Test::Exception::throws_ok t/SeqIO/SeqIO.t:119 # STACK toplevel t/SeqIO/SeqIO.t:120 # ------------------------------------- ok 43 - Must pass a file or file handle ok 44 - Must pass a file or file handle not ok 45 - Must pass a real file # Failed test 'Must pass a real file' # at t/SeqIO/SeqIO.t line 135. # expecting: Regexp ((?-xism:Can not open 'foo.bar' for reading: No such file or directory)) # found: # ------------- EXCEPTION ------------- # MSG: Can not open 'foo.bar' for reading: Datei oder Verzeichnis nicht gefunden # STACK Bio::Tools::GuessSeqFormat::guess Bio/Tools/GuessSeqFormat.pm:462 # STACK Bio::SeqIO::new Bio/SeqIO.pm:381 # STACK Test::Exception::throws_ok t/SeqIO/SeqIO.t:134 # STACK toplevel t/SeqIO/SeqIO.t:135 # ------------------------------------- # Looks like you failed 1 test of 45. Dubious, test returned 1 (wstat 256, 0x100) Failed 1/45 subtests Test Summary Report ------------------- t/SeqIO/SeqIO.t (Wstat: 256 Tests: 45 Failed: 1) Failed test: 45 Non-zero exit status: 1 Files=1, Tests=45, 2 wallclock secs ( 0.05 usr 0.04 sys + 0.46 cusr 0.12 csys = 0.67 CPU) Result: FAIL Failed 1/1 test programs. 
1/45 subtests failed. Thanks Thomas Fields, Christopher J wrote: > > This unfortunately doesn't give us much to work on. Can you give the > verbose test output? Something like: > > prove -lrv t/SeqIO/SeqIO.t > > or > > ./Build test --test-files t/SeqIO/SeqIO.t --verbose > > chris > > -- View this message in context: http://old.nabble.com/Failed-Test-Installing-BioPerl-1.6.901-tp33487344p33518864.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at illinois.edu Fri Mar 16 15:24:21 2012 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 16 Mar 2012 14:24:21 -0500 Subject: [Bioperl-l] Failed Test Installing BioPerl-1.6.901 In-Reply-To: <33518864.post@talk.nabble.com> References: <33518864.post@talk.nabble.com> Message-ID: <4F639365.4070607@illinois.edu> The last test fail is easy enough (lang difference, so bad test) but the first fail is a bit odd since it passes for almost everything. I'll see if I can dig anything up. chris On 03/16/2012 12:56 PM, dumps wrote: > > Hi, > > I've experienced the same problem as Leon, yet on a SUSE 12.1 system > > "./Build test --test-files t/SeqIO/SeqIO.t --verbose" > > produced the following log > > t/SeqIO/SeqIO.t .. > 1..45 > ok 1 - use Bio::SeqIO; > ok 2 > ok 3 - ID for format gcg > ok 4 > ok 5 > ok 6 > ok 7 > ok 8 - ID for format fasta > ok 9 > ok 10 > ok 11 > ok 12 - accession.version > ok 13 > ok 14 > ok 15 > ok 16 > ok 17 > ok 18 > ok 19 - ID for format pir > ok 20 > ok 21 > ok 22 > ok 23 > ok 24 > ok 25 - ID for format tab > ok 26 > ok 27 > ok 28 > ok 29 > ok 30 > ok 31 - ID for format ace > ok 32 > ok 33 > ok 34 > ok 35 > ok 36 - use Algorithm::Diff; > ok 37 - use IO::ScalarArray; > ok 38 - use IO::String; > ok 39 > ok 40 > ok 41 > not ok 42 - Must pass a file or file handle # TODO file/fh-based tests > should be in Bio::Root::IO, see issue #3204 > # Failed (TODO) test 'Must pass a file or file handle' > # at t/SeqIO/SeqIO.t line 120. 
> # expecting: Regexp ((?-xism:No file, fh, or string argument provided)) > # found: > # ------------- EXCEPTION ------------- > # MSG: Could not guess format from file/fh > # STACK Bio::SeqIO::new Bio/SeqIO.pm:389 > # STACK Test::Exception::throws_ok t/SeqIO/SeqIO.t:119 > # STACK toplevel t/SeqIO/SeqIO.t:120 > # ------------------------------------- > ok 43 - Must pass a file or file handle > ok 44 - Must pass a file or file handle > not ok 45 - Must pass a real file > > # Failed test 'Must pass a real file' > # at t/SeqIO/SeqIO.t line 135. > # expecting: Regexp ((?-xism:Can not open 'foo.bar' for reading: No such > file or directory)) > # found: > # ------------- EXCEPTION ------------- > # MSG: Can not open 'foo.bar' for reading: Datei oder Verzeichnis nicht > gefunden > # STACK Bio::Tools::GuessSeqFormat::guess Bio/Tools/GuessSeqFormat.pm:462 > # STACK Bio::SeqIO::new Bio/SeqIO.pm:381 > # STACK Test::Exception::throws_ok t/SeqIO/SeqIO.t:134 > # STACK toplevel t/SeqIO/SeqIO.t:135 > # ------------------------------------- > # Looks like you failed 1 test of 45. > Dubious, test returned 1 (wstat 256, 0x100) > Failed 1/45 subtests > > Test Summary Report > ------------------- > t/SeqIO/SeqIO.t (Wstat: 256 Tests: 45 Failed: 1) > Failed test: 45 > Non-zero exit status: 1 > Files=1, Tests=45, 2 wallclock secs ( 0.05 usr 0.04 sys + 0.46 cusr 0.12 > csys = 0.67 CPU) > Result: FAIL > Failed 1/1 test programs. 1/45 subtests failed. > > > Thanks > Thomas > > > > > Fields, Christopher J wrote: >> >> This unfortunately doesn't give us much to work on. Can you give the >> verbose test output? Something like: >> >> prove -lrv t/SeqIO/SeqIO.t >> >> or >> >> ./Build test --test-files t/SeqIO/SeqIO.t --verbose >> >> chris >> >> > From rbuels at gmail.com Fri Mar 16 15:48:34 2012 From: rbuels at gmail.com (Robert Buels) Date: Fri, 16 Mar 2012 12:48:34 -0700 Subject: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! 
Message-ID: <4F639912.3050909@gmail.com> Hi all, Great news: Google announced today that the Open Bioinformatics Foundation has been accepted as a mentoring organization for this summer's Google Summer of Code! GSoC is a Google-sponsored student internship program for open-source projects, open to students from around the world (not just US residents). Students are paid a $5000 USD stipend to work as a developer on an open-source project for the summer. For more on GSoC, see GSoC 2012 FAQ at http://goo.gl/kNv48 Student applications are due April 6, 2012 at 19:00 UTC. Students who are interested in participating should look at the OBF's GSoC page at http://open-bio.org/wiki/Google_Summer_of_Code, which lists project ideas, and whom to contact about applying. For current developers on OBF projects, please consider volunteering to be a mentor if you have not already, and contribute project ideas. Just list your name and project ideas on OBF wiki and on the relevant project's GSoC wiki page. Thanks to all who helped make OBF's application to GSoC a success, and let's have a great, productive summer of code! Rob Buels OBF GSoC 2012 Administrator From David.Messina at sbc.su.se Fri Mar 16 19:50:56 2012 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 16 Mar 2012 17:50:56 -0600 Subject: [Bioperl-l] Google Summer of Code is *ON* for OBF projects! In-Reply-To: <4F639912.3050909@gmail.com> References: <4F639912.3050909@gmail.com> Message-ID: Great job Rob and all involved! From philipp.schiffer at googlemail.com Sun Mar 18 12:49:11 2012 From: philipp.schiffer at googlemail.com (Philipp Schiffer) Date: Sun, 18 Mar 2012 17:49:11 +0100 Subject: [Bioperl-l] match patterns to retrieve seqs Message-ID: <4F661207.4080605@googlemail.com> Hi! I am trying to write a small script to retrieve such sequences (including headers) from a fasta file that match the headers I listed in another file. 
The fasta file looks like:

>maker-scaffold15249_size4120-snap-gene-0.2-mRNA-1 protein AED:0.0448532948532949 eAED:0.0448532948532949 QI:42|-1|0|1|-1|0|1|1|191
MDILSLPLSFVCLMTIAWVVLAAALFLLWENGWSYFNSFYFTVVSFSTVGLGDMTPDYTR.............
>snap_masked-scaffold15249_size4120-abinit-gene-0.1-mRNA-1 protein AED:1 eAED:1 QI:0|0|0|0|1|1|4|0|108
MHAMWMIRKFRLQWRWCMHGRPRDEQPEHHCVMGFLAVTHANRAAYNTDTLPTMLTERRK............
>snap_masked-scaffold15249_size4120-abinit-gene-0.0-mRNA-1 protein AED:1 eAED:1 QI:55|0|0|0|1|1|2|227|5
MLHHR.......

while the list file is:

maker-scaffold539_size40162-snap-gene-0.4-mRNA-1
snap_masked-scaffold2197_size27871-abinit-gene-0.3-mRNA-1
maker-scaffold1000_size34843-snap-gene-0.7-mRNA-1
maker-scaffold10087_size13457-snap-gene-0.3-mRNA-1

My perl so far would be:

#! /usr/bin/perl -w

use Bio::SeqIO;

my $usage = "list_from_fasta.pl fastafile format headerlistfile\n";
my $file = shift or die $usage;
my $format = shift or die $usage;
my $headerlist = shift or die $usage;

open LIST, "<$headerlist" or die $!;
my $queries = <LIST>; #might be a hash as well, don't know what's better

my $fastafile = Bio::SeqIO->new(-file => "<$file", -format => $format);
while (my $fastaseq = $fastafile->next_seq){

if ($fastaseq ->id =~ /(@queries)/) # here is the problem, after much trying it seems actually to look for matches, but does not find any, still there must be

{print $fastaseq-> id,"\n" and print $fastaseq->seq,"\n";}
;
};

So I am wondering: is this the right way to tackle the problem? Will it work? Should I use another BioPerl module? My thinking is that the pattern matching operation after the 'if' statement is wrong.

Any help highly appreciated!

Thanks a lot.
Philipp

From philipp.schiffer at googlemail.com Sun Mar 18 16:36:18 2012
From: philipp.schiffer at googlemail.com (Philipp Schiffer)
Date: Sun, 18 Mar 2012 21:36:18 +0100
Subject: [Bioperl-l] match patterns to retrieve seqs
In-Reply-To: <4F661207.4080605@googlemail.com>
References: <4F661207.4080605@googlemail.com>
Message-ID: <4F664742.50406@googlemail.com>

Sorry,

at least my $queries should read my queries at .

On 18.03.12 17:49, Philipp Schiffer wrote:
> Hi!
>
> I am trying to write a small script to retrieve such sequences
> (including headers) from a fasta file that match the headers I listed in
> another file.
> The fasta file looks like:
> >maker-scaffold15249_size4120-snap-gene-0.2-mRNA-1 protein
> AED:0.0448532948532949 eAED:0.0448532948532949 QI:42|-1|0|1|-1|0|1|1|191
> MDILSLPLSFVCLMTIAWVVLAAALFLLWENGWSYFNSFYFTVVSFSTVGLGDMTPDYTR.............
> >snap_masked-scaffold15249_size4120-abinit-gene-0.1-mRNA-1 protein
> AED:1 eAED:1 QI:0|0|0|0|1|1|4|0|108
> MHAMWMIRKFRLQWRWCMHGRPRDEQPEHHCVMGFLAVTHANRAAYNTDTLPTMLTERRK............
> >snap_masked-scaffold15249_size4120-abinit-gene-0.0-mRNA-1 protein
> AED:1 eAED:1 QI:55|0|0|0|1|1|2|227|5
> MLHHR.......
>
> while the list file is:
> maker-scaffold539_size40162-snap-gene-0.4-mRNA-1
> snap_masked-scaffold2197_size27871-abinit-gene-0.3-mRNA-1
> maker-scaffold1000_size34843-snap-gene-0.7-mRNA-1
> maker-scaffold10087_size13457-snap-gene-0.3-mRNA-1
>
> My perl so far would be:
>
> #!
/usr/bin/perl -w
>
> use Bio::SeqIO;
>
> my $usage = "list_from_fasta.pl fastafile format headerlistfile\n";
> my $file = shift or die $usage;
> my $format = shift or die $usage;
> my $headerlist = shift or die $usage;
>
> open LIST, "<$headerlist" or die $!;
> my $queries = <LIST>; #might be a hash as well, don't know what's better
>
> my $fastafile = Bio::SeqIO->new(-file => "<$file", -format => $format);
> while (my $fastaseq = $fastafile->next_seq){
>
> if ($fastaseq ->id =~ /(@queries)/) # here is the problem, after much
> trying it seems actually to look for matches, but does not find any,
> still there must be
>
> {print $fastaseq-> id,"\n" and print $fastaseq->seq,"\n";}
> ;
> };
>
> So I am wondering, is this the right way to tackle the problem. Will it
> work? Should I use another bioPerl module? My thinking is, that the
> pattern matching operation after the 'if' statement is wrong.
>
> Any help highly appreciated!
>
> Thanks a lot.
>
> Philipp
>

From fs5 at sanger.ac.uk Mon Mar 19 10:36:37 2012
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Mon, 19 Mar 2012 14:36:37 +0000
Subject: [Bioperl-l] match patterns to retrieve seqs
In-Reply-To: <4F661207.4080605@googlemail.com>
References: <4F661207.4080605@googlemail.com>
Message-ID: <4F674475.4010505@sanger.ac.uk>

As you correctly guessed, your regex
/(@queries)/
isn't doing what you were hoping it would do, i.e. match against all elements of the array. You could compile a regex with all the alternative patterns from your array, but I think that's not very efficient, so you could try something like this instead:

my $does_match = 0;
# 'and' binds more loosely than '=', so the assignment parses as intended
$fastaseq->id =~ /$_/ and $does_match = 1 for @queries;
if ($does_match){
...
}

Frank

On 18/03/12 16:49, Philipp Schiffer wrote:
> Hi!
>
> I am trying to write a small script to retrieve such sequences
> (including headers) from a fasta file that match the headers I listed in
> another file.
> The fasta file looks like: > >maker-scaffold15249_size4120-snap-gene-0.2-mRNA-1 protein > AED:0.0448532948532949 eAED:0.0448532948532949 QI:42|-1|0|1|-1|0|1|1|191 > MDILSLPLSFVCLMTIAWVVLAAALFLLWENGWSYFNSFYFTVVSFSTVGLGDMTPDYTR............. > >snap_masked-scaffold15249_size4120-abinit-gene-0.1-mRNA-1 protein > AED:1 eAED:1 QI:0|0|0|0|1|1|4|0|108 > MHAMWMIRKFRLQWRWCMHGRPRDEQPEHHCVMGFLAVTHANRAAYNTDTLPTMLTERRK............ > >snap_masked-scaffold15249_size4120-abinit-gene-0.0-mRNA-1 protein > AED:1 eAED:1 QI:55|0|0|0|1|1|2|227|5 > MLHHR....... > > while the list file is: > maker-scaffold539_size40162-snap-gene-0.4-mRNA-1 > snap_masked-scaffold2197_size27871-abinit-gene-0.3-mRNA-1 > maker-scaffold1000_size34843-snap-gene-0.7-mRNA-1 > maker-scaffold10087_size13457-snap-gene-0.3-mRNA-1 > > My perl so far would be: > > #! /usr/bin/perl -w > > use Bio::SeqIO; > > my $usage = "list_from_fasta.pl fastafile format headerlistfile\n"; > my $file = shift or die $usage; > my $format = shift or die $usage; > my $headerlist = shift or die $usage; > > open LIST, "<$headerlist" or die $!; > my $queries = ; #might be a hash as well, don't know what's better > > my $fastafile = Bio::SeqIO->new(-file => "<$file", -format => $format); > while (my $fastaseq = $fastafile->next_seq){ > > if ($fastaseq ->id =~ /(@queries)/) # here is the problem, after much > trying it seems actually to look for matches, but does not find any, > still there must be > > {print $fastaseq-> id,"\n" and print $fastaseq->seq,"\n";} > ; > }; > > So I am wondering, is this the right way to tackle the problem. Will it > work? Should I use another bioPerl module? My thinking is, that the > pattern matching operation after the 'if' statement is wrong. > > Any help highly appreciated! > > Thanks a lot. 
>
> Philipp
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

From roy.chaudhuri at gmail.com Mon Mar 19 11:00:19 2012
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 19 Mar 2012 15:00:19 +0000
Subject: [Bioperl-l] match patterns to retrieve seqs
In-Reply-To: <4F664742.50406@googlemail.com>
References: <4F661207.4080605@googlemail.com> <4F664742.50406@googlemail.com>
Message-ID: <4F674A03.1010303@gmail.com>

Hi Philipp,

There are several problems with your code. Firstly, you will want to remove trailing newlines from your headers using chomp. Also, Perl patterns interpolate as if they were double-quoted strings, which means that the elements of your array are all concatenated (separated by spaces) and you are trying to match them all in one go. Finally, it would be better to output using Bio::SeqIO rather than print.

Since you are just looking for exact matches, it would be much more efficient to read your headers into a hash, and then use exists rather than pattern matching:

my %query;
while (<LIST>) {
    chomp;
    $query{$_}=1;
}
my $fastafile = Bio::SeqIO->new(-file =>$file, -format =>$format);
my $out=Bio::SeqIO->new(-format=>$format);
while (my $fastaseq = $fastafile->next_seq){
    if (exists $query{$fastaseq->id}) {
        $out->write_seq($fastaseq);
    }
}

You should also look at the Bio::DB::Fasta module, which indexes your fasta file and so is particularly useful for long files that you will access more than once:
http://search.cpan.org/~cjfields/BioPerl-1.6.901/Bio/DB/Fasta.pm

Hope this helps.
Cheers,
Roy.

On 18/03/2012 20:36, Philipp Schiffer wrote:
> Sorry,
>
> at least my $queries should read my queries at .
>
>
> On 18.03.12 17:49, Philipp Schiffer wrote:
>> Hi!
>>
>> I am trying to write a small script to retrieve such sequences
>> (including headers) from a fasta file that match the headers I listed in
>> another file.
>> The fasta file looks like:
>> >maker-scaffold15249_size4120-snap-gene-0.2-mRNA-1 protein
>> AED:0.0448532948532949 eAED:0.0448532948532949 QI:42|-1|0|1|-1|0|1|1|191
>> MDILSLPLSFVCLMTIAWVVLAAALFLLWENGWSYFNSFYFTVVSFSTVGLGDMTPDYTR.............
>> >snap_masked-scaffold15249_size4120-abinit-gene-0.1-mRNA-1 protein
>> AED:1 eAED:1 QI:0|0|0|0|1|1|4|0|108
>> MHAMWMIRKFRLQWRWCMHGRPRDEQPEHHCVMGFLAVTHANRAAYNTDTLPTMLTERRK............
>> >snap_masked-scaffold15249_size4120-abinit-gene-0.0-mRNA-1 protein
>> AED:1 eAED:1 QI:55|0|0|0|1|1|2|227|5
>> MLHHR.......
>>
>> while the list file is:
>> maker-scaffold539_size40162-snap-gene-0.4-mRNA-1
>> snap_masked-scaffold2197_size27871-abinit-gene-0.3-mRNA-1
>> maker-scaffold1000_size34843-snap-gene-0.7-mRNA-1
>> maker-scaffold10087_size13457-snap-gene-0.3-mRNA-1
>>
>> My perl so far would be:
>>
>> #! /usr/bin/perl -w
>>
>> use Bio::SeqIO;
>>
>> my $usage = "list_from_fasta.pl fastafile format headerlistfile\n";
>> my $file = shift or die $usage;
>> my $format = shift or die $usage;
>> my $headerlist = shift or die $usage;
>>
>> open LIST, "<$headerlist" or die $!;
>> my $queries = <LIST>; #might be a hash as well, don't know what's better
>>
>> my $fastafile = Bio::SeqIO->new(-file => "<$file", -format => $format);
>> while (my $fastaseq = $fastafile->next_seq){
>>
>> if ($fastaseq ->id =~ /(@queries)/) # here is the problem, after much
>> trying it seems actually to look for matches, but does not find any,
>> still there must be
>>
>> {print $fastaseq-> id,"\n" and print $fastaseq->seq,"\n";}
>> ;
>> };
>>
>> So I am wondering, is this the right way to tackle the problem. Will it
>> work? Should I use another bioPerl module? My thinking is, that the
>> pattern matching operation after the 'if' statement is wrong.
>> >> Any help highly appreciated! >> >> Thanks a lot. >> >> Philipp >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at gmail.com Mon Mar 19 12:40:11 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Mon, 19 Mar 2012 09:40:11 -0700 Subject: [Bioperl-l] match patterns to retrieve seqs In-Reply-To: <4F661207.4080605@googlemail.com> References: <4F661207.4080605@googlemail.com> Message-ID: <839171F1-A11A-4F2D-B76E-669F86A33BFB@gmail.com> And take a look at the script in the bioperl distribution index/bp_seqret.pl which will do exactly this. Jason On Mar 18, 2012, at 9:49 AM, Philipp Schiffer wrote: > Hi! > > I am trying to write a small script to retrieve such sequences (including headers) from a fasta file that match the headers I listed in another file. > The fasta file looks like: > >maker-scaffold15249_size4120-snap-gene-0.2-mRNA-1 protein AED:0.0448532948532949 eAED:0.0448532948532949 QI:42|-1|0|1|-1|0|1|1|191 > MDILSLPLSFVCLMTIAWVVLAAALFLLWENGWSYFNSFYFTVVSFSTVGLGDMTPDYTR............. > >snap_masked-scaffold15249_size4120-abinit-gene-0.1-mRNA-1 protein AED:1 eAED:1 QI:0|0|0|0|1|1|4|0|108 > MHAMWMIRKFRLQWRWCMHGRPRDEQPEHHCVMGFLAVTHANRAAYNTDTLPTMLTERRK............ > >snap_masked-scaffold15249_size4120-abinit-gene-0.0-mRNA-1 protein AED:1 eAED:1 QI:55|0|0|0|1|1|2|227|5 > MLHHR....... > > while the list file is: > maker-scaffold539_size40162-snap-gene-0.4-mRNA-1 > snap_masked-scaffold2197_size27871-abinit-gene-0.3-mRNA-1 > maker-scaffold1000_size34843-snap-gene-0.7-mRNA-1 > maker-scaffold10087_size13457-snap-gene-0.3-mRNA-1 > > My perl so far would be: > > #! 
/usr/bin/perl -w > > use Bio::SeqIO; > > my $usage = "list_from_fasta.pl fastafile format headerlistfile\n"; > my $file = shift or die $usage; > my $format = shift or die $usage; > my $headerlist = shift or die $usage; > > open LIST, "<$headerlist" or die $!; > my $queries = ; #might be a hash as well, don't know what's better > > my $fastafile = Bio::SeqIO->new(-file => "<$file", -format => $format); > while (my $fastaseq = $fastafile->next_seq){ > > if ($fastaseq ->id =~ /(@queries)/) # here is the problem, after much trying it seems actually to look for matches, but does not find any, still there must be > > {print $fastaseq-> id,"\n" and print $fastaseq->seq,"\n";} > ; > }; > > So I am wondering, is this the right way to tackle the problem. Will it work? Should I use another bioPerl module? My thinking is, that the pattern matching operation after the 'if' statement is wrong. > > Any help highly appreciated! > > Thanks a lot. > > Philipp > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From philipp.schiffer at gmail.com Mon Mar 19 15:30:20 2012 From: philipp.schiffer at gmail.com (EvoPHS) Date: Mon, 19 Mar 2012 12:30:20 -0700 (PDT) Subject: [Bioperl-l] match patterns to retrieve seqs In-Reply-To: <4F674475.4010505@sanger.ac.uk> References: <4F661207.4080605@googlemail.com> <4F674475.4010505@sanger.ac.uk> Message-ID: <5245812.669.1332185420404.JavaMail.geo-discussion-forums@ynbu11> Thanks for help everybody On Monday, March 19, 2012 3:36:37 PM UTC+1, Frank Schwach wrote: > > as you correctly guessed, your regex > /(@queries)/ > isn't doing what you were hoping it would do, i.e. match against all > elements of the array. 
You could compile a regex with all the > alternative patterns from your array but I think that's not very > efficient, so you could try something like this instead: > > my $does_match = 0; > $fastaseq ->id =~ /$_/ && $does_match = 1 for @queries; > if ($does_match){ > ... > } > > Frank > > > On 18/03/12 16:49, Philipp Schiffer wrote: > > Hi! > > > > I am trying to write a small script to retrieve such sequences > > (including headers) from a fasta file that match the headers I listed in > > another file. > > The fasta file looks like: > > >maker-scaffold15249_size4120-snap-gene-0.2-mRNA-1 protein > > AED:0.0448532948532949 eAED:0.0448532948532949 QI:42|-1|0|1|-1|0|1|1|191 > > MDILSLPLSFVCLMTIAWVVLAAALFLLWENGWSYFNSFYFTVVSFSTVGLGDMTPDYTR............. > > >snap_masked-scaffold15249_size4120-abinit-gene-0.1-mRNA-1 protein > > AED:1 eAED:1 QI:0|0|0|0|1|1|4|0|108 > > MHAMWMIRKFRLQWRWCMHGRPRDEQPEHHCVMGFLAVTHANRAAYNTDTLPTMLTERRK............ > > >snap_masked-scaffold15249_size4120-abinit-gene-0.0-mRNA-1 protein > > AED:1 eAED:1 QI:55|0|0|0|1|1|2|227|5 > > MLHHR....... > > > > while the list file is: > > maker-scaffold539_size40162-snap-gene-0.4-mRNA-1 > > snap_masked-scaffold2197_size27871-abinit-gene-0.3-mRNA-1 > > maker-scaffold1000_size34843-snap-gene-0.7-mRNA-1 > > maker-scaffold10087_size13457-snap-gene-0.3-mRNA-1 > > > > My perl so far would be: > > > > #! 
/usr/bin/perl -w > > > > use Bio::SeqIO; > > > > my $usage = "list_from_fasta.pl fastafile format headerlistfile\n"; > > my $file = shift or die $usage; > > my $format = shift or die $usage; > > my $headerlist = shift or die $usage; > > > > open LIST, "<$headerlist" or die $!; > > my $queries = ; #might be a hash as well, don't know what's better > > > > my $fastafile = Bio::SeqIO->new(-file => "<$file", -format => $format); > > while (my $fastaseq = $fastafile->next_seq){ > > > > if ($fastaseq ->id =~ /(@queries)/) # here is the problem, after much > > trying it seems actually to look for matches, but does not find any, > > still there must be > > > > {print $fastaseq-> id,"\n" and print $fastaseq->seq,"\n";} > > ; > > }; > > > > So I am wondering, is this the right way to tackle the problem. Will it > > work? Should I use another bioPerl module? My thinking is, that the > > pattern matching operation after the 'if' statement is wrong. > > > > Any help highly appreciated! > > > > Thanks a lot. > > > > Philipp > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. 
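
Since the list file in this thread holds complete sequence IDs, one alternative to any pattern match is an exact hash lookup, which the original post hints at ("might be a hash as well"). The following is only a minimal sketch under that assumption, not code posted by anyone in the thread: the helper names load_wanted_ids and select_records are hypothetical, and records are plain [id, sequence] pairs with toy sequences so that the example runs without BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the lines of a list file into a lookup table.
sub load_wanted_ids {
    my @lines = @_;
    my %wanted;
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;      # drop the newline and stray whitespace
        next unless length $line;     # skip blank lines
        $wanted{$line} = 1;
    }
    return \%wanted;
}

# Hypothetical helper: keep the [id, sequence] pairs whose ID is wanted.
sub select_records {
    my ($wanted, @records) = @_;
    return grep { exists $wanted->{ $_->[0] } } @records;
}

# IDs taken from the thread; the sequences are toy placeholders.
my $wanted = load_wanted_ids(
    "maker-scaffold539_size40162-snap-gene-0.4-mRNA-1\n",
    "maker-scaffold1000_size34843-snap-gene-0.7-mRNA-1\n",
);
my @records = (
    [ 'maker-scaffold539_size40162-snap-gene-0.4-mRNA-1',  'ACGT' ],
    [ 'maker-scaffold15249_size4120-snap-gene-0.2-mRNA-1', 'GGCC' ],
);

# Print the selected records in FASTA layout.
print ">$_->[0]\n$_->[1]\n" for select_records($wanted, @records);
```

With Bio::SeqIO the same test would presumably sit inside the next_seq loop as `exists $wanted->{ $fastaseq->id }`, since the ID of a FASTA record is the first word after the `>`. Each lookup takes constant time, so the cost does not grow with the length of the list.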
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From peterson.mathenge at gmail.com Mon Mar 19 12:06:35 2012
From: peterson.mathenge at gmail.com (Peterson Mathenge)
Date: Mon, 19 Mar 2012 09:06:35 -0700
Subject: [Bioperl-l] project interest
Message-ID:

Dear Sir/Madam,

Hello, my name is Mathenge Peterson. I am a second-year student at the
University of Nairobi, Kenya, undertaking an MSc in Bioinformatics. I am
interested in the upcoming project on BioPerl and would like to be a part
of the group. I have a good background in Perl and other scripting
languages such as PHP. I am looking forward to hearing from you.

Best regards,
Peterson

From philipp.schiffer at googlemail.com Mon Mar 19 15:23:10 2012
From: philipp.schiffer at googlemail.com (Philipp Schiffer)
Date: Mon, 19 Mar 2012 20:23:10 +0100
Subject: [Bioperl-l] match patterns to retrieve seqs
In-Reply-To: <839171F1-A11A-4F2D-B76E-669F86A33BFB@gmail.com>
References: <4F661207.4080605@googlemail.com> <839171F1-A11A-4F2D-B76E-669F86A33BFB@gmail.com>
Message-ID: <4F67879E.10603@googlemail.com>

Thanks Jason!

I guess it's better to try around (and fail in my case) a bit before
getting to know this. ;-)

On 19.03.12 17:40, Jason Stajich wrote:
> And take a look at the script in the bioperl distribution
>
> index/bp_seqret.pl
>
> which will do exactly this.
>
> Jason
> On Mar 18, 2012, at 9:49 AM, Philipp Schiffer wrote:
>
>> Hi!
>>
>> I am trying to write a small script to retrieve such sequences
>> (including headers) from a fasta file that match the headers I listed
>> in another file.
>> The fasta file looks like:
>> >maker-scaffold15249_size4120-snap-gene-0.2-mRNA-1 protein
>> AED:0.0448532948532949 eAED:0.0448532948532949 QI:42|-1|0|1|-1|0|1|1|191
>> MDILSLPLSFVCLMTIAWVVLAAALFLLWENGWSYFNSFYFTVVSFSTVGLGDMTPDYTR.............
>> >snap_masked-scaffold15249_size4120-abinit-gene-0.1-mRNA-1 protein >> AED:1 eAED:1 QI:0|0|0|0|1|1|4|0|108 >> MHAMWMIRKFRLQWRWCMHGRPRDEQPEHHCVMGFLAVTHANRAAYNTDTLPTMLTERRK............ >> >snap_masked-scaffold15249_size4120-abinit-gene-0.0-mRNA-1 protein >> AED:1 eAED:1 QI:55|0|0|0|1|1|2|227|5 >> MLHHR....... >> >> while the list file is: >> maker-scaffold539_size40162-snap-gene-0.4-mRNA-1 >> snap_masked-scaffold2197_size27871-abinit-gene-0.3-mRNA-1 >> maker-scaffold1000_size34843-snap-gene-0.7-mRNA-1 >> maker-scaffold10087_size13457-snap-gene-0.3-mRNA-1 >> >> My perl so far would be: >> >> #! /usr/bin/perl -w >> >> use Bio::SeqIO; >> >> my $usage = "list_from_fasta.pl fastafile format headerlistfile\n"; >> my $file = shift or die $usage; >> my $format = shift or die $usage; >> my $headerlist = shift or die $usage; >> >> open LIST, "<$headerlist" or die $!; >> my $queries = ; #might be a hash as well, don't know what's better >> >> my $fastafile = Bio::SeqIO->new(-file => "<$file", -format => $format); >> while (my $fastaseq = $fastafile->next_seq){ >> >> if ($fastaseq ->id =~ /(@queries)/) # here is the problem, after much >> trying it seems actually to look for matches, but does not find any, >> still there must be >> >> {print $fastaseq-> id,"\n" and print $fastaseq->seq,"\n";} >> ; >> }; >> >> So I am wondering, is this the right way to tackle the problem. Will >> it work? Should I use another bioPerl module? My thinking is, that the >> pattern matching operation after the 'if' statement is wrong. >> >> Any help highly appreciated! >> >> Thanks a lot. 
>>
>> Philipp
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
>

From florent.angly at gmail.com Mon Mar 19 21:11:04 2012
From: florent.angly at gmail.com (Florent Angly)
Date: Tue, 20 Mar 2012 11:11:04 +1000
Subject: [Bioperl-l] match patterns to retrieve seqs
In-Reply-To: <4F664742.50406@googlemail.com>
References: <4F661207.4080605@googlemail.com> <4F664742.50406@googlemail.com>
Message-ID: <4F67D928.2000106@gmail.com>

Sure, this is a valid way to approach the problem. However, the line
that gives you grief is most likely this one:
> if ($fastaseq ->id =~ /(@queries)/)

You cannot use an array like this in a regexp and hope that it will
match any of its elements. An array inside a pattern is interpolated as
in a double-quoted string, i.e. as all of its elements joined by spaces,
so the regexp looks for that one long string instead of any single query.

My solution would be to build a regular expression containing all the
queries:

> chomp @queries; # drop the newline read from the list file
> my $re;
> for my $query (@queries) {
>    $re .= quotemeta($query).'|'; # query 1, OR query 2, ... (metacharacters such as . escaped)
> }
> $re =~ s/\|$//; # remove trailing |
> $re = '^(?:'.$re.')$'; # group the alternatives, then anchor for full matches only (no partial matches)
> $re = qr/$re/; # compile regexp for added speed

Then you can use this regular expression to match the sequence IDs:
> if ($fastaseq->id =~ /$re/) {

Regards,

Florent


On 19/03/12 06:36, Philipp Schiffer wrote:
> Sorry,
>
> at least my $queries should read my queries at .
>
>
> On 18.03.12 17:49, Philipp Schiffer wrote:
>> Hi!
>>
>> I am trying to write a small script to retrieve such sequences
>> (including headers) from a fasta file that match the headers I listed in
>> another file.
>> The fasta file looks like: >> >maker-scaffold15249_size4120-snap-gene-0.2-mRNA-1 protein >> AED:0.0448532948532949 eAED:0.0448532948532949 QI:42|-1|0|1|-1|0|1|1|191 >> MDILSLPLSFVCLMTIAWVVLAAALFLLWENGWSYFNSFYFTVVSFSTVGLGDMTPDYTR............. >> >> >snap_masked-scaffold15249_size4120-abinit-gene-0.1-mRNA-1 protein >> AED:1 eAED:1 QI:0|0|0|0|1|1|4|0|108 >> MHAMWMIRKFRLQWRWCMHGRPRDEQPEHHCVMGFLAVTHANRAAYNTDTLPTMLTERRK............ >> >snap_masked-scaffold15249_size4120-abinit-gene-0.0-mRNA-1 protein >> AED:1 eAED:1 QI:55|0|0|0|1|1|2|227|5 >> MLHHR....... >> >> while the list file is: >> maker-scaffold539_size40162-snap-gene-0.4-mRNA-1 >> snap_masked-scaffold2197_size27871-abinit-gene-0.3-mRNA-1 >> maker-scaffold1000_size34843-snap-gene-0.7-mRNA-1 >> maker-scaffold10087_size13457-snap-gene-0.3-mRNA-1 >> >> My perl so far would be: >> >> #! /usr/bin/perl -w >> >> use Bio::SeqIO; >> >> my $usage = "list_from_fasta.pl fastafile format headerlistfile\n"; >> my $file = shift or die $usage; >> my $format = shift or die $usage; >> my $headerlist = shift or die $usage; >> >> open LIST, "<$headerlist" or die $!; >> my $queries = ; #might be a hash as well, don't know what's better >> >> my $fastafile = Bio::SeqIO->new(-file => "<$file", -format => $format); >> while (my $fastaseq = $fastafile->next_seq){ >> >> if ($fastaseq ->id =~ /(@queries)/) # here is the problem, after much >> trying it seems actually to look for matches, but does not find any, >> still there must be >> >> {print $fastaseq-> id,"\n" and print $fastaseq->seq,"\n";} >> ; >> }; >> >> So I am wondering, is this the right way to tackle the problem. Will it >> work? Should I use another bioPerl module? My thinking is, that the >> pattern matching operation after the 'if' statement is wrong. >> >> Any help highly appreciated! >> >> Thanks a lot. 
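
The regular-expression approach discussed in this thread can be sketched as a small self-contained script. It is only an illustration under stated assumptions: build_id_regexp is a hypothetical helper name, the IDs are the ones from the list file in the question, and each ID is passed through quotemeta and wrapped in a non-capturing group so that the anchors apply to every alternative. The script also prints what an array looks like when it is interpolated directly, which is why /(@queries)/ finds nothing.

```perl
use strict;
use warnings;

# Hypothetical helper: one compiled regexp that matches any listed ID exactly.
sub build_id_regexp {
    my @queries = @_;
    # quotemeta escapes characters such as "." and "|" inside the IDs
    my $alternation = join '|', map { quotemeta($_) } @queries;
    # (?: ... ) groups the alternatives so ^ and $ anchor all of them
    return qr/^(?:$alternation)$/;
}

my @queries = (
    'maker-scaffold539_size40162-snap-gene-0.4-mRNA-1',
    'maker-scaffold1000_size34843-snap-gene-0.7-mRNA-1',
);

# An array interpolated into a string or a pattern becomes its elements
# joined by the list separator (a space by default).
my $interpolated = "@queries";
print "interpolated: $interpolated\n";

my $re = build_id_regexp(@queries);
for my $id ('maker-scaffold539_size40162-snap-gene-0.4-mRNA-1',
            'maker-scaffold539_size40162-snap-gene-0.4-mRNA-10') {
    print "$id: ", ($id =~ $re ? "match" : "no match"), "\n";
}
```

The second ID differs from a listed one only by a trailing "0", so the anchored pattern should reject it. Lines read from a list file would need chomp first, otherwise the newline becomes part of each alternative.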
>>
>> Philipp
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at illinois.edu Wed Mar 21 14:21:27 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Wed, 21 Mar 2012 18:21:27 +0000
Subject: [Bioperl-l] Fwd: [SO-devel] NCBI GFF3 support
References:
Message-ID: <70284DE8-7FFB-40EA-87DF-D8CBED83907F@illinois.edu>

For those interested...

chris

Begin forwarded message:

From: "Murphy, Terence (NIH/NLM/NCBI) [C]"
Subject: [SO-devel] NCBI GFF3 support
Date: March 21, 2012 1:15:29 PM CDT
To: "SO developers (song-devel at lists.sourceforge.net)"
Reply-To: SO developers

Hi All,

I'm pleased to announce that NCBI has updated their GFF3 export software
to the latest specifications (1.20), and is in the process of updating
files on the NCBI Genomes FTP site (ftp://ftp.ncbi.nlm.nih.gov/genomes/).
Files are now available for the NCBI annotations of the latest assemblies
for human, cow, dog, pig, chicken, and many others, and will be provided
as part of future releases. See the README files in each species
directory for further details. For example, the human GRCh37.p5
annotation in top level (chromosome) coordinates is available at:
ftp://ftp.ncbi.nlm.nih.gov/genomes/H_sapiens/GFF/ref_GRCh37.p5_top_level.gff3.gz

Files in the /Bacteria, /Viruses, and other subdirectories are being
updated as part of rolling update cycles. Files with this header were
produced with the new writer:

##gff-version 3
#!gff-spec-version 1.20
#!processor NCBI annotwriter

We've folded in a few bug fixes since we started using the new writer in
production, and are working to refresh all the files in the near future.
So you may see a few anomalies in files produced by annotwriter earlier
this year. Files produced in March or later should be almost fine, with
the exception of a problem with the "is_circular=" tag starting with a
lowercase 'i' (thanks to Peter for catching that so quickly).

annotwriter is available for download as part of the NCBI C++ Toolkit,
but the public toolkit isn't updated very often so the current version is
missing many updates made in the last year. An updated version of the
toolkit is tentatively scheduled to be released in the next few months,
so I would wait for that before trying to use annotwriter yourself for
ASN to GFF3 conversion.

Please contact the NCBI Service Desk (info at ncbi.nlm.nih.gov) if you
have any questions or suggestions, or you can contact me directly or
through this listserv.

Enjoy!

-Terence

-----
Terence Murphy, Ph.D.
RefSeq Project
NCBI/NLM/NIH/DHHS
45 Center Drive, Room 4AS.37D-82
Bethesda, MD 20892-6510
Phone: 00-1-301-402-0990
e-mail: murphyte at ncbi.nlm.nih.gov

_______________________________________________
SOng-devel mailing list
SOng-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/song-devel

From shalabh.sharma7 at gmail.com Wed Mar 21 15:42:40 2012
From: shalabh.sharma7 at gmail.com (shalabh sharma)
Date: Wed, 21 Mar 2012 15:42:40 -0400
Subject: [Bioperl-l] Select random sequences from a fasta file
Message-ID:

Hi All,
Is there a way to select random sequences from a multi fasta file? I am
using some method (not that sophisticated).
Is there any module in bioperl that can do that?

I have a fasta file containing around 10 million reads, and I want to get
a few thousand sequences out of it (randomly selected).
Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From jason.stajich at gmail.com Wed Mar 21 16:07:58 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Wed, 21 Mar 2012 13:07:58 -0700 Subject: [Bioperl-l] Select random sequences from a fasta file In-Reply-To: References: Message-ID: <3D3C5FE3-2CAB-4CE9-931C-8CA71B44277D@gmail.com> Hi - If they are short reads and just the same length (e.g. one line per sequence) you can do this in plain perl with seek and a RNG to read 2 lines from the file. The problem in trying to do this in bioperl is the indexing of the multifasta file ends up being really slow when you get past ~4-5M IDs in the hash structure that is used. Plus there isn't a nice way to do this random selection other than to generate the full list of IDs and do the shuffling and pop off a few thousand to do the lookup. I think this is pretty way overkill for the problem you are trying to solve. There is a nice utility to do this as part of the Celera Assembler - if you use the gatekeeper tool there is an option after you build a store to then get a dump of a random subselection of the data. Jason On Mar 21, 2012, at 12:42 PM, shalabh sharma wrote: > Hi All, > Is there a way to select random sequences from a multi fasta > file. I am using some method (not that sophisticated). > Is there any module in bioperl that can do that? > > I have a fasta file containing around 10 million reads, and i want to get > few thousand sequences out of it (randomly selected). 
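
As a possible alternative to indexing or seeking, a single pass with reservoir sampling (often called "Algorithm R") keeps a uniform random sample of k records without holding more than k of them in memory. This is a swapped-in technique, not what was proposed in the thread. The sketch below is plain Perl; sample_fasta is a hypothetical helper name, and it assumes that a `>` at the start of a line always begins a new record.

```perl
use strict;
use warnings;

# Hypothetical helper: return a uniform random sample of up to $k FASTA
# records read from the filehandle $fh in one pass (reservoir sampling).
sub sample_fasta {
    my ($fh, $k) = @_;
    my @reservoir;
    my $seen = 0;
    local $/ = "\n>";                  # read one FASTA record per iteration
    while (my $record = <$fh>) {
        chomp $record;                 # remove the trailing "\n>" separator
        $record =~ s/^>//;             # the first record still carries its ">"
        $record =~ s/\s+\z//;          # trim the final newline of the last record
        next unless length $record;
        $seen++;
        if (@reservoir < $k) {
            push @reservoir, $record;  # fill the reservoir first
        }
        else {
            my $j = int(rand($seen));  # uniform in 0 .. $seen-1
            $reservoir[$j] = $record if $j < $k;
        }
    }
    return map { ">$_\n" } @reservoir;
}

# Demonstration on an in-memory file with toy records.
my $fasta = ">read1\nACGT\n>read2\nGGCC\n>read3\nTTAA\n>read4\nCCGG\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!\n";
print sample_fasta($fh, 2);
close $fh;
```

For a real file the handle would come from `open my $fh, '<', $path`, and k would be the few thousand sequences wanted. Every record is read once, so the run time is that of a single scan over the 10 million reads, and no ID index is built.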
> > Thanks > Shalabh > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Jason Stajich jason.stajich at gmail.com jason at bioperl.org From shalabh.sharma7 at gmail.com Wed Mar 21 16:38:02 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 21 Mar 2012 16:38:02 -0400 Subject: [Bioperl-l] Select random sequences from a fasta file In-Reply-To: <3D3C5FE3-2CAB-4CE9-931C-8CA71B44277D@gmail.com> References: <3D3C5FE3-2CAB-4CE9-931C-8CA71B44277D@gmail.com> Message-ID: Thanks a lot Jason, I will look in to this, will also try celera assembler approach. Thanks Shalabh On Wed, Mar 21, 2012 at 4:07 PM, Jason Stajich wrote: > Hi - > > If they are short reads and just the same length (e.g. one line per > sequence) you can do this in plain perl with seek and a RNG to read 2 lines > from the file. > The problem in trying to do this in bioperl is the indexing of the > multifasta file ends up being really slow when you get past ~4-5M IDs in > the hash structure that is used. Plus there isn't a nice way to do this > random selection other than to generate the full list of IDs and do the > shuffling and pop off a few thousand to do the lookup. I think this is > pretty way overkill for the problem you are trying to solve. > > There is a nice utility to do this as part of the Celera Assembler - if > you use the gatekeeper tool there is an option after you build a store to > then get a dump of a random subselection of the data. > > Jason > On Mar 21, 2012, at 12:42 PM, shalabh sharma wrote: > > Hi All, > Is there a way to select random sequences from a multi fasta > file. I am using some method (not that sophisticated). > Is there any module in bioperl that can do that? 
> > I have a fasta file containing around 10 million reads, and i want to get > few thousand sequences out of it (randomly selected). > > Thanks > Shalabh > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From jason.stajich at gmail.com Wed Mar 21 16:58:14 2012 From: jason.stajich at gmail.com (Jason Stajich) Date: Wed, 21 Mar 2012 13:58:14 -0700 Subject: [Bioperl-l] get_MLmatrix In-Reply-To: References: Message-ID: <54CB33B3-BA64-47BC-842C-D486A0302447@gmail.com> This is undoubtably due to changes in the output format between versions. Aaron and I aren't maintaining this but I am CC-ing the mailing list to see if someone else is working on it. The roads to solution are - fix the parser - run an earlier version of PAML that can be parsed with the current code Sorry there isn't a better solution but there is no adhered to standard for output by PAML so it requires a developer to constantly track the program and its output and reverse engineer it to keep the parser working. For the second error, I am not sure if you are calling the function on right object -- maybe there is a mistake in the documentation relative to the API now, I'm not sure. The object Bio::Tools::Phylo::PAML::ModelResult has a kappa method so can you check what object type you have when you are calling kappa e.g. 
if you are calling $result->kappa then add a debugging print to your code

warn( ref($result), "\n");

Jason
On Mar 21, 2012, at 1:41 PM, Vaclav Janousek wrote:

> Hi,
>
> I'd like to use the bioperl packages to parse the codeml mlc file (runmode
> = -2). However, the parser retrieves empty field for _mlmatrix
> (get_MLmatrix). I am using the newest version of PAML 4.5 so there might be
> some change in the output file which causes the parser cannot recognise the
> part containing the parameter estimates as well as the lnL value. Is there
> any chance to fix this problem?
>
> Also, the method get_model_params doesn't work - compiler complains the
> method cannot be located via package Bio::Tools::Phylo::PAML::Result. So
> there is no way (in case the get_MLmatrix would work) how to obtain kappa
> estimate (at least according the documentation for the
> Bio::Tools::Phylo::PAML::Codeml).
>
> Thanks a lot,
>
> Vaclav
>
> --
> Vaclav Janousek
> E-mail: vaclav.janousek at natur.cuni.cz
>
> Department of Zoology
> Charles University in Prague
> Faculty of Science
> Albertov 6, 128 43 Praha 2
> www.natur.cuni.cz/en

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From lmanchon at univ-montp2.fr Thu Mar 22 05:03:39 2012
From: lmanchon at univ-montp2.fr (Laurent MANCHON)
Date: Thu, 22 Mar 2012 10:03:39 +0100
Subject: [Bioperl-l] Select random sequences from a fasta file
In-Reply-To:
References:
Message-ID: <4F6AEAEB.5040009@univ-montp2.fr>

On 21/03/2012 20:42, shalabh sharma wrote:
> Hi All,
> Is there a way to select random sequences from a multi fasta
> file. I am using some method (not that sophisticated).
> Is there any module in bioperl that can do that?
>
> I have a fasta file containing around 10 million reads, and i want to get
> few thousand sequences out of it (randomly selected).
>
> Thanks
> Shalabh
>
--Hello,
I have a piece of code to randomly pick lines from a file;
maybe you can adapt this code to your problem:

#!/usr/bin/perl
# pick random lines from a file
use strict;
use warnings;
use List::Util qw(shuffle);

my $GET_LINES = 10000;
my @line_starts;

open( my $fh, '<', 'big_text_file.txt' ) or die "Oh, fudge: $!\n";

# record the byte offset at which every line starts
do { push @line_starts, tell $fh } while ( <$fh> );
pop @line_starts;    # the last offset recorded is end-of-file, not a line start

my $count = @line_starts;
print "Got $count lines\n";

# never ask for more lines than the file contains
$GET_LINES = $count if $GET_LINES > $count;
my @shuffled_starts = (shuffle @line_starts)[0..$GET_LINES-1];

for my $start ( @shuffled_starts ) {
    seek $fh, $start, 0 or die "Unable to seek to line - $!\n";
    print scalar <$fh>;
}

Regards,
Laurent
--

From joseramonblas at gmail.com Thu Mar 22 16:47:03 2012
From: joseramonblas at gmail.com (José Ramón Blas Pastor)
Date: Thu, 22 Mar 2012 21:47:03 +0100
Subject: [Bioperl-l] approximate regular expression
Message-ID:

I want to find all the matches of a given pattern in a string but doing
fuzzy (approximate) pattern matching. I want to allow small variations (1
or 2 positions substituted) in the match. I have tried the String::Approx
module, but I do not know how to manage the syntax

...
use String::Approx 'amatch';
my $pattern = "JEJE";
my $string = "EJKJUJHJDJEJEJEDEJOJOJJJAHJHJSHJEFEJUJEJUJKIJS";
while (?=/$pattern/) {
...

How could I achieve that this 'while' allow 1 substitution from $pattern?
That is that "JEFE", "JUJE","JEDE",... would be true matches.

Thanks a lot in advance. JR

--
José Ramón Blas - PhD
Dept. Biochemistry - Medicine School
University of Castilla-La Mancha
C Almansa, 14
02006 Albacete (Spain)

Phone: +34 967599200 ext. 2958

From awitney at sgul.ac.uk Thu Mar 22 17:25:40 2012
From: awitney at sgul.ac.uk (Adam Witney)
Date: Thu, 22 Mar 2012 21:25:40 +0000
Subject: [Bioperl-l] approximate regular expression
In-Reply-To:
References:
Message-ID:

Hi Jose,

I've never tried it but also have a look at this module:

http://search.cpan.org/~limaone/Bio-Grep-v0.10.6/lib/Bio/Grep.pm

Adam

On 22 Mar 2012, at 20:47, José Ramón Blas Pastor wrote:

> I want to find all the matches of a given pattern in a string but doing
> fuzzy (approximate) pattern matching. I want to allow small variations (1
> or 2 positions substituted) in the match. I have tried String::Approx
> module, but I do not know how to manage syntax
>
> ...
> use String::Approx 'amatch';
> my $pattern = "JEJE";
> my $string = "EJKJUJHJDJEJEJEDEJOJOJJJAHJHJSHJEFEJUJEJUJKIJS";
> while (?=/$pattern/) {
> ...
>
> How could I achieve that this 'while' allow 1 substitution from $pattern?
> That is that "JEFE", "JUJE","JEDE",... would be true matches.
>
> Thanks a lot in advance. JR
>
> --
> José Ramón Blas - PhD
> Dept. Biochemistry - Medicine School
> University of Castilla-La Mancha
> C Almansa, 14
> 02006 Albacete (Spain)
>
> Phone: +34 967599200 ext. 2958
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From florent.angly at gmail.com Thu Mar 22 20:07:47 2012
From: florent.angly at gmail.com (Florent Angly)
Date: Fri, 23 Mar 2012 10:07:47 +1000
Subject: [Bioperl-l] approximate regular expression
In-Reply-To:
References:
Message-ID: <4F6BBED3.90703@gmail.com>

On 23/03/12 06:47, José Ramón Blas Pastor wrote:
> How could I achieve that this 'while' allow 1 substitution from $pattern?

Hi José,

You can specify a number (or percentage) of insertions, deletions or
substitutions in String::Approx by using modifiers:
> @matches = amatch("pattern", [ modifiers], @inputs)

See http://search.cpan.org/~jhi/String-Approx-3.26/Approx.pm#MODIFIERS
for how to specify the modifiers.

Florent

From nizamibilal1064 at gmail.com Sat Mar 24 15:06:34 2012
From: nizamibilal1064 at gmail.com (bilal)
Date: Sat, 24 Mar 2012 12:06:34 -0700 (PDT)
Subject: [Bioperl-l] Query and development of pattern search tool in metabolomics
Message-ID: <5ea15546-6917-4546-892e-27d847baa2ca@px4g2000pbc.googlegroups.com>

Hi,
I am Bilal from India. I am doing an M.Tech in bioinformatics. I am very
much interested in the BioPerl project. I have been learning Perl and BioPerl.
I like the idea of "Perl Run Wrappers for External Programs in a Flash", but as I am new to the Google Summer of Code, I don't know what the mentor organisation expects from a student. Further, I have an idea which we can work on for this Google Summer of Code. Currently I am working on a project, "development of database for patterns in metabolomics". The strategy is that we will create a library of metabolites and their concentrations, associated diseases and other properties. It will be connected to the front end using a Perl script. Here we can develop a tool for extracting meaningful patterns in metabolomics. Regards, bilal From rbuels at gmail.com Sun Mar 25 16:09:50 2012 From: rbuels at gmail.com (Robert Buels) Date: Sun, 25 Mar 2012 13:09:50 -0700 Subject: [Bioperl-l] Announcing OBF Summer of Code - please forward! Message-ID: <4F6F7B8E.1050903@gmail.com> Hi all, Here's an advertising-ready announcement for OBF's Summer of Code, thanks to Christian Zmasek and Hilmar Lapp for their excellent writing. Student applications are due April 6! Please spread it widely, we need to reach lots of students with it! Rob Buels OBF GSoC 2012 Admin ============================================================ *** Please disseminate widely at your local institutions *** *** including posting to message and job boards, so that *** *** we reach as many students as possible. *** ============================================================ OPEN BIOINFORMATICS FOUNDATION SUMMER OF CODE 2012 Applications due 19:00 UTC, April 6, 2012. http://www.open-bio.org/wiki/Google_Summer_of_Code The Open Bioinformatics Foundation Summer of Code program provides a unique opportunity for undergraduate, masters, and PhD students to obtain hands-on experience writing and extending open-source software for bioinformatics under the mentorship of experienced developers from around the world. 
The program is the participation of the Open Bioinformatics Foundation (OBF) as a mentoring organization in the Google Summer of Code(tm) (http://code.google.com/soc/). Students successfully completing the 3 month program receive a $5,000 USD stipend, and may work entirely from their home or home institution. Participation is open to students from any country in the world except countries subject to US trade restrictions. Each student will have at least one dedicated mentor to show them the ropes and help them complete their project. The Open Bioinformatics Foundation is particularly seeking students interested in both bioinformatics (computational biology) and software development. Some initial project ideas are listed on the website. These range from sequence search I/O in BioPython to lightweight sequence objects and lazy parsing in BioPerl, a next-generation BioRuby interface to Ensembl to developing cloud-optimized versions of BioJava modules. All project ideas are flexible and many can be adjusted in scope to match the skills of the student. We also particularly welcome and encourage students proposing their own project ideas; historically some of the most successful Summer of Code projects are ones proposed by the students themselves. TO APPLY: Apply online at the Google Summer of Code website (http://socghop.appspot.com/), where you will also find GSoC program rules and eligibility requirements. The 12-day application period for students runs from Monday, March 26 through Friday, April 6th, 2012. INQUIRIES: We strongly encourage all interested students to get in touch with us with their ideas as early on as possible. See the OBF GSoC page for contact details. 
2012 OBF Summer of Code: http://www.open-bio.org/wiki/Google_Summer_of_Code Google Summer of Code FAQ: http://www.google-melange.com/document/show/gsoc_program/google/gsoc2012/faqs From merche at uni-bonn.de Tue Mar 27 10:29:28 2012 From: merche at uni-bonn.de (Merche Castillo) Date: Tue, 27 Mar 2012 16:29:28 +0200 Subject: [Bioperl-l] New to BioPerl - Gene prediction & fgenesh Message-ID: <4F71CEC8.1080002@uni-bonn.de> Hi everyone, I am working on gene prediction in nematodes and I'm interested in learning more about BioPerl and how I can use it to get some nice predictions out of my nematode contigs. I have already done some predictions with the Fgenesh web version, but it takes a long time since I have to submit each contig separately. Thus I'm interested in writing a script that would use the fgenesh module. I have seen the documentation is quite old, so I wonder whether the BioPerl module runs on the latest version. Since I would like to do comparative genomics among different nematode species I was wondering whether it is possible to find fgenesh+ or fgenesh++. Since I am quite new here I would appreciate any advice on BioPerl and gene prediction. Thanks in advance Best Regards -- ************************************;) Mercedes Castillo INRES, Dept. 
Molecular Phytomedicine University of Bonn Karlrobert-Kreiten-str 13 53115 Bonn +49(0)22873-60143 merche at uni-bonn.de ***************************************** From cjfields at illinois.edu Tue Mar 27 13:14:08 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 27 Mar 2012 17:14:08 +0000 Subject: [Bioperl-l] New to BioPerl - Gene prediction & fgenesh In-Reply-To: <4F71CEC8.1080002@uni-bonn.de> References: <4F71CEC8.1080002@uni-bonn.de> Message-ID: <591BC908-9A97-427E-B1E1-FDB085055C3F@illinois.edu> On Mar 27, 2012, at 9:29 AM, Merche Castillo wrote: > Hi everyone, > > I am working on gene prediction in nematodes and I'm interested in learning more about BioPerl and how I can use it to get some nice prediction out of my nematode contigs. I have already done some predictions with the Fgenesh web version, but it takes a long time since I have to submit each contig separately. Thus I'm interesting in writing a script that would use the fgenesh module. I have seen the documentation is quite old, so I wonder whether the BioPerl module runs on the latest version. I'm not sure to tell the truth, but it shouldn't be terribly hard to update unless the API has changed significantly. > Since I would like to do comparative genomics among different nematode species I was wondering whether it is possible to find fgenesh+ or fgenesh++. Since I am quite new here I would appreciate any advice on BioPerl and gene prediction Have you looked at the MAKER pipeline? I understand it uses quite a few of the gene predictors (FGENESH included). http://gmod.org/wiki/MAKER http://www.yandell-lab.org/software/maker.html It doesn't appear they use bioperl-run modules for these, though. chris > Thanks in advance > > Best Regards > > > -- > ************************************;) > Mercedes Castillo > INRES, Dept. 
Molecular Phytomedicine > University of Bonn > > Karlrobert-Kreiten-str 13 > 53115 Bonn > +49(0)22873-60143 > merche at uni-bonn.de > ***************************************** > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From klew019 at aucklanduni.ac.nz Wed Mar 28 00:23:36 2012 From: klew019 at aucklanduni.ac.nz (Detrix) Date: Tue, 27 Mar 2012 21:23:36 -0700 (PDT) Subject: [Bioperl-l] SearchIO Message-ID: <33544807.post@talk.nabble.com>

Hi,

I'm new to Perl/BioPerl and I need to write a script for an assignment. The background is that we BLAST searched a sequence on NCBI and came up with the hits. What I have to do is write a script that extracts only the HSPs for Mus musculus and mouse, matches them to each chromosome, and writes them to a table outfile.

So far I have:

use strict;
use lib "C:/Program Files (x86)/BioPerl";

use Bio::SearchIO;
my $parser = new Bio::SearchIO(-format => 'blast',
                               -file   => 'nucleotide.pl');

while (my $result = $parser->next_result) {
    while (my $hit = $result->next_hit) {
        if ($hit->description =~ /(Mus musculus)|(Mouse)/i) {
            while (my $hsp = $hit->next_hsp) {
                print " Hit=", $hit->description, "\n";
                print " HSPs=", $hit->num_hsps, "\n";
            }
        }
    }
}

What this gets me is the list of all the descriptions of the hits (mouse and Mus musculus), and the HSPs for them. What I need now is to sort all the HSPs for each particular chromosome, and write them to a table outfile. I think what I have to do is sort them into an associative array, but all my attempts at it have failed. I'm lost, so any help would be greatly appreciated!

Thanks

-- View this message in context: http://old.nabble.com/SearchIO-tp33544807p33544807.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
From guedes_1000 at hotmail.com Wed Mar 28 21:29:05 2012 From: guedes_1000 at hotmail.com (Aureliano Guedes) Date: Thu, 29 Mar 2012 01:29:05 +0000 Subject: [Bioperl-l] GenePop to Phase Message-ID: Hello monks, This is the first time I have written here. I have read about Bio::PopGen::IO::phase, but I couldn't find anywhere how to write a script to convert genepop data to phase data. If somebody can help me, thanks a lot in advance. Aureliano Guedes From barry.utah at gmail.com Thu Mar 29 15:29:48 2012 From: barry.utah at gmail.com (Barry Moore) Date: Thu, 29 Mar 2012 13:29:48 -0600 Subject: [Bioperl-l] GFF3 Specification Message-ID: Hi All, There has been active discussion on the song-devel mailing list over the past 12 months about various ambiguities and unresolved issues with the GFF3 specification. The SO group is initiating a process to resolve these issues so that GFF3 can continue to serve its role of unifying genome annotations in a format that promotes collaboration between genome projects and comparison of datasets across a wide variety of genomes. With the rapid acceleration of genome sequencing, a simple, standard format for genome annotation and comparative genomics is more critical than ever. Several issues have been raised, and they range from simple requests for clarification to more fundamental questions about the structure of the specification. We can't address all of these issues in one update to the spec, so we've started the ball rolling with three steps:
1. Incorporate all the minor changes into a GFF3 1.21 candidate spec (http://www.sequenceontology.org/resources/gff3_1.21.html).
2. Organize remaining unresolved issues onto a wiki page and start working through those issues one by one (http://www.sequenceontology.org/wiki/index.php?title=GFF3_Developement).
3. Develop a set of wiki pages to describe 'GFF3 Best Practices' and existing community usage (http://www.sequenceontology.org/wiki/index.php?title=GFF3_best_practices).
We will work through the unresolved issues one at a time - soliciting feedback from the community, and clarify/update the GFF3 spec in a backwards compatible way with existing tools and datasets, adding documenting wiki pages as needed. We welcome and encourage feedback from the genomics community and gratefully acknowledge those who have been active in the discussion thus far. Please have a look at the pages described above and join the conversation. The best place for discussion of all things GFF3 is the SO mailing list (song-devel at lists.sourceforge.net). Please feel free to re-post this message to relevant mailing lists so that all interested parties can be involved. On behalf of the SO developers - Thanks. Barry Moore Research Scientist Dept. of Human Genetics University of Utah Salt Lake City, UT 84112 -------------------------------------------- (801) 585-3543 From fs5 at sanger.ac.uk Fri Mar 30 07:10:13 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Fri, 30 Mar 2012 12:10:13 +0100 Subject: [Bioperl-l] SearchIO In-Reply-To: <33544807.post@talk.nabble.com> References: <33544807.post@talk.nabble.com> Message-ID: <4F759495.8070306@sanger.ac.uk> You are on the right track. Yes, you will need to first store the hits' data in a data structure, and then you will need another loop after parsing the BLAST results that traverses that data structure in order of chromosomes to print your results. You can use a hash (associative array) where the key is the chromosome and the value is an array of HSP data for that chromosome, so you will need to investigate how to build and traverse a hash of arrays. Take a look at this for example: http://www.perl.com/doc/FMTEYEWTK/pdsc/pdsc-2.html To learn how to do this, I would first write a little separate script that builds some hash of arrays and then try to traverse it in sorted order, i.e. you need to look up how to access the keys of a hash in sorted order. I hope this will help to get you going again. Good luck! 
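For illustration, here is a minimal sketch of such a hash of arrays in plain Perl. The (chromosome, HSP label) pairs are made up and only stand in for whatever gets collected in the SearchIO loop, and by_chromosome is a hypothetical helper name, not part of BioPerl:

```perl
use strict;
use warnings;

# Made-up (chromosome, HSP label) pairs standing in for values that
# would be collected while looping over the parsed BLAST report.
my @records = (
    [ '2',  'hsp_a' ],
    [ 'X',  'hsp_b' ],
    [ '10', 'hsp_c' ],
    [ '2',  'hsp_d' ],
);

# Build the hash of arrays: key = chromosome, value = array ref of HSPs.
my %hsps_by_chr;
for my $rec (@records) {
    my ($chr, $hsp) = @$rec;
    push @{ $hsps_by_chr{$chr} }, $hsp;
}

# Hypothetical sort helper: numeric chromosome names in numeric order,
# then the remaining names (X, Y, ...) in string order.
sub by_chromosome {
    my ($x, $y) = ($a, $b);
    my $x_num = $x =~ /^\d+$/;
    my $y_num = $y =~ /^\d+$/;
    return $x <=> $y if $x_num && $y_num;
    return -1 if $x_num;
    return 1  if $y_num;
    return $x cmp $y;
}

# Traverse the keys in sorted order and build tab-separated rows:
# chromosome, number of HSPs, comma-separated HSP labels.
my @rows;
for my $chr (sort by_chromosome keys %hsps_by_chr) {
    my @hsps = @{ $hsps_by_chr{$chr} };
    push @rows, join("\t", $chr, scalar(@hsps), join(',', @hsps));
}
print "$_\n" for @rows;
```

In a real script the push would happen inside the HSP loop, and the rows would be printed to an output filehandle rather than to STDOUT.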
Frank On 28/03/12 05:23, Detrix wrote: > Hi, > > Im new to perl/bioperl and I need to write a script for an assignment. The > background is that we BLAST searched a sequence on NCBI and came up with the > hits. What I have to do is write a script that only extracts the HSPs for > Mus musculus and mouse, but extract it and match it to each chromosome and > write it to a table outfile. > > So far I have: > > > use strict; > use lib "C:/Program Files (x86)/BioPerl"; > > > use Bio::SearchIO; > my $parser = new Bio::SearchIO(-format => 'blast', > -file => 'nucleotide.pl'); > > while (my $result = $parser->next_result) { > > while (my $hit = $result->next_hit) { > > if ($hit->description =~ /(Mus musculus)|(Mouse)/i) { > > while (my $hsp = $hit->next_hsp) { > > > print > " Hit=", $hit->description, "\n"; > print > " HSPs=", $hit->num_hsps, "\n"; > > } > } > } > } > > > > What this gets me is the list of all the descriptions of the hits (mouse and > mus musculus), and the HSPs for them. What I need now is to sort all the > HSPs for each particular chromosome, and write it to a table outfile. I > think what I have to do is sort it into an associative array, but all > attempts at it I have failed. Im lost, so any help would be greatly > appreciated! > > Thanks > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From dan.bolser at gmail.com Fri Mar 30 08:37:13 2012 From: dan.bolser at gmail.com (Dan Bolser) Date: Fri, 30 Mar 2012 13:37:13 +0100 Subject: [Bioperl-l] Filtering blast results. Was "Re: Using PEIMAP" Message-ID: Cheers Joseph, I'd recommend using BioPerl (CC'ed) or similar to process your blast results. All the best, Dan. On 29 March 2012 08:40, Joseph Muldoon wrote: > Hi Dan, > > Thank you for your help. 
I built the index and am now able to use the PEIMAP > database in a BLASTP search. Would you know how I could filter the alignment > by minimum sequence identity or length coverage? If not, is there anyone you > know who might be able to help? > Thank you again. > > Joseph >>> >>> Joseph Muldoon >>> University of Virginia > > From zwyl001 at aucklanduni.ac.nz Sat Mar 31 23:35:51 2012 From: zwyl001 at aucklanduni.ac.nz (Zachariah Wylde) Date: Sun, 1 Apr 2012 15:35:51 +1200 Subject: [Bioperl-l] Output of a BLAST parse to text file Message-ID:

Hi there,

I am very new to BioPerl, so excuse me if I come across as simple! I need to write a BioPerl script to extract information from BLAST results. The script needs to count how many HSPs are on each mouse chromosome and write the counts to a tab-separated table. I have this so far, but do not understand how to sort the information. I would much appreciate it if you could help me.

Yours sincerely,
Zac Wylde

use strict;
use warnings;
use lib "C:/Program Files (x86)/BioPerl";
use Bio::SearchIO;

my $infile = "Alignment_Ref_Seq.txt";
open INFILE, $infile or die "Cannot open $infile: $!";
my $outfile = "assignment2.txt";
open OUTFILE, ">$outfile" or die "Cannot open $outfile: $!";

my $parser = new Bio::SearchIO(-format => 'blast',
                               -file   => 'Alignment_Ref_Seq.txt');

while (my $result = $parser->next_result){
    while (my $hit = $result->next_hit){
        while (my $hsp = $hit->next_hsp){
            if ($hit->description =~ /(mus musculus)|(mouse)/i){
                if ($hit->description =~ /chromosome (\w+)/){
                    print "Hit = ", $hit->name, " \t", "chromosome = ", $1, " \t", "HSPs = ", $hit->num_hsps, "\n";
                }
            }
        }
    }
}
close INFILE;
close OUTFILE;
#unknown #chromosome from