From tzhu at mail.bnu.edu.cn Sun Jan 1 07:10:08 2012
From: tzhu at mail.bnu.edu.cn (Tao Zhu)
Date: Sun, 01 Jan 2012 20:10:08 +0800
Subject: [Bioperl-l] How to align two DNA sequences which are very long?
Message-ID: <4F004D20.3030009@mail.bnu.edu.cn>

Hello, everyone

This problem may not relate to BioPerl directly, but it does relate to
bioinformatics.

ClustalW is a good method for multiple sequence alignment. However, if
the sequences are long, it becomes extremely slow.

I want to align a pair of mouse-rat orthologs, ENSMUSG00000000131 and
ENSRNOG00000016722, which are 98787 bp and 89851 bp long, respectively.
It took almost 1.5 hours to finish the alignment using ClustalW!

Is there a faster way to align two long DNA sequences? I want a global
alignment.

-- 
Tao Zhu, College of Life Sciences, Beijing Normal University, Beijing
100875, China
Email: tzhu at mail.bnu.edu.cn

From mcoyne at channing.harvard.edu Sun Jan 1 14:31:58 2012
From: mcoyne at channing.harvard.edu (Michael Coyne)
Date: Sun, 1 Jan 2012 14:31:58 -0500
Subject: [Bioperl-l] Bioperl-l Digest, Vol 105, Issue 1

You can try MUSCLE (see http://www.drive5.com/muscle).

On Sun, Jan 1, 2012 at 12:00 PM, wrote:
> Today's Topics:
>
>    1. How to align two DNA sequences which are very long? (Tao Zhu)
> [...]
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> End of Bioperl-l Digest, Vol 105, Issue 1
> *****************************************

From casaburi at ceinge.unina.it Tue Jan 3 04:32:40 2012
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Tue, 3 Jan 2012 01:32:40 -0800 (PST)
Subject: [Bioperl-l] Computer requirements to do bioinformatic analysis
Message-ID: <33070641.post@talk.nabble.com>

Hi all,

My department wants to buy a new computer for bioinformatic analysis of
next-generation sequencing data. Can you suggest something good, with
the most important requirements (CPU, RAM, etc.)?

Thank you very much
-- 
View this message in context: http://old.nabble.com/Computer-requirements-to-do-bioinformatic-analysis-tp33070641p33070641.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
From p.j.a.cock at googlemail.com Tue Jan 3 05:14:13 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 3 Jan 2012 10:14:13 +0000
Subject: Re: [Bioperl-l] Computer requirements to do bioinformatic analysis
In-Reply-To: <33070641.post@talk.nabble.com>
References: <33070641.post@talk.nabble.com>

That is rather a vague query. Important questions include what kind of
organism you are working with (e.g. bacteria), which affects the size
of the problem, and whether you are planning genome assembly,
transcriptome assembly, or just mapping reads to a known reference.

Have a look at the forum posts at http://seqanswers.com. Try both
browsing and some specific searches, since this kind of question has
come up there a lot. Generally RAM is more of a bottleneck, but it
depends on the task at hand.

Peter

From prashant24.juit at gmail.com Tue Jan 3 23:40:35 2012
From: prashant24.juit at gmail.com (Prashant Mishra)
Date: Wed, 4 Jan 2012 10:10:35 +0530
Subject: [Bioperl-l] contribution to bioperl

Hello,

I am a bioinformatics undergraduate student with a good knowledge of
Perl, but I haven't used BioPerl much. I have used it for small tasks
like reading and parsing sequences.

I wish to contribute to the development of BioPerl. Can you please
point out where and how I should start?

Thank you.
From jason.stajich at gmail.com Wed Jan 4 02:13:50 2012
From: jason.stajich at gmail.com (Jason Stajich)
Date: Tue, 3 Jan 2012 23:13:50 -0800
Subject: Re: [Bioperl-l] contribution to bioperl
Message-ID: <82507665-3FE2-4615-8E92-80C881ACDB59@gmail.com>

Hi Prashant,

We welcome new contributors, so thanks for saying hello. We are happy to
have your help on the project.

Here is some information to help you get started writing code for the
project:
http://bioperl.org/wiki/Advanced_BioPerl
http://bioperl.org/wiki/Becoming_a_developer

Here are some projects, with descriptions, that were written up a while
ago:
http://bioperl.org/wiki/Project_priority_list

(Thinking aloud here.) I wonder if we should convert some of these
projects into Redmine feature requests?
https://redmine.open-bio.org/projects/bioperl/issues?set_filter=1&tracker_id=2

Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org

From casaburi at ceinge.unina.it Thu Jan 5 05:11:02 2012
From: casaburi at ceinge.unina.it (Giorgio C)
Date: Thu, 5 Jan 2012 02:11:02 -0800 (PST)
Subject: [Bioperl-l] Common elements within two set of miRNAs
Message-ID: <33084750.post@talk.nabble.com>

Hi all,

I have two sets of miRNAs from a 454 run. How can I cluster the
elements common to the two sets? By "common" I mean sequences that
differ by at most 2 nucleotides. Is there any tool that can do this
automatically?

Thank you

From jw12 at sanger.ac.uk Thu Jan 5 08:44:26 2012
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Thu, 5 Jan 2012 13:44:26 +0000
Subject: [Bioperl-l] Registrations for DAS Workshop 2012
Message-ID: <3637D8C0-AF24-4D42-90E8-85346C5F706D@sanger.ac.uk>

DAS is currently being used to share annotations on genomes, protein
alignments, and structural and interaction information. If you are
interested in sharing biological information, the DAS workshop below
may be of interest to you.

Learn about and contribute to current developments in DAS, such as DAS
in the cloud, DAS for Genotype Data, DAS searching, DAS for
collaborative annotation projects, and DAS alternative formats.

Registration is open for the 2012 DAS workshop (27-29 February) at the
Genome Campus, Hinxton, UK. If you are interested in attending, please
find out more at
http://www.ebi.ac.uk/training/onsite/120227_DAS.html
and register via the web link at the bottom of the page.

This workshop will cater for novice to expert DAS users, as each day is
optional. Please register early, as places will be limited.
Registration closes on 10 February 2012 at 12:00.

If you are interested in giving a 15-minute talk on the second day,
please email Jonathan Warren (jonathan.warren at sanger.ac.uk).

Many thanks
The Sanger/EBI DAS team.

Jonathan Warren
Senior Developer and DAS coordinator
blog: http://biodasman.wordpress.com/
jw12 at sanger.ac.uk
Ext: 2314
Telephone: 01223 492314

-- 
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
From dcmertens.perl at gmail.com Thu Jan 5 10:24:18 2012
From: dcmertens.perl at gmail.com (David Mertens)
Date: Thu, 5 Jan 2012 09:24:18 -0600
Subject: [Bioperl-l] Perl Performance Project?
In-Reply-To: <87fwfy83q1.fsf@renormalist.net>
References: <4EBF062B.7090208@pobox.com> <20111113164349.GP27858@lake.fysh.org> <4A901ED9-F136-4000-B6D5-2609BBEBAAB8@pobox.com> <4EC4271D.4080706@pobox.com> <20111117081659.62830069@pc09.procura.nl> <4EC4BE45.1010907@pobox.com> <20111117135414.GJ23881@plum.flirble.org> <87r516pd8v.fsf@renormalist.net> <87fwfy83q1.fsf@renormalist.net>

Steffen (and BioPerl and PDL folks),

This looks interesting, and I suspect that the PDL and BioPerl
communities might be interested in it, so I'm CC'ing them. PDL and
BioPerl folks, take a look!

At one point I read about somebody who wrote some git plugins that
don't allow commits to the main branch unless they improve the test
coverage. I suspect that one might be interested in implementing a
similar approach for one's own work, focusing also on benchmarking.

Steffen, a couple of suggestions:

1. You should include a link to perlformance.net in the docs.
2. You should include a link to the graphs from the main web page.

You give a skeleton for writing a plugin, but I'm not quite sure how to
set up my own server to test my plugin. Would I install
Benchmark::Perl::Formance, build and install my own plugin, and then
run

    david> benchmark-perlformance --plugins=MyPlugin

Is it that simple?

Thanks!
David

On Mon, Jan 2, 2012 at 10:32 AM, Steffen Schwigon wrote:
> Hi all!
>
> I now have my Perl benchmarking infrastructure ready and already some
> coverage over several Perl versions.
>
> The infrastructure consists of a Tapper and Codespeed instance, an own
> *not* regularly updated CPAN mirror (to keep dependencies stable), and a
> dedicated benchmark machine.
>
> One server is running a Tapper raw result database and website
> (http://perlformance.net/), the Codespeed graph rendering website
> (http://speed.perlformance.net/) and the CPAN mirror
> (http://perlformance.net/CPAN/).
>
> The second server (perl64.org [6 core AMD Opteron 4180]) is dedicated to
> only run benchmarks, without any disruption from email, web, or other
> services. I also took care of disabling all OS features that typically
> lead to deviation, like ASLR and Core Performance Boost. And, yes, Perl
> is built using Yet Another Great Perl Bootstrap Script(tm)
> (http://search.cpan.org/~schwigon/App-Bootstrap-Perl)[1].
>
> Read more about the overall vision in my YAPC::EU 2011 slidedeck:
>
>   http://perlformance.net/res/yapc_eu_2011_perlformance-net.pdf
>
> I blogged this also with some more details here:
>
>   http://blogs.perl.org/users/steffen_schwigon/2012/01/perlformance.html
>
> Principally the infrastructure is able to consume benchmarks from other
> providers. Talk to me if you want to track numbers from your machines.
>
> A side effect is a nice collection of many Perl installations. I can
> easily upgrade and rerun new benchmarks over them. So whoever is
> interested in benchmarks, please write a code snippet, ideally a
> Perl::Formance plugin, and talk to me.
>
> Theoretically I also backup the data, let's see how reliable... :-)
>
> Happy New Year!
>
> Kind regards,
> Steffen
>
> Footnotes:
> [1] mine is better than yours :-), it can distroprefs! ANDK++
>
> PS: I regularly struggle with dependencies when bleadperl breaks CPAN
>     or on 5.8.x, so there is still some maintenance effort and the
>     "automation" is more or less still a while(true) loop -- but who
>     cares...
> --
> Steffen Schwigon
> Dresden Perl Mongers

-- 
Sent via my carrier pigeon.

From mrbaker_mark at yahoo.com Thu Jan 5 12:31:11 2012
From: mrbaker_mark at yahoo.com (MARK BAKER)
Date: Thu, 5 Jan 2012 09:31:11 -0800 (PST)
Subject: [Bioperl-l] [Perldl] Perl Performance Project?
In-Reply-To: <1325784475.54283.YahooMailNeo@web125802.mail.ne1.yahoo.com>
References: <4EBF062B.7090208@pobox.com> <20111113164349.GP27858@lake.fysh.org> <4A901ED9-F136-4000-B6D5-2609BBEBAAB8@pobox.com> <4EC4271D.4080706@pobox.com> <20111117081659.62830069@pc09.procura.nl> <4EC4BE45.1010907@pobox.com> <20111117135414.GJ23881@plum.flirble.org> <87r516pd8v.fsf@renormalist.net> <87fwfy83q1.fsf@renormalist.net> <1325784475.54283.YahooMailNeo@web125802.mail.ne1.yahoo.com>
Message-ID: <1325784671.57532.YahooMailNeo@web125803.mail.ne1.yahoo.com>

The operating system matters for benchmarking as well. My most
important Perl scripts run twice as fast on OpenSUSE 11.3 as on
Windows Server 2008 / XP / Windows 2003, on both 64-bit and 32-bit
systems.

I think this is an important point to mention here. Maybe we can get
tests done to find the fastest OS...

Cheers
Mark R Baker

_______________________________________________
Perldl mailing list
Perldl at jach.hawaii.edu
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl

From jovel_juan at hotmail.com Thu Jan 5 12:55:16 2012
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Thu, 5 Jan 2012 17:55:16 +0000
Subject: Re: [Bioperl-l] Common elements within two set of miRNAs
In-Reply-To: <33084750.post@talk.nabble.com>
References: <33084750.post@talk.nabble.com>

It's not clear to me what you want to do. I guess there are two
options: do you want to (1) profile mature miRNA abundance, or
(2) profile miRNA transcripts, allowing up to two mismatches? In
either case, you can align your libraries to the miRNA hairpins using
any aligner (Bowtie, SOAP, etc.).

However, since 454 libraries are 'relatively' small, you can also do
this with stand-alone BLAST. You can then parse the results with the
SearchIO module from BioPerl, keeping hits with at most 2 mismatches.

From cjfields at illinois.edu Thu Jan 5 13:19:35 2012
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 5 Jan 2012 12:19:35 -0600
Subject: Re: [Bioperl-l] Common elements within two set of miRNAs
References: <33084750.post@talk.nabble.com>
Message-ID: <4F05E9B7.5070207@illinois.edu>

You can use mcl for this, see:

http://micans.org/mcl/man/clmprotocols.html#blast

If you only want a difference of 2 nucleotides, it's probably worth
pre-parsing the data to filter the information you need (as Juan
suggests, Bio::SearchIO can do this). Then get the retained hits into a
form that mcl can use.

chris

From labrinitz at gmail.com Thu Jan 5 13:38:23 2012
From: labrinitz at gmail.com (Labrini Tziamourani)
Date: Thu, 5 Jan 2012 10:38:23 -0800
Subject: [Bioperl-l] bioperl installation on windows

Hello,

I tried to install BioPerl on my Windows PC (Vista) using CPAN. I
followed the instructions at
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
but I have a problem at step 11.

When I typed "perl Build.PL" I got the message "Unknown shell command
'perl'. Type ? for help.":

    cpan> perl Build.PL
    Unknown shell command 'perl'. Type ? for help.

How can I solve this problem?

Any help is very much appreciated.

Many Thanks,
John

From jovel_juan at hotmail.com Thu Jan 5 13:56:32 2012
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Thu, 5 Jan 2012 18:56:32 +0000
Subject: Re: [Bioperl-l] bioperl installation on windows

Just move to Ubuntu, John, and make your life easier... ;)

From Kevin.M.Brown at asu.edu Thu Jan 5 13:55:27 2012
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 5 Jan 2012 11:55:27 -0700
Subject: Re: [Bioperl-l] bioperl installation on windows
Message-ID: <1A4207F8295607498283FE9E93B775B40808A0CC@EX02.asurite.ad.asu.edu>

Sounds like Perl isn't in your Windows search path. Which version of
Perl are you using on Windows: Strawberry? ActiveState?

From roy.chaudhuri at gmail.com Thu Jan 5 14:26:19 2012
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Thu, 05 Jan 2012 19:26:19 +0000
Subject: Re: [Bioperl-l] bioperl installation on windows
Message-ID: <4F05F95B.4040407@gmail.com>

Hi John,

You're supposed to type "perl Build.PL" at the command line rather than
at the cpan prompt. That is only if you want to install manually,
though. The page you linked to suggests that you follow the
instructions for installing via CPAN:
http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_THE_EASY_WAY_USING_CPAN

Roy.

From Lizhe.Xu at aphis.usda.gov Thu Jan 5 13:52:02 2012
From: Lizhe.Xu at aphis.usda.gov (Xu, Lizhe - APHIS)
Date: Thu, 5 Jan 2012 18:52:02 +0000
Subject: [Bioperl-l] Bio::SeqIO::excel
Message-ID: <866064B6DA2EBC4EA48F326E3CC15C072D8B72@001FSN2MPN1-021.001f.mgd2.msft.net>

I tried to use Bio::SeqIO to read an Excel file but got an error
message. Below are details of my environment and some screenshots.
Please help me fix the problem. Thank you very much.

* The version of BioPerl you're working with.
  1.6.1
  [screenshot not preserved]

* The platform or operating system you're using.
  Windows XP

* What you are trying to do.
  Read an Excel file with two columns (primer name, primer sequence)
  and no header row:

    #!C:\Perl\bin -w
    use Bio::Seq;
    use Bio::SeqIO;
    $file="ViralPrimers.xls";
    $seqio_obj = Bio::SeqIO->new(-file => $file, -format => 'excel' );
    ......

  [screenshot not preserved]

  When I added the following statement to the script,

    use Bio::SeqIO::excel;

  [screenshot not preserved]

Lizhe Xu, PhD
Microbiologist
PVSS, FADDL, APHIS, USDA
Plum Island Animal Disease Center
Greenport, NY 11944
Tel: 631 323 3023
fax: 631 323 3366
Lizhe.Xu at aphis.usda.gov

From Kevin.M.Brown at asu.edu Thu Jan 5 15:20:04 2012
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 5 Jan 2012 13:20:04 -0700
Subject: Re: [Bioperl-l] Bio::SeqIO::excel
Message-ID: <1A4207F8295607498283FE9E93B775B408126724@EX02.asurite.ad.asu.edu>

As the error message notes, you are missing a module that the Excel
parser needs in order to work, namely Spreadsheet::ParseExcel.

From florent.angly at gmail.com Thu Jan 5 18:08:52 2012
From: florent.angly at gmail.com (Florent Angly)
Date: Fri, 06 Jan 2012 09:08:52 +1000
Subject: Re: [Bioperl-l] Common elements within two set of miRNAs
In-Reply-To: <4F05E9B7.5070207@illinois.edu>
References: <33084750.post@talk.nabble.com> <4F05E9B7.5070207@illinois.edu>
Message-ID: <4F062D84.4070409@gmail.com>

Clustering software such as CD-HIT or UCLUST comes to mind.

Florent

From roy.chaudhuri at gmail.com Fri Jan 6 08:28:07 2012
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Fri, 06 Jan 2012 13:28:07 +0000
Subject: [Bioperl-l] (no subject)
Message-ID: <4F06F6E7.1070308@gmail.com>

Hi Labrini.

Please make sure you copy replies to the mailing list; that way others
can contribute to the discussion. I'm afraid I don't use BioPerl on
Windows, so I'm not sure about the specific error messages you are
getting. Note that the latest version is BioPerl-1.6.901, not 1.6.1.
Also, for the attempted manual build, you appear to be running perl Build.PL from the root of your C: drive; it needs to be run from inside the BioPerl directory. If you are new to Perl and not familiar with the command line, I'd suggest that you start with something simple rather than a complex package like BioPerl. There is a chapter in the O'Reilly "Learning Perl" book that introduces installing and using simple Perl modules. Roy. On 06/01/2012 03:13, Labrini Tziamourani wrote: > Thank you for your answer. > I followed: INSTALLING BIOPERL THE EASY WAY USING CPAN > c:> cpan > > cpan> d /bioperl/ > ... > cpan> install CJFIELDS/BioPerl-1.6.1.tar.gz > ..... > ..... > BioPerl-1.6.1/t/RemoteDB/BioFetch.t > BioPerl-1.6.1/t/RemoteDB/CUTG.t > BioPerl-1.6.1/t/RemoteDB/EMBL.t > BioPerl-1.6.1/t/RemoteDB/EntrezGene.t > BioPerl-1.6.1/t/RemoteDB/EUtilities.t > BioPerl-1.6.1/t/RemoteDB/GenBank.t > BioPerl-1.6.1/t/RemoteDB/GenPept.t > BioPerl-1.6.1/t/RemoteDB/MeSH.t > BioPerl-1.6.1/t/RemoteDB/RefSeq.t > BioPerl-1.6.1/t/RemoteDB/SeqHound.t > BioPerl-1.6.1/t/RemoteDB/SeqRead_fail.t > BioPerl-1.6.1/t/RemoteDB/SeqVersion.t > BioPerl-1.6.1/t/RemoteDB/SwissProt.t > BioPerl-1.6.1/t/RemoteDB/Taxonomy.t > BioPerl-1.6.1/t/RemoteDB/HIV > BioPerl-1.6.1/t/RemoteDB/HIV/HIV.t > BioPerl-1.6.1/t/RemoteDB/HIV/HIVAnnotProcessor.t > BioPerl-1.6.1/t/RemoteDB/HIV/HIVQuery.t > BioPerl-1.6.1/t/RemoteDB/HIV/HIVQueryHelper.t > BioPerl-1.6.1/t/RemoteDB/Query > BioPerl-1.6.1/t/RemoteDB/Query/GenBank.t > BioPerl-1.6.1/t/Restriction > BioPerl-1.6.1/t/Restriction/Analysis-refac.t > BioPerl-1.6.1/t/Restriction/Analysis.t > BioPerl-1.6.1/t/Restriction/Gel.t > BioPerl-1.6.1/t/Restriction/IO.t > BioPerl-1.6.1/t/Root > BioPerl-1.6.1/t/Root/Exception.t > BioPerl-1.6.1/t/Root/RootI.t > BioPerl-1.6.1/t/Root/RootIO.t > BioPerl-1.6.1/t/Root/Storable.t > BioPerl-1.6.1/t/Root/Tempfile.t > BioPerl-1.6.1/t/Root/Utilities.t > BioPerl-1.6.1/t/SearchIO > BioPerl-1.6.1/t/SearchIO/blast.t > BioPerl-1.6.1/t/SearchIO/blast_pull.t >
BioPerl-1.6.1/t/SearchIO/blasttable.t > BioPerl-1.6.1/t/SearchIO/blastxml.t > BioPerl-1.6.1/t/SearchIO/CigarString.t > BioPerl-1.6.1/t/SearchIO/cross_match.t > BioPerl-1.6.1/t/SearchIO/erpin.t > BioPerl-1.6.1/t/SearchIO/exonerate.t > BioPerl-1.6.1/t/SearchIO/fasta.t > BioPerl-1.6.1/t/SearchIO/gmap_f9.t > BioPerl-1.6.1/t/SearchIO/hmmer.t > BioPerl-1.6.1/t/SearchIO/hmmer_pull.t > BioPerl-1.6.1/t/SearchIO/infernal.t > BioPerl-1.6.1/t/SearchIO/megablast.t > BioPerl-1.6.1/t/SearchIO/psl.t > BioPerl-1.6.1/t/SearchIO/rnamotif.t > BioPerl-1.6.1/t/SearchIO/SearchIO.t > BioPerl-1.6.1/t/SearchIO/sim4.t > BioPerl-1.6.1/t/SearchIO/SimilarityPair.t > BioPerl-1.6.1/t/SearchIO/Tiling.t > BioPerl-1.6.1/t/SearchIO/waba.t > BioPerl-1.6.1/t/SearchIO/wise.t > BioPerl-1.6.1/t/SearchIO/Writer > BioPerl-1.6.1/t/SearchIO/Writer/GbrowseGFF.t > BioPerl-1.6.1/t/SearchIO/Writer/HitTableWriter.t > BioPerl-1.6.1/t/SearchIO/Writer/HSPTableWriter.t > BioPerl-1.6.1/t/SearchIO/Writer/HTMLWriter.t > BioPerl-1.6.1/t/Seq > BioPerl-1.6.1/t/Seq/DBLink.t > BioPerl-1.6.1/t/Seq/EncodedSeq.t > BioPerl-1.6.1/t/Seq/LargeLocatableSeq.t > BioPerl-1.6.1/t/Seq/LargePSeq.t > BioPerl-1.6.1/t/Seq/LocatableSeq.t > BioPerl-1.6.1/t/Seq/MetaSeq.t > BioPerl-1.6.1/t/Seq/PrimaryQual.t > BioPerl-1.6.1/t/Seq/PrimarySeq.t > BioPerl-1.6.1/t/Seq/PrimedSeq.t > BioPerl-1.6.1/t/Seq/Quality.t > BioPerl-1.6.1/t/Seq/Seq.t > BioPerl-1.6.1/t/Seq/WithQuality.t > BioPerl-1.6.1/t/SeqFeature > BioPerl-1.6.1/t/SeqFeature/FeatureIO.t > BioPerl-1.6.1/t/SeqFeature/Location.t > BioPerl-1.6.1/t/SeqFeature/LocationFactory.t > BioPerl-1.6.1/t/SeqFeature/Primer.t > BioPerl-1.6.1/t/SeqFeature/Range.t > BioPerl-1.6.1/t/SeqFeature/RangeI.t > BioPerl-1.6.1/t/SeqFeature/SeqAnalysisParser.t > BioPerl-1.6.1/t/SeqFeature/SeqFeatAnnotated.t > BioPerl-1.6.1/t/SeqFeature/SeqFeatCollection.t > BioPerl-1.6.1/t/SeqFeature/SeqFeature.t > BioPerl-1.6.1/t/SeqFeature/SeqFeaturePrimer.t > BioPerl-1.6.1/t/SeqFeature/Unflattener.t > 
BioPerl-1.6.1/t/SeqFeature/Unflattener2.t > BioPerl-1.6.1/t/SeqIO > BioPerl-1.6.1/t/SeqIO/abi.t > BioPerl-1.6.1/t/SeqIO/ace.t > BioPerl-1.6.1/t/SeqIO/agave.t > BioPerl-1.6.1/t/SeqIO/alf.t > BioPerl-1.6.1/t/SeqIO/asciitree.t > BioPerl-1.6.1/t/SeqIO/bsml.t > BioPerl-1.6.1/t/SeqIO/bsml_sax.t > BioPerl-1.6.1/t/SeqIO/chadoxml.t > BioPerl-1.6.1/t/SeqIO/chaos.t > BioPerl-1.6.1/t/SeqIO/chaosxml.t > BioPerl-1.6.1/t/SeqIO/ctf.t > BioPerl-1.6.1/t/SeqIO/embl.t > BioPerl-1.6.1/t/SeqIO/entrezgene.t > BioPerl-1.6.1/t/SeqIO/excel.t > BioPerl-1.6.1/t/SeqIO/exp.t > BioPerl-1.6.1/t/SeqIO/fasta.t > BioPerl-1.6.1/t/SeqIO/fastq.t > BioPerl-1.6.1/t/SeqIO/flybase_chadoxml.t > BioPerl-1.6.1/t/SeqIO/game.t > BioPerl-1.6.1/t/SeqIO/gcg.t > BioPerl-1.6.1/t/SeqIO/genbank.t > BioPerl-1.6.1/t/SeqIO/Handler.t > BioPerl-1.6.1/t/SeqIO/interpro.t > BioPerl-1.6.1/t/SeqIO/kegg.t > BioPerl-1.6.1/t/SeqIO/largefasta.t > BioPerl-1.6.1/t/SeqIO/lasergene.t > BioPerl-1.6.1/t/SeqIO/locuslink.t > BioPerl-1.6.1/t/SeqIO/metafasta.t > BioPerl-1.6.1/t/SeqIO/MultiFile.t > BioPerl-1.6.1/t/SeqIO/Multiple_fasta.t > BioPerl-1.6.1/t/SeqIO/phd.t > BioPerl-1.6.1/t/SeqIO/pir.t > BioPerl-1.6.1/t/SeqIO/pln.t > BioPerl-1.6.1/t/SeqIO/qual.t > BioPerl-1.6.1/t/SeqIO/raw.t > BioPerl-1.6.1/t/SeqIO/scf.t > BioPerl-1.6.1/t/SeqIO/SeqBuilder.t > BioPerl-1.6.1/t/SeqIO/Splicedseq.t > BioPerl-1.6.1/t/SeqIO/strider.t > BioPerl-1.6.1/t/SeqIO/swiss.t > BioPerl-1.6.1/t/SeqIO/tab.t > BioPerl-1.6.1/t/SeqIO/table.t > BioPerl-1.6.1/t/SeqIO/tigr.t > BioPerl-1.6.1/t/SeqIO/tigrxml.t > BioPerl-1.6.1/t/SeqIO/tinyseq.t > BioPerl-1.6.1/t/SeqIO/ztr.t > BioPerl-1.6.1/t/SeqTools > BioPerl-1.6.1/t/SeqTools/Backtranslate.t > BioPerl-1.6.1/t/SeqTools/CodonTable.t > BioPerl-1.6.1/t/SeqTools/ECnumber.t > BioPerl-1.6.1/t/SeqTools/GuessSeqFormat.t > BioPerl-1.6.1/t/SeqTools/OddCodes.t > BioPerl-1.6.1/t/SeqTools/SeqPattern.t > BioPerl-1.6.1/t/SeqTools/SeqStats.t > BioPerl-1.6.1/t/SeqTools/SeqUtils.t > BioPerl-1.6.1/t/SeqTools/SeqWords.t > BioPerl-1.6.1/t/Structure 
> BioPerl-1.6.1/t/Structure/IO.t > BioPerl-1.6.1/t/Structure/Structure.t > BioPerl-1.6.1/t/Tools > BioPerl-1.6.1/t/Tools/ePCR.t > BioPerl-1.6.1/t/Tools/Est2Genome.t > BioPerl-1.6.1/t/Tools/FootPrinter.t > BioPerl-1.6.1/t/Tools/Geneid.t > BioPerl-1.6.1/t/Tools/Genewise.t > BioPerl-1.6.1/t/Tools/Genomewise.t > BioPerl-1.6.1/t/Tools/Genpred.t > BioPerl-1.6.1/t/Tools/GFF.t > BioPerl-1.6.1/t/Tools/Hmmer.t > BioPerl-1.6.1/t/Tools/IUPAC.t > BioPerl-1.6.1/t/Tools/Lucy.t > BioPerl-1.6.1/t/Tools/Match.t > BioPerl-1.6.1/t/Tools/pICalculator.t > BioPerl-1.6.1/t/Tools/Primer3.t > BioPerl-1.6.1/t/Tools/Promoterwise.t > BioPerl-1.6.1/t/Tools/Pseudowise.t > BioPerl-1.6.1/t/Tools/QRNA.t > BioPerl-1.6.1/t/Tools/RandDistFunctions.t > BioPerl-1.6.1/t/Tools/RepeatMasker.t > BioPerl-1.6.1/t/Tools/rnamotif.t > BioPerl-1.6.1/t/Tools/Seg.t > BioPerl-1.6.1/t/Tools/Sigcleave.t > BioPerl-1.6.1/t/Tools/Signalp.t > BioPerl-1.6.1/t/Tools/Sim4.t > BioPerl-1.6.1/t/Tools/SiRNA.t > BioPerl-1.6.1/t/Tools/TandemRepeatsFinder.t > BioPerl-1.6.1/t/Tools/TargetP.t > BioPerl-1.6.1/t/Tools/Tmhmm.t > BioPerl-1.6.1/t/Tools/tRNAscanSE.t > BioPerl-1.6.1/t/Tools/Alignment > BioPerl-1.6.1/t/Tools/Alignment/Consed.t > BioPerl-1.6.1/t/Tools/Analysis > BioPerl-1.6.1/t/Tools/Analysis/DNA > BioPerl-1.6.1/t/Tools/Analysis/DNA/ESEfinder.t > BioPerl-1.6.1/t/Tools/Analysis/Protein > BioPerl-1.6.1/t/Tools/Analysis/Protein/Domcut.t > BioPerl-1.6.1/t/Tools/Analysis/Protein/ELM.t > BioPerl-1.6.1/t/Tools/Analysis/Protein/GOR4.t > BioPerl-1.6.1/t/Tools/Analysis/Protein/HNN.t > BioPerl-1.6.1/t/Tools/Analysis/Protein/Mitoprot.t > BioPerl-1.6.1/t/Tools/Analysis/Protein/NetPhos.t > BioPerl-1.6.1/t/Tools/Analysis/Protein/Scansite.t > BioPerl-1.6.1/t/Tools/Analysis/Protein/Sopma.t > BioPerl-1.6.1/t/Tools/EMBOSS > BioPerl-1.6.1/t/Tools/EMBOSS/Palindrome.t > BioPerl-1.6.1/t/Tools/EUtilities > BioPerl-1.6.1/t/Tools/EUtilities/egquery.t > BioPerl-1.6.1/t/Tools/EUtilities/einfo.t > BioPerl-1.6.1/t/Tools/EUtilities/elink_acheck.t > 
BioPerl-1.6.1/t/Tools/EUtilities/elink_lcheck.t > BioPerl-1.6.1/t/Tools/EUtilities/elink_llinks.t > BioPerl-1.6.1/t/Tools/EUtilities/elink_ncheck.t > BioPerl-1.6.1/t/Tools/EUtilities/elink_neighbor.t > BioPerl-1.6.1/t/Tools/EUtilities/elink_neighbor_history.t > BioPerl-1.6.1/t/Tools/EUtilities/elink_scores.t > BioPerl-1.6.1/t/Tools/EUtilities/epost.t > BioPerl-1.6.1/t/Tools/EUtilities/esearch.t > BioPerl-1.6.1/t/Tools/EUtilities/espell.t > BioPerl-1.6.1/t/Tools/EUtilities/esummary.t > BioPerl-1.6.1/t/Tools/EUtilities/EUtilParameters.t > BioPerl-1.6.1/t/Tools/Phylo > BioPerl-1.6.1/t/Tools/Phylo/Gerp.t > BioPerl-1.6.1/t/Tools/Phylo/Molphy.t > BioPerl-1.6.1/t/Tools/Phylo/PAML.t > BioPerl-1.6.1/t/Tools/Phylo/Phylip > BioPerl-1.6.1/t/Tools/Phylo/Phylip/ProtDist.t > BioPerl-1.6.1/t/Tools/Run > BioPerl-1.6.1/t/Tools/Run/RemoteBlast.t > BioPerl-1.6.1/t/Tools/Run/RemoteBlast_rpsblast.t > BioPerl-1.6.1/t/Tools/Run/StandAloneBlast.t > BioPerl-1.6.1/t/Tools/Run/WrapperBase.t > BioPerl-1.6.1/t/Tools/Signalp > BioPerl-1.6.1/t/Tools/Signalp/ExtendedSignalp.t > BioPerl-1.6.1/t/Tools/Spidey > BioPerl-1.6.1/t/Tools/Spidey/Spidey.t > BioPerl-1.6.1/t/Tree > BioPerl-1.6.1/t/Tree/Compatible.t > BioPerl-1.6.1/t/Tree/Node.t > BioPerl-1.6.1/t/Tree/RandomTreeFactory.t > BioPerl-1.6.1/t/Tree/Tree.t > BioPerl-1.6.1/t/Tree/TreeIO.t > BioPerl-1.6.1/t/Tree/TreeStatistics.t > BioPerl-1.6.1/t/Tree/PhyloNetwork > BioPerl-1.6.1/t/Tree/PhyloNetwork/Factory.t > BioPerl-1.6.1/t/Tree/PhyloNetwork/GraphViz.t > BioPerl-1.6.1/t/Tree/PhyloNetwork/MuVector.t > BioPerl-1.6.1/t/Tree/PhyloNetwork/PhyloNetwork.t > BioPerl-1.6.1/t/Tree/PhyloNetwork/RandomFactory.t > BioPerl-1.6.1/t/Tree/PhyloNetwork/TreeFactory.t > BioPerl-1.6.1/t/Tree/TreeIO > BioPerl-1.6.1/t/Tree/TreeIO/lintree.t > BioPerl-1.6.1/t/Tree/TreeIO/newick.t > BioPerl-1.6.1/t/Tree/TreeIO/nexus.t > BioPerl-1.6.1/t/Tree/TreeIO/nhx.t > BioPerl-1.6.1/t/Tree/TreeIO/phyloxml.t > BioPerl-1.6.1/t/Tree/TreeIO/svggraph.t > BioPerl-1.6.1/t/Tree/TreeIO/tabtree.t 
> BioPerl-1.6.1/t/Variation > BioPerl-1.6.1/t/Variation/AAChange.t > BioPerl-1.6.1/t/Variation/AAReverseMutate.t > BioPerl-1.6.1/t/Variation/Allele.t > BioPerl-1.6.1/t/Variation/DNAMutation.t > BioPerl-1.6.1/t/Variation/RNAChange.t > BioPerl-1.6.1/t/Variation/SeqDiff.t > BioPerl-1.6.1/t/Variation/SNP.t > BioPerl-1.6.1/t/Variation/Variation_IO.t > Catching error: "Couldn't move > C:\\Perl\\cpan\\build\\tmp-3156\\BioPerl-1.6.1\\t > to C:\\Perl\\cpan\\build\\BioPerl-1.6.1-othMfb\\t: No such file or > directory at > C:\\Perl\\site\\lib/CPAN/Distribution.pm line > 524\cJ\cICPAN::Distribution::run_ > preps_on_packagedir('CPAN::Distribution=HASH(0xa201d94)') called at > C:\\Perl\\si > te\\lib/CPAN/Distribution.pm line > 351\cJ\cICPAN::Distribution::get('CPAN::Distri > bution=HASH(0xa201d94)') called at > C:\\Perl\\site\\lib/CPAN/Distribution.pm line > 1754\cJ\cICPAN::Distribution::make('CPAN::Distribution=HASH(0xa201d94)') called > at C:\\Perl\\site\\lib/CPAN/Distribution.pm line > 3067\cJ\cICPAN::Distribution:: > test('CPAN::Distribution=HASH(0xa201d94)') called at > C:\\Perl\\site\\lib/CPAN/Di > stribution.pm line > 3469\cJ\cICPAN::Distribution::install('CPAN::Distribution=HAS > H(0xa201d94)') called at C:\\Perl\\site\\lib/CPAN/Shell.pm line > 1797\cJ\cICPAN:: > Shell::rematein('CPAN::Shell', 'install', > 'CJFIELDS/BioPerl-1.6.1.tar.gz') calle > d at C:\\Perl\\site\\lib/CPAN/Shell.pm line > 1977\cJ\cICPAN::Shell::__ANON__('CPA > N::Shell', 'CJFIELDS/BioPerl-1.6.1.tar.gz') called at > C:/Perl/site/lib/CPAN.pm l > ine 376\cJ\cIeval {...} called at C:/Perl/site/lib/CPAN.pm line > 373\cJ\cICPAN::s > hell() called at C:/Perl/site/lib/App/Cpan.pm line > 295\cJ\cIApp::Cpan::_process_ > options('App::Cpan') called at C:/Perl/site/lib/App/Cpan.pm line > 364\cJ\cIApp::C > pan::run('App::Cpan') called at C:\\Perl\\site\\bin/cpan line 8\cJ" at > C:/Perl/s > ite/lib/CPAN.pm line 392 > CPAN::shell() called at C:/Perl/site/lib/App/Cpan.pm line 295 > 
App::Cpan::_process_options('App::Cpan') called at > C:/Perl/site/lib/App/ > Cpan.pm line 364 > App::Cpan::run('App::Cpan') called at C:\Perl\site\bin/cpan line 8 > > cpan> quit > Lockfile removed. > The other idea is to continue with " perl Build.PL" > C:\>perl Build.PL > Can't open perl script "Build.PL": No such file or directory > C:\> > Please let me know if I have to do something else. > Thank you, > Labrini
From Lizhe.Xu at aphis.usda.gov Fri Jan 6 11:42:16 2012 From: Lizhe.Xu at aphis.usda.gov (Xu, Lizhe - APHIS) Date: Fri, 6 Jan 2012 16:42:16 +0000 Subject: [Bioperl-l] Bio: :SeqIO::excel Message-ID: <866064B6DA2EBC4EA48F326E3CC15C072D8C91@001FSN2MPN1-021.001f.mgd2.msft.net> Thanks Kevin, I installed the package Spreadsheet::ParseExcel but got a new error when running the script.

BioPerl 1.6.1 on XP

Script:

#!C:\Perl\bin -w
use Bio::Seq;
use Bio::SeqIO;
$file="PTV1Primers.xls";
$seqio_obj = Bio::SeqIO->new(-file => $file, -format => 'excel' );
while ($seq_obj = $seqio_obj->next_seq){
    print $seq_obj->seq,"\n";
}
......

Error message:
=========================================================
Replacement list is longer than search list at C:/Perl/site/lib/Bio/Range.pm line 251.
UNIVERSAL->import is deprecated and will be removed in a future perl at C:/Perl/site/lib/Bio/Tree/TreeFunctionsI.pm line 94
Use of uninitialized value $slot in string eq at C:/Perl/site/lib/Bio/Seq/SeqBuilder.pm line 278.
Use of uninitialized value $slot in substr at C:/Perl/site/lib/Bio/Seq/SeqBuilder.pm line 283.
Use of uninitialized value $slot in concatenation (.) or string at C:/Perl/site/lib/Bio/Seq/SeqBuilder.pm line 283.
Use of uninitialized value in print at ReadExcel.pl line 15.
==========================================================

The first two warnings appear only when I run the script under Perl v5.12.2. The rest appear under both Perl v5.12.2 and v5.8.8 (on different machines) and are repeated many times, I guess once for each row in the Excel file.
Please help me fix the problem, or point me to a better way to read an Excel file with two columns: one for the name and one for the sequence. Thank you very much. Lizhe
From cjfields at illinois.edu Fri Jan 6 12:14:19 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 6 Jan 2012 17:14:19 +0000 Subject: [Bioperl-l] Bio: :SeqIO::excel In-Reply-To: <866064B6DA2EBC4EA48F326E3CC15C072D8C91@001FSN2MPN1-021.001f.mgd2.msft.net> References: <866064B6DA2EBC4EA48F326E3CC15C072D8C91@001FSN2MPN1-021.001f.mgd2.msft.net> Message-ID: <118F034CF4C3EF48A96F86CE585B94BF0C4E07D2@CHIMBX6.ad.uillinois.edu> Generally I would say that with any BioPerl module it's a good idea to start with the test suite, but unfortunately Bio::SeqIO::excel doesn't seem to have any test coverage. That'll have to be rectified... The 'replacement list' warning is harmless and is fixed in the latest BioPerl on CPAN. As for reading Excel files directly, an easy workaround is to save the sheet in tab- or comma-delimited format (which Excel can export) and then use something like Perl's built-in split() function to create the sequence objects on the fly. If you are worried about CSV parsing issues, Text::CSV is a very good module and has an XS-based counterpart (Text::CSV_XS) that is very fast (both are well documented). On the BioPerl end, creating Bio::Seq objects is covered in the BioPerl HOWTOs. chris
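Chris's suggested workaround (export the sheet as tab-delimited text, then split() each line) could look roughly like the following. This is a minimal sketch in core Perl, not code from the thread: the sub name read_primers and the sample input are hypothetical, and it assumes the two-column layout Lizhe describes (name, sequence, no header row). It builds plain hashes so it runs without BioPerl; with BioPerl installed, each record could instead be passed to Bio::Seq->new.

```perl
use strict;
use warnings;

# Parse a two-column, tab-delimited export (name<TAB>sequence, no header
# row) into a list of hashes. Blank lines are skipped and Windows line
# endings (\r\n), as produced by an Excel export, are tolerated.
sub read_primers {
    my ($fh) = @_;
    my @primers;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;                 # strip LF or CRLF
        next if $line =~ /^\s*$/;             # skip blank lines
        my ($name, $seq) = split /\t/, $line, 2;
        die "Malformed line $.: '$line'\n" unless defined $seq;
        $seq =~ s/\s+//g;                     # drop stray whitespace in the sequence
        push @primers, { name => $name, seq => uc $seq };
        # With BioPerl, a sequence object could be built here instead, e.g.
        #   Bio::Seq->new(-display_id => $name, -seq => $seq, -alphabet => 'dna');
    }
    return @primers;
}

# In-memory text standing in for a hypothetical exported file.
my $export = "PrimerA_F\tACGTACGTAC\r\nPrimerA_R\tgttgcaacgt\r\n\r\n";
open my $fh, '<', \$export or die "Cannot open in-memory file: $!";
my @primers = read_primers($fh);
close $fh;

# Print FASTA, which Bio::SeqIO can then read with -format => 'fasta'.
print ">$_->{name}\n$_->{seq}\n" for @primers;
```

If the export is comma-separated, or a field might contain quoted delimiters, a bare split() is not enough; that is the case where Text::CSV (or Text::CSV_XS), as Chris mentions, is the safer choice.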
From Lizhe.Xu at aphis.usda.gov Mon Jan 9 09:40:21 2012 From: Lizhe.Xu at aphis.usda.gov (Xu, Lizhe - APHIS) Date: Mon, 9 Jan 2012 14:40:21 +0000 Subject: [Bioperl-l] reference array containing Bio::seq objects Message-ID: <866064B6DA2EBC4EA48F326E3CC15C072D8E07@001FSN2MPN1-021.001f.mgd2.msft.net> I created an array of references to 26 Bio::Seq objects:

push(@primer, \Bio::Seq->new(-seq => $primerSeq,
    -display_id => $primerName,
    -desc => chop($primerName),
    -alphabet => 'dna' ))
};

I checked the contents of @primer with ${$primer[$index]}-> and got the primer sequence, id and desc for all 26 primers. However, two things bother me, and I want to understand why I get these results:

(1) length(@primer) gives me 2 instead of 26.
(2) The last character of $primerName is "F" or "R", which gives the direction of the primer. ${$primer[$index]}->desc gives the direction correctly. However, ${$primer[$index]}->id gives the primer name without the last (direction) character. I thought $primerName was assigned to -display_id before chop() ran, so it should contain the whole name including the direction character.

Certainly I'm wrong, but I'd like to know why so I can avoid similar errors later. Thank you very much. Lizhe
From fs5 at sanger.ac.uk Mon Jan 9 12:10:37 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 09 Jan 2012 17:10:37 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> Hi all, I needed to manipulate Bio::Seq objects with annotations and sequence features to simulate molecular cloning techniques, e.g. to cut a vector and insert a fragment into it while preserving all the annotations and moving the features accordingly. My main aim was to split features that span deletion/insertion sites in a meaningful way, which cannot be done with the currently available methods. I have modified Bio::SeqUtils to add the following new methods:

delete
======
Removes a segment from a sequence object and adjusts the positions and types of the locations of sequence features:
- locations of features that span the deletion sites are turned into Splits.
- locations that extend into the deleted region are turned to Fuzzy to indicate that their true start/end was lost.
- locations contained inside the deleted region are lost.
- other features are shifted according to the length of the deletion.

insert
======
Adds a Bio::Seq object into another one between specified insertion sites. This also affects the features on the recipient sequence:
- locations of features that span the insertion site are split, but position types are not turned to Fuzzy because no part of the original feature is lost.
- other features are shifted according to the length of the insertion.

ligate
======
Just for convenience. Supply a recipient, a fragment and one or two sites to cut the recipient. Can also flip the fragment if required. Simply calls delete [, reverse_complement_with_features] and insert in turn.

One situation I haven't handled yet is a deletion that spans the origin of a circular molecule, but that should be a rare thing to do anyway. The code currently throws an error if this is attempted. I'm happy to contribute the code on GitHub if there is interest. Comments on the handling of feature locations are highly welcome! Frank -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
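The location rules Frank lists for delete and insert boil down to a little coordinate arithmetic. The following is a minimal sketch of that arithmetic, not Frank's code and not the Bio::SeqUtils API: the subs remap_for_deletion, remap_for_insertion, piece and location_string are hypothetical, coordinates are 1-based and inclusive, and locations are plain hashes rather than Bio::Location objects. A "fuzzy" flag marks an end that BioPerl would represent as a BEFORE ('<') or AFTER ('>') position, and, following Frank's clarification later in this thread, only the ends at the deletion junction become fuzzy.

```perl
use strict;
use warnings;

# One piece of a location (1-based, inclusive); the flags mark ends that are
# no longer true feature ends (BEFORE '<' for starts, AFTER '>' for ends).
sub piece {
    my ($s, $e, $start_fuzzy, $end_fuzzy) = @_;
    return +{ start => $s, end => $e,
              start_fuzzy => $start_fuzzy, end_fuzzy => $end_fuzzy };
}

# Remap a simple location $s..$e after deleting bases $ds..$de.
# Returns no pieces (lost), one piece, or two pieces (now a split).
sub remap_for_deletion {
    my ($s, $e, $ds, $de) = @_;
    my $len = $de - $ds + 1;
    return piece($s, $e, 0, 0)               if $e < $ds;     # upstream: unchanged
    return piece($s - $len, $e - $len, 0, 0) if $s > $de;     # downstream: shifted
    return ()                                if $s >= $ds && $e <= $de;  # inside: lost
    my @pieces;
    # Left part keeps its exact start; its new end is not a true end.
    push @pieces, piece($s, $ds - 1, 0, 1) if $s < $ds;
    # Right part keeps its exact (shifted) end; its new start is not a true start.
    push @pieces, piece($ds, $e - $len, 1, 0) if $e > $de;
    return @pieces;
}

# Remap a simple location after inserting $n bases between $site and $site + 1.
sub remap_for_insertion {
    my ($s, $e, $site, $n) = @_;
    return piece($s, $e, 0, 0)           if $e <= $site;   # upstream: unchanged
    return piece($s + $n, $e + $n, 0, 0) if $s > $site;    # downstream: shifted
    # Spans the site: split around the new bases; nothing is lost, so nothing is fuzzy.
    return (piece($s, $site, 0, 0), piece($site + $n + 1, $e + $n, 0, 0));
}

# Render pieces GenBank-style: '<' / '>' for fuzzy ends, join(...) for splits.
sub location_string {
    my @parts = map {
        ($_->{start_fuzzy} ? '<' : '') . $_->{start} . '..'
          . ($_->{end_fuzzy} ? '>' : '') . $_->{end}
    } @_;
    return 'deleted' unless @parts;
    return @parts > 1 ? 'join(' . join(',', @parts) . ')' : $parts[0];
}

print join("\n",
    location_string(remap_for_deletion(5, 24, 11, 14)),    # spans the deletion
    location_string(remap_for_deletion(12, 13, 11, 14)),   # inside the deletion
    location_string(remap_for_insertion(5, 24, 10, 6)),    # spans the insertion site
), "\n";
```

For example, deleting bases 11..14 from a feature at 5..24 gives join(5..>10,<11..20): the outer boundaries stay exact and only the two ends at the junction are marked fuzzy, while inserting 6 bases after position 10 splits 5..24 into join(5..10,17..30) with all positions exact. Circular molecules and deletions across the origin, which Frank mentions as unhandled, are not covered by this sketch either.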
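Lizhe Xu's two questions about the @primer array (in the message before Frank's) come down to two standard Perl behaviours. length() expects a string, so length(@primer) evaluates the array in scalar context, which yields the element count 26, and then returns the number of characters in "26", which is 2; scalar(@primer) gives the count itself. Subroutine arguments are aliases to the caller's variables, and the whole argument list is evaluated before the call, so chop($primerName) has already shortened the very variable passed as -display_id by the time the constructor reads it. Separately, Bio::Seq->new already returns a reference, so the extra \ makes a reference to a reference; pushing the object itself would allow $primer[$index]->desc. The sketch below demonstrates both behaviours in core Perl; make_primer is a hypothetical stand-in for Bio::Seq->new, and the primer names are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::Seq->new so this runs without BioPerl.
# Like any Perl sub, it receives its arguments in @_ as aliases.
sub make_primer {
    my %args = @_;
    return { id => $args{-display_id}, desc => $args{-desc} };
}

# 1) length(@array) versus scalar(@array)
my @names  = map { sprintf('P%02dF', $_) } 1 .. 26;   # 26 made-up primer names
my @primer = map { make_primer(-display_id => $_) } @names;
my $chars  = length(@primer);   # @primer in scalar context is 26; length("26") is 2
my $count  = scalar(@primer);   # 26, the element count
print "length: $chars, scalar: $count\n";

# 2) chop() inside the argument list
my $primerName = 'VP1_F';
my $bad = make_primer(
    -display_id => $primerName,        # the variable itself is passed, not a copy
    -desc       => chop($primerName),  # runs before the call: returns 'F' and shortens $primerName
);
print "id: $bad->{id}, desc: $bad->{desc}\n";

# Safer: read the direction without modifying the name.
my $name      = 'VP1_F';
my $direction = substr($name, -1);     # last character; $name is unchanged
my $good = make_primer(-display_id => $name, -desc => $direction);
print "id: $good->{id}, desc: $good->{desc}\n";
```

Run as written, the chop() call reports id VP1_ and desc F, mirroring Lizhe's result, while the substr() version keeps the full name. Recent perls may also emit a compile-time warning suggesting scalar() for the length(@primer) line when warnings are enabled.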
From cjfields at illinois.edu Mon Jan 9 13:29:44 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Mon, 9 Jan 2012 18:29:44 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: Sounds very promising! The easiest way to contribute is via a fork of the code on GitHub with a pull request (as you already know, being a contributor to the Primer3 modules). chris
From fs5 at sanger.ac.uk Tue Jan 10 08:10:10 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Tue, 10 Jan 2012 13:10:10 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> Hi Chris, I have made the changes in a Git fork and opened the pull request. If this is accepted into BioPerl, I can also write a short SeqUtils HOWTO for the BioPerl wiki. Frank
From roy.chaudhuri at gmail.com Tue Jan 10 10:25:09 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 10 Jan 2012 15:25:09 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <4F0C5855.1070907@gmail.com> Hi Frank, Looks good to me. One thing I'm not sure about: why do features overlapping a deletion become fuzzy? That behaviour is in trunc_with_features because it's intended to represent taking a subregion of a larger sequence, but if you're representing an internal deletion then the boundaries of the overlapping feature aren't unknown; they have been specifically altered. Maybe you could give absolute coordinates, but add a note indicating that the 5' or 3' end has been truncated by however many bases. Cheers, Roy.
From fs5 at sanger.ac.uk Tue Jan 10 11:47:41 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Tue, 10 Jan 2012 16:47:41 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <4F0C5855.1070907@gmail.com> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> Message-ID: <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> Hi Roy, Sorry, I hadn't explained that very well: it's not the outer boundaries of the feature that become fuzzy but the "inner" ones of the split locations:

 --------------------              a feature's location
==========xxxx=================    sequence

 ---------                         sublocation 1
          --------                 sublocation 2
===============================

x = sequence to delete

The feature's location has changed from Simple to Split.
Sublocation 1: start is still EXACT and has not changed; end is now AFTER because this is not a true end of the feature.
Sublocation 2: start is BEFORE; end is EXACT (but shifted).

I hope this makes more sense(?) Cheers, Frank
From roy.chaudhuri at gmail.com Tue Jan 10 12:27:05 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 10 Jan 2012 17:27:05 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <4F0C74E9.6020804@gmail.com> I think it's me who didn't explain very well: I was talking about overlapping (rather than spanning) a deletion, although I think the same principle applies to the spanning example you gave.
Here's some test code:

#!/usr/bin/perl
use warnings FATAL=>qw(all);
use strict;
use Bio::Seq;
use Bio::SeqIO;
use Bio::SeqUtils;
use Bio::SeqFeature::Generic;
my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA');
$seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
                                                   -start=>2,
                                                   -end=>9));

$seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS',
                                                   -start=>2,
                                                   -end=>5));
my $out=Bio::SeqIO->newFh(-format=>'genbank');
my $trunc=Bio::SeqUtils->delete($seq, 4, 6);
print $out $trunc;

This currently outputs:

LOCUS       seq-accession_number       7 bp    dna     linear   UNK
ACCESSION   unknown
FEATURES             Location/Qualifiers
     CDS             join(2..>3,<4..6)
     CDS             2..>3
ORIGIN
        1 aaaaaaa
//

However, I was suggesting that the feature table should be something like:

     CDS             join(2..3,4..6)
                     /note="3 bp internal deletion"
     CDS             join(2..3)
                     /note="2 bp deleted from 3' end"

Fuzzy locations are intended to represent features which have boundaries spanning outside of the sequence. For a defined deletion that's not the case: the boundaries of the feature aren't unknown, they have been specifically altered.

Hope this is clearer.
Cheers,
Roy.

From roy.chaudhuri at gmail.com Tue Jan 10 12:31:06 2012
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 10 Jan 2012 17:31:06 +0000
Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning
In-Reply-To: <4F0C74E9.6020804@gmail.com>
References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com>
Message-ID: <4F0C75DA.9050607@gmail.com>

Or without the typo:

     CDS             join(2..3,4..6)
                     /note="3 bp internal deletion"
     CDS             2..3
                     /note="2 bp deleted from 3' end"

From michael.stoker at rocketmail.com Sun Jan 8 07:27:00 2012
From: michael.stoker at rocketmail.com (Michael Stoker)
Date: Sun, 8 Jan 2012 04:27:00 -0800 (PST)
Subject: [Bioperl-l] Installing Bioperl on a webserver
Message-ID: <1326025620.6623.YahooMailNeo@web44711.mail.sp1.yahoo.com>

Hi,

I'm trying to install BioPerl on my web server, but I'm having some difficulty installing it. I created a folder, /www/perl, at the root of my server, where I put all the Perl modules. I followed the online instructions for BioPerl:

  cd /www/perl
  wget http://bioperl.org/DIST/BioPerl-1.6.0.tar.gz
  tar xzvf BioPerl-1.6.0.tar.gz
  cd BioPerl-1.6.0/
  perl Build.PL
  ./Build test
  ./Build install

but I get an error:

  ERROR: Can't create '/usr/local/bin'
  Do not have write permissions on '/usr/local/bin'

And I can't modify the permissions on my server.

I need BioPerl for BLAST and GBrowse.

Any advice on how to install BioPerl on a web server, and on how my BLAST and GBrowse can find it, would be really helpful.

Kind regards.

Michael.

From florian.lajus at inria.fr Tue Jan 10 12:06:31 2012
From: florian.lajus at inria.fr (lajus)
Date: Tue, 10 Jan 2012 18:06:31 +0100
Subject: [Bioperl-l] Question on SeqFeature_RelationShip
Message-ID: <4F0C7017.90803@inria.fr>

Hello,

I am currently working on a refactoring of the Genolevures project (http://www.genolevures.org/). We are trying to make better use of BioPerl and the BioSQL schema on a PostgreSQL database.

I have loaded an EMBL file into my BioSQL database (PostgreSQL). If I look in my database, my bioentry has been added, and the associated seqfeatures too. But it seems that my seqfeature_relationship table is empty. I find this strange, since there is a relationship between a gene and its CDS, right?
Example (extract of my EMBL file):

FT   gene            <30978..>32507
FT                   /locus_tag="CAGL0A00319g"
FT                   /old_locus_tag="CAGL-IPF3315"
FT                   /old_locus_tag="CAGL-CDS2015.1"
FT   CDS             30978..32507
FT                   /locus_tag="CAGL0A00319g"
FT                   /old_locus_tag="CAGL-IPF3315"
FT                   /old_locus_tag="CAGL-CDS2015.1"
FT                   /note="similar to uniprot|P16639 Saccharomyces cerevisiae
FT                   YGL017w ATE1 arginyl tRNA transferase"
FT                   /db_xref="GOA:Q6FXV3"
FT                   /db_xref="InterPro:IPR007471"
FT                   /db_xref="InterPro:IPR007472"
FT                   /db_xref="InterPro:IPR016181"
FT                   /db_xref="InterPro:IPR017137"
FT                   /db_xref="UniProtKB/TrEMBL:Q6FXV3"
FT                   /inference="similar to AA sequence:UniProtKB:P16639"
FT                   /protein_id="CAG57677.1"
FT                   /translation="MENKLIIHRPLYFSDKSDCGYCHGKKAKGSDFYSLESWYERIKEN
FT                   ADSEVEELPVRSCTVGFQCENMTVAMYDQMCNMGFRRSGLFVYKMDALRSCCRLYTIRT
FT                   RPDWFKLTKDMRKCINRFRKHVLGEPVANAKTQGYVEDIVDIEGQSDSINFYTRFGPAV
FT                   YTDEKYELFSIYQERVHQDFDHSKKGFKRFLCDAPFTQGVIMGTEEEWEQLNNWKSMKP
FT                   GERLLRTGPVHESYYYKGKLIALAVTDFLPSGISSVYFIWHPDYHKWSLGKLSALRELS
FT                   LVSKTNLKYYYLGYYIDDCKKMNYKANYGGELLDSCTERYFKLSQVKDMIRGGKLFMVG
FT                   TQGHDISREVALSDAIRDCIYQTDAFDIASDDNVAEKVYGTSSNIYRPQYLKEVISFLK
FT                   TSGLEYDFPIYNDGVFNQYAKRIAKDGEDPDFTIPSICPGLIPLWELKDLLMSGKLQKE
FT                   LTGRTLVFDTSFGFIRKLEPWEDEDSTTKTAICDVVRLLGLEMASNSIVVV"

You can find the entire EMBL file here: http://www.ebi.ac.uk/htbin/expasyfetch?CR380947

For us, there is a relationship between the gene and its CDS. So why is the seqfeature_relationship table empty? Have we missed something?

PS: Sorry for my bad English...

Lajus Florian.
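In EMBL and GenBank feature tables the gene-to-CDS link is only implied, typically through a shared /locus_tag as in the extract above. As a rough illustration of what inferring that link involves, here is a minimal pure-Perl sketch that pairs each CDS with a gene carrying the same /locus_tag. It is not the Unflattener and writes nothing to BioSQL. The helper link_by_locus_tag and the StubFeature class are hypothetical; the sketch assumes only the primary_tag, has_tag and get_tag_values accessors that BioPerl feature objects provide. The stub class lets it run without BioPerl installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): pair CDS features with the gene
# feature that carries the same /locus_tag. It only calls primary_tag,
# has_tag and get_tag_values, which BioPerl feature objects provide.
# Nothing is written to BioSQL; it just returns locus_tag => [CDS features].
sub link_by_locus_tag {
    my @features = @_;
    my (%gene_for_tag, %cds_for_gene);

    # First pass: index gene features by their (first) locus_tag value.
    for my $f (grep { $_->primary_tag eq 'gene' } @features) {
        next unless $f->has_tag('locus_tag');
        my ($tag) = $f->get_tag_values('locus_tag');
        $gene_for_tag{$tag} = $f;
    }

    # Second pass: attach each CDS to a gene with the same locus_tag;
    # a CDS with no matching gene is simply left out.
    for my $f (grep { $_->primary_tag eq 'CDS' } @features) {
        next unless $f->has_tag('locus_tag');
        my ($tag) = $f->get_tag_values('locus_tag');
        push @{ $cds_for_gene{$tag} }, $f if exists $gene_for_tag{$tag};
    }
    return \%cds_for_gene;
}

# Hypothetical stand-in for a feature object, so the sketch runs
# without BioPerl installed.
package StubFeature;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}
sub primary_tag    { return $_[0]{primary_tag} }
sub has_tag        { return exists $_[0]{tags}{ $_[1] } }
sub get_tag_values { return @{ $_[0]{tags}{ $_[1] } || [] } }

package main;

# Gene and CDS from the extract above, reduced to the locus_tag qualifier.
my @features = (
    StubFeature->new(primary_tag => 'gene',
                     tags => { locus_tag => ['CAGL0A00319g'] }),
    StubFeature->new(primary_tag => 'CDS',
                     tags => { locus_tag => ['CAGL0A00319g'] }),
);
my $links = link_by_locus_tag(@features);
print scalar @{ $links->{CAGL0A00319g} }, "\n";    # prints 1
```

With real data, the same helper could in principle be given the features returned by $seq->get_SeqFeatures after reading the EMBL file with Bio::SeqIO. Records without a shared /locus_tag, or with several CDSs per gene, need the more careful handling that Bio::SeqFeature::Tools::Unflattener attempts.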
From roy.chaudhuri at gmail.com Tue Jan 10 12:57:41 2012
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Tue, 10 Jan 2012 17:57:41 +0000
Subject: [Bioperl-l] Installing Bioperl on a webserver
In-Reply-To: <1326025620.6623.YahooMailNeo@web44711.mail.sp1.yahoo.com>
References: <1326025620.6623.YahooMailNeo@web44711.mail.sp1.yahoo.com>
Message-ID: <4F0C7C15.2080203@gmail.com>

Hi Michael,

Since you don't have root access, you need to follow the instructions here:
http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_IN_A_PERSONAL_MODULE_AREA

Roy.

From p.j.a.cock at googlemail.com Tue Jan 10 13:18:43 2012
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 10 Jan 2012 18:18:43 +0000
Subject: [Bioperl-l] Question on SeqFeature_RelationShip
In-Reply-To: <4F0C7017.90803@inria.fr>
References: <4F0C7017.90803@inria.fr>
Message-ID:

On Tue, Jan 10, 2012 at 5:06 PM, lajus wrote:
> Hello,
> I am currently working on a refactoring of the Genolevures project
> (http://www.genolevures.org/)
> We are trying to better use bioperl and the bioSQL shema on a postgreSQL
> database.
>
> I have loaded an EMBL file into my BioSQL database (postgres). If I look in
> my database, my bioentry have been added and seqFeatures associated too.
> But it seems that my seqfeature_relationship table is empty.
> I find it strange in so far as there is a relationship between gene and its
> CDS. right?

No, not explicitly. Unlike GFF3, where there can be (and should be) explicit parent/child links between the gene and CDS, in GenBank and EMBL feature tables this link is only implicit. I don't know if BioPerl attempts to infer this kind of relationship, and if it did, whether that would get recorded in the BioSQL tables.

Peter

From cjfields at illinois.edu Tue Jan 10 13:45:22 2012
From: cjfields at illinois.edu (Fields, Christopher J)
Date: Tue, 10 Jan 2012 18:45:22 +0000
Subject: [Bioperl-l] Question on SeqFeature_RelationShip
In-Reply-To:
References: <4F0C7017.90803@inria.fr>
Message-ID:

On Jan 10, 2012, at 12:18 PM, Peter Cock wrote:
> No, not explicitly. Unlike GFF3 where there can be (and should be)
> explicit parent/child links between the gene and CDS, in GenBank
> and EMBL feature tables this is implicit only. I don't know if BioPerl
> attempts to infer this kind of relationship, and if it did, if that would
> get record in the BioSQL tables.
>
> Peter

BioPerl does not attempt to infer these by default (too much magic, and too many potential issues), but one can use something like the Unflattener, which does have some magic built in:

https://metacpan.org/module/Bio::SeqFeature::Tools::Unflattener

chris

From fs5 at sanger.ac.uk Tue Jan 10 17:35:46 2012
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Tue, 10 Jan 2012 22:35:46 +0000
Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning
In-Reply-To: <4F0C74E9.6020804@gmail.com>
References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com>
Message-ID: <4F0CBD42.6080608@sanger.ac.uk>

Hi Roy,

I see what you mean, and I had the same thought, but somehow I liked the fuzzy locations more because they suggest to me that the feature is not complete (anymore). But I do take your point that this is not the intended use of this location type. I can add notes as you suggest, but I guess I should also add a misc_feature "deletion", in your example between bases 3 and 4, to make it clearer that something has happened to the feature.
Frank On 10/01/12 17:27, Roy Chaudhuri wrote: > I think it's me that didn't explain very well - I was talking about > overlapping (rather than spanning) a deletion, although I think the > same principle applies to the spanning example you gave. Here's some > test code: > > #!/usr/bin/perl > use warnings FATAL=>qw(all); > use strict; > use Bio::Seq; > use Bio::SeqIO; > use Bio::SeqUtils; > use Bio::SeqFeature::Generic; > my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA'); > $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', > -start=>2, > -end=>9)); > > $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', > -start=>2, > -end=>5)); > my $out=Bio::SeqIO->newFh(-format=>'genbank'); > my $trunc=Bio::SeqUtils->delete($seq, 4, 6); > print $out $trunc; > > > This currently outputs: > LOCUS seq-accession_number 7 bp dna linear UNK > ACCESSION unknown > FEATURES Location/Qualifiers > CDS join(2..>3,<4..6) > CDS 2..>3 > ORIGIN > 1 aaaaaaa > // > > However, I was suggesting that the feature table should be something > like: > CDS join(2..3,4..6) > /note="3 bp internal deletion" > CDS join(2..3) > /note="2 bp deleted from 3' end" > > Fuzzy locations are intended to represent features which have > boundaries spanning outside of the sequence. For a defined deletion > that's not the case, the boundaries of the feature aren't unknown, > they have been specifically altered. > > Hope this is clearer. > Cheers, > Roy. > > On 10/01/2012 16:47, Frank Schwach wrote: >> Hi Roy, >> >> Sorry, I hadn't explained that very well: it's not the outer boundaries >> of the feature that become fuzzy but the "inner" ones of the split >> locations: >> >> -------------------- a feature's location >> ==========xxxx================= sequence >> >> >> --------- sublocation 1 >> -------- sublocation 2 >> =============================== >> >> x= sequence to delete >> The feature's location has changed from Simple to Split. 
>> >> Sublocation 1: >> start is still EXACT and has not changed >> end is now AFTER because this is not a true end of the feature >> >> Sublocation 2: >> start is BEFORE >> end is EXACT (but shifted) >> >> I hope this makes more sense(?) >> >> Cheers, >> >> Frank >> >> >> >> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote: >>> Hi Frank, >>> >>> Looks good to me. One thing I'm not sure about - why do features >>> overlapping a deletion become fuzzy? That behaviour is in >>> trunc_with_features because it's intended to represent a taking a >>> subregion of a larger sequence, but if you're representing an internal >>> deletion then the boundaries of the overlapping feature aren't unknown, >>> they have been specifically altered. Maybe you could give absolute >>> coordinates, but add a note indicating that the 5' or 3' end has been >>> truncated by however many bases. >>> >>> Cheers, >>> Roy. >>> >>> On 10/01/2012 13:10, Frank Schwach wrote: >>>> Hi Chris, >>>> >>>> I have made the changes in a Git fork and made the pull request now. >>>> If this is accepted into BioPerl I can also write a little SeqUtils >>>> HOWTO for the BioPerl wiki. >>>> >>>> Frank >>>> >>>> >>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote: >>>>> Sounds very promising! The easiest way to contribute is via a >>>>> fork of the code on Github with a pull request (as you already >>>>> know, being a contributor to the Primer3 modules). >>>>> >>>>> chris >>>>> >>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote: >>>>> >>>>>> Hi all, >>>>>> >>>>>> I needed to manipulate Bio::Seq objects with annotations and >>>>>> sequence >>>>>> features to simulate molecular cloning techniques, e.g. to cut a >>>>>> vector >>>>>> and insert a fragment into it while preserving all the >>>>>> annotations and >>>>>> moving the features accordingly. 
>>>>>> My main aim was to split features that span deletion/insertion >>>>>> sites in >>>>>> a meaningful way, which can not be done with the currently availble >>>>>> methods. >>>>>> I have modified Bio::SeqUtils so that I have the following new >>>>>> methods: >>>>>> >>>>>> delete >>>>>> ====== >>>>>> removes a segment from a sequence object and adjusts positions >>>>>> and types >>>>>> of locations of sequence features: >>>>>> - locations of features that span the deletion sites are turned into >>>>>> Splits. >>>>>> - locations that extend into the deleted region are turned to >>>>>> Fuzzy to >>>>>> indicate that their true start/end was lost. >>>>>> - locations contained inside the deleted regions are lost. >>>>>> - other features are shifted according to the length of the >>>>>> deletion. >>>>>> >>>>>> insert >>>>>> ====== >>>>>> adds a Bio::Seq object into another one between specified insertion >>>>>> sites. This also affects the features on the recipient sequence: >>>>>> - locations of features that span the insertion site are split but >>>>>> position types are not turned to Fuzzy because no part of the >>>>>> original >>>>>> feature is lost. >>>>>> - other features are shifted according to the length of the >>>>>> insertion. >>>>>> >>>>>> ligate >>>>>> ====== >>>>>> just for convenience. Supply a recipient, a fragment and one or two >>>>>> sites to cut the recipient. Can also flip the fragment if required. >>>>>> Simply calls delete [, reverse_complement_with_features] and >>>>>> insert in >>>>>> turn. >>>>>> >>>>>> >>>>>> One situation I haven't handled yet is a deletion that spans the >>>>>> origin >>>>>> of a circular molecule but that should be a rare thing to do >>>>>> anyway. The >>>>>> code currently throws an error if this is attempted. >>>>>> >>>>>> I'm happy to contribute the code on Github if there is interest? >>>>>> Comments on the handling of feature locations highly welcome! 
>>>>>> >>>>>> Frank >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>> >> >> >> > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From cjfields at illinois.edu Tue Jan 10 22:13:45 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 11 Jan 2012 03:13:45 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <4F0CBD42.6080608@sanger.ac.uk> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0CBD42.6080608@sanger.ac.uk> Message-ID: Have to agree with Roy, in cases where the bounds of the deletion are defined, the truncated features wouldn't be fuzzy (e.g. the start and end would be known, the feature would just be truncated). Same with other mutations. chris On Jan 10, 2012, at 4:35 PM, Frank Schwach wrote: > Hi Roy, > > I see what you mean and I had the same thought but somehow I liked the fuzzy locations more because it suggests to me that the feature is not complete (anymore). But I do take your point that this is not the intended use of this location type. I can add notes as you suggest but I guess I should also add a misc_feature "deletion", in your example between bases 3 and 4, to make it clearer that something has happened to the feature. > > Frank > > > > On 10/01/12 17:27, Roy Chaudhuri wrote: >> I think it's me that didn't explain very well - I was talking about overlapping (rather than spanning) a deletion, although I think the same principle applies to the spanning example you gave. 
Here's some test code: >> >> #!/usr/bin/perl >> use warnings FATAL=>qw(all); >> use strict; >> use Bio::Seq; >> use Bio::SeqIO; >> use Bio::SeqUtils; >> use Bio::SeqFeature::Generic; >> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA'); >> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >> -start=>2, >> -end=>9)); >> >> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >> -start=>2, >> -end=>5)); >> my $out=Bio::SeqIO->newFh(-format=>'genbank'); >> my $trunc=Bio::SeqUtils->delete($seq, 4, 6); >> print $out $trunc; >> >> >> This currently outputs: >> LOCUS seq-accession_number 7 bp dna linear UNK >> ACCESSION unknown >> FEATURES Location/Qualifiers >> CDS join(2..>3,<4..6) >> CDS 2..>3 >> ORIGIN >> 1 aaaaaaa >> // >> >> However, I was suggesting that the feature table should be something like: >> CDS join(2..3,4..6) >> /note="3 bp internal deletion" >> CDS join(2..3) >> /note="2 bp deleted from 3' end" >> >> Fuzzy locations are intended to represent features which have boundaries spanning outside of the sequence. For a defined deletion that's not the case, the boundaries of the feature aren't unknown, they have been specifically altered. >> >> Hope this is clearer. >> Cheers, >> Roy. >> >> On 10/01/2012 16:47, Frank Schwach wrote: >>> Hi Roy, >>> >>> Sorry, I hadn't explained that very well: it's not the outer boundaries >>> of the feature that become fuzzy but the "inner" ones of the split >>> locations: >>> >>> -------------------- a feature's location >>> ==========xxxx================= sequence >>> >>> >>> --------- sublocation 1 >>> -------- sublocation 2 >>> =============================== >>> >>> x= sequence to delete >>> The feature's location has changed from Simple to Split. 
>>> >>> Sublocation 1: >>> start is still EXACT and has not changed >>> end is now AFTER because this is not a true end of the feature >>> >>> Sublocation 2: >>> start is BEFORE >>> end is EXACT (but shifted) >>> >>> I hope this makes more sense(?) >>> >>> Cheers, >>> >>> Frank >>> >>> >>> >>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote: >>>> Hi Frank, >>>> >>>> Looks good to me. One thing I'm not sure about - why do features >>>> overlapping a deletion become fuzzy? That behaviour is in >>>> trunc_with_features because it's intended to represent taking a >>>> subregion of a larger sequence, but if you're representing an internal >>>> deletion then the boundaries of the overlapping feature aren't unknown, >>>> they have been specifically altered. Maybe you could give absolute >>>> coordinates, but add a note indicating that the 5' or 3' end has been >>>> truncated by however many bases. >>>> >>>> Cheers, >>>> Roy. >>>> >>>> On 10/01/2012 13:10, Frank Schwach wrote: >>>>> Hi Chris, >>>>> >>>>> I have made the changes in a Git fork and made the pull request now. >>>>> If this is accepted into BioPerl I can also write a little SeqUtils >>>>> HOWTO for the BioPerl wiki. >>>>> >>>>> Frank >>>>> >>>>> >>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote: >>>>>> Sounds very promising! The easiest way to contribute is via a fork of the code on Github with a pull request (as you already know, being a contributor to the Primer3 modules). >>>>>> >>>>>> chris >>>>>> >>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote: >>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> I needed to manipulate Bio::Seq objects with annotations and sequence >>>>>>> features to simulate molecular cloning techniques, e.g. to cut a vector >>>>>>> and insert a fragment into it while preserving all the annotations and >>>>>>> moving the features accordingly.
>>>>>>> My main aim was to split features that span deletion/insertion sites in >>>>>>> a meaningful way, which cannot be done with the currently available >>>>>>> methods. >>>>>>> I have modified Bio::SeqUtils so that I have the following new methods: >>>>>>> >>>>>>> delete >>>>>>> ====== >>>>>>> removes a segment from a sequence object and adjusts positions and types >>>>>>> of locations of sequence features: >>>>>>> - locations of features that span the deletion sites are turned into >>>>>>> Splits. >>>>>>> - locations that extend into the deleted region are turned to Fuzzy to >>>>>>> indicate that their true start/end was lost. >>>>>>> - locations contained inside the deleted regions are lost. >>>>>>> - other features are shifted according to the length of the deletion. >>>>>>> >>>>>>> insert >>>>>>> ====== >>>>>>> adds a Bio::Seq object into another one between specified insertion >>>>>>> sites. This also affects the features on the recipient sequence: >>>>>>> - locations of features that span the insertion site are split but >>>>>>> position types are not turned to Fuzzy because no part of the original >>>>>>> feature is lost. >>>>>>> - other features are shifted according to the length of the insertion. >>>>>>> >>>>>>> ligate >>>>>>> ====== >>>>>>> just for convenience. Supply a recipient, a fragment and one or two >>>>>>> sites to cut the recipient. Can also flip the fragment if required. >>>>>>> Simply calls delete [, reverse_complement_with_features] and insert in >>>>>>> turn. >>>>>>> >>>>>>> >>>>>>> One situation I haven't handled yet is a deletion that spans the origin >>>>>>> of a circular molecule but that should be a rare thing to do anyway. The >>>>>>> code currently throws an error if this is attempted. >>>>>>> >>>>>>> I'm happy to contribute the code on Github if there is interest? >>>>>>> Comments on the handling of feature locations highly welcome!
>>>>>>> >>>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>> >>> >>> >> > > > Chris Fields Senior Research Scientist National Center for Supercomputing Applications Institute for Genomic Biology University of Illinois at Urbana-Champaign From nathan.watson-haigh at awri.com.au Wed Jan 11 01:42:30 2012 From: nathan.watson-haigh at awri.com.au (Nathan Watson-Haigh) Date: Wed, 11 Jan 2012 06:42:30 +0000 Subject: [Bioperl-l] Split ACE file by AGP scaffolds Message-ID: I have an ACE file (500k contigs > 500bp) generated by Newbler for a 450 Mb genome which I'd like to open in consed. However, due to its size and my computer's memory limits, consed only opens 10-15% of it. I'm trying to split the ACE file into smaller subsets of contigs which I can handle in consed. I think a valid approach is to generate an ACE file per scaffold and work in consed on each scaffold in turn. Does this sound valid? If I take the AGP file that Newbler generated, I should be able to take the monolithic ACE file and split it into 20k ACE files, one per scaffold. Does anyone have thoughts on whether this is doable with BioPerl's Bio::Assembly::IO::ace module? If so, could you give me a couple of quick pointers? Cheers, Nath Sent from my Android phone.
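A minimal plain-Perl sketch of the first step, grouping contigs by scaffold from the Newbler AGP file. It assumes the tab-delimited nine-column AGP layout, in which column 1 names the scaffold, column 5 gives the component type (N and U mark gaps) and column 6 holds the contig ID on non-gap rows; scaffold_contigs is a hypothetical helper, not a BioPerl function. Pulling each scaffold's contigs out of the ACE file (for example with Bio::Assembly::IO) is left out, because that depends on which ace read/write features the installed BioPerl version provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read an AGP file and return
#   (\%contigs_of, \@scaffold_order)
# where $contigs_of{$scaffold} lists that scaffold's contig IDs in AGP order.
sub scaffold_contigs {
    my ($fh) = @_;
    my (%contigs_of, @scaffold_order);
    while (my $line = <$fh>) {
        next if $line =~ /^#/ || $line !~ /\S/;   # skip comments and blank lines
        chomp $line;
        my @col = split /\t/, $line;
        next if @col < 6;                         # not a usable AGP row
        my ($scaffold, $type, $component) = @col[0, 4, 5];
        next if $type eq 'N' || $type eq 'U';     # gap rows: column 6 is a gap length
        push @scaffold_order, $scaffold unless exists $contigs_of{$scaffold};
        push @{ $contigs_of{$scaffold} }, $component;
    }
    return (\%contigs_of, \@scaffold_order);
}

# Command-line use: one output line per scaffold, followed by its contigs.
if (@ARGV) {
    my $agp_file = $ARGV[0];
    open my $agp, '<', $agp_file or die "Cannot open $agp_file: $!";
    my ($contigs_of, $order) = scaffold_contigs($agp);
    close $agp;
    print join("\t", $_, @{ $contigs_of->{$_} }), "\n" for @$order;
}
```

Run as, say, `perl agp_groups.pl scaffolds.agp > scaffold_contigs.tsv` (both file names hypothetical); each output line is then a ready-made list of the contigs to extract for one per-scaffold ACE file.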
Nathan Watson-Haigh Senior Bioinformatician | The Australian Wine Research Institute Waite Precinct, Hartley Grove cnr Paratoo Road, Urrbrae (Adelaide) SA 5064 | PO Box 197, Glen Osmond SA 5064, Australia T: +61 8 83136836 (direct) | F: +61 8 83136601 | www: www.awri.com.au This communication, including attachments, is intended only for the addressee(s) and contains information which might be confidential and/or the copyright of The Australian Wine Research Institute (AWRI) or a third party. If you are not the intended recipient of this communication please immediately delete and destroy all copies and contact the sender. If you are the intended recipient of this communication you should not copy, disclose or distribute any of the information contained herein without the consent of the AWRI and the sender. Any views expressed in this communication are those of the individual sender except where the sender specifically states them to be the views of the AWRI. No representation is made that this communication, including attachments, is free of viruses. Virus scanning is recommended and is the responsibility of the recipient. From paulospapathanasiou at hotmail.gr Wed Jan 11 08:04:20 2012 From: paulospapathanasiou at hotmail.gr (paulos) Date: Wed, 11 Jan 2012 05:04:20 -0800 (PST) Subject: [Bioperl-l] HOW TO PRINT SOME INFORMATION FROM A BLAST_OUTPUT.TXT FILE TO A NEW FASTA.TXT FILE Message-ID: <33120359.post@talk.nabble.com> GREETINGS TO ALL, THIS IS THE BLAST_OUTPUT.TXT FILE: http://old.nabble.com/file/p33120359/blast_output.txt blast_output.txt I WANT TO WRITE THE SEQUENCES OF THE PROTEINS MATCHED IN THE BLAST_OUTPUT FILE TO A NEW FILE NAMED FASTA.TXT, USING A SIMPLE PERL SCRIPT. THANKS IN ADVANCE. -- View this message in context: http://old.nabble.com/HOW-TO-PRINT-SOME-INFORMATION-FROM-A-BLAST_OUTPUT.TXT-FILE-TO-A-NEW-FASTA.TXT-FILE-tp33120359p33120359.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
From florian.lajus at inria.fr Wed Jan 11 07:43:04 2012 From: florian.lajus at inria.fr (lajus) Date: Wed, 11 Jan 2012 13:43:04 +0100 Subject: [Bioperl-l] Question on SeqFeature_RelationShip In-Reply-To: References: <4F0C7017.90803@inria.fr> Message-ID: <4F0D83D8.90402@inria.fr> I have looked at the Unflattener and the magic works quite well. The $seq which is modified (as a side effect) by $unflattener->unflatten_seq(-seq=>$seq, -use_magic=>1); has a good hierarchy for us. So why can't I store this Bio::Seq in my database? Now there are explicit parent/child links between the gene and CDS. But when I create a persistent object for $seq and store it: my $pseq = $adaptor->create_persistent($seq); $pseq->create(); In my database, the bioentry and subseqFeatures are written, but there is still no relation in the seqFeature_relationship table. Do you have an explanation? Florian On 10/01/2012 19:45, Fields, Christopher J wrote: > On Jan 10, 2012, at 12:18 PM, Peter Cock wrote: > >> On Tue, Jan 10, 2012 at 5:06 PM, lajus wrote: >>> Hello, >>> I am currently working on a refactoring of the Genolevures project >>> (http://www.genolevures.org/) >>> We are trying to better use bioperl and the bioSQL schema on a postgreSQL >>> database. >>> >>> I have loaded an EMBL file into my BioSQL database (postgres). If I look in >>> my database, my bioentries have been added and seqFeatures associated too. >>> But it seems that my seqfeature_relationship table is empty. >>> I find it strange, since there is a relationship between a gene and its >>> CDS, right? >> No, not explicitly. Unlike GFF3 where there can be (and should be) >> explicit parent/child links between the gene and CDS, in GenBank >> and EMBL feature tables this is implicit only. I don't know if BioPerl >> attempts to infer this kind of relationship, and if it did, whether that would >> get recorded in the BioSQL tables.
>> >> Peter > BioPerl does not attempt to infer these by default (too much magic, and too many potential issues), but one can use something like the Unflattener, which does have some magic built-in: > > https://metacpan.org/module/Bio::SeqFeature::Tools::Unflattener > > chris > From florian.lajus at inria.fr Wed Jan 11 08:09:44 2012 From: florian.lajus at inria.fr (lajus) Date: Wed, 11 Jan 2012 14:09:44 +0100 Subject: [Bioperl-l] Question on SeqFeature_RelationShip In-Reply-To: <4F0D83D8.90402@inria.fr> References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> Message-ID: <4F0D8A18.1080606@inria.fr> Furthermore, if I look in verbose mode, I can see many of these in the stack: no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory They are just warnings, not errors, but... Any clues? Thanks in advance, Florian On 11/01/2012 13:43, lajus wrote: > I have looked at the Unflattener and the magic works quite well. > The $seq which is modified (as a side effect) by > $unflattener->unflatten_seq(-seq=>$seq, -use_magic=>1); > has a good hierarchy for us. > So why can't I store this Bio::Seq in my database? Now > there are explicit parent/child links between the gene and CDS. > But when I create a persistent object for $seq and store it: > my $pseq = $adaptor->create_persistent($seq); > $pseq->create(); > In my database, the bioentry and subseqFeatures are written, but there is still > no relation in the seqFeature_relationship table. > > Do you have an explanation? > > Florian > > On 10/01/2012 19:45, Fields, Christopher J wrote: >> On Jan 10, 2012, at 12:18 PM, Peter Cock wrote: >> >>> On Tue, Jan 10, 2012 at 5:06 PM, lajus wrote: >>>> Hello, >>>> I am currently working on a refactoring of the Genolevures project >>>> (http://www.genolevures.org/) >>>> We are trying to better use bioperl and the bioSQL schema on a >>>> postgreSQL >>>> database.
>>>> >>>> I have loaded an EMBL file into my BioSQL database (postgres). If I >>>> look in >>>> my database, my bioentries have been added and seqFeatures associated >>>> too. >>>> But it seems that my seqfeature_relationship table is empty. >>>> I find it strange, since there is a relationship between a gene >>>> and its >>>> CDS, right? >>> No, not explicitly. Unlike GFF3 where there can be (and should be) >>> explicit parent/child links between the gene and CDS, in GenBank >>> and EMBL feature tables this is implicit only. I don't know if BioPerl >>> attempts to infer this kind of relationship, and if it did, whether that >>> would >>> get recorded in the BioSQL tables. >>> >>> Peter >> BioPerl does not attempt to infer these by default (too much magic, >> and too many potential issues), but one can use something like the >> Unflattener, which does have some magic built-in: >> >> https://metacpan.org/module/Bio::SeqFeature::Tools::Unflattener >> >> chris >> > From fs5 at sanger.ac.uk Wed Jan 11 09:50:01 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Wed, 11 Jan 2012 14:50:01 +0000 Subject: [Bioperl-l] HOW TO PRINT SOME INFORMATION FROM A BLAST_OUTPUT.TXT FILE TO A NEW FASTA.TXT FILE In-Reply-To: <33120359.post@talk.nabble.com> References: <33120359.post@talk.nabble.com> Message-ID: <4F0DA199.10202@sanger.ac.uk> Hi Paulos, have a look at the tutorial for using Bio::SearchIO: http://www.bioperl.org/wiki/HOWTO:SearchIO#Creating_Reports_for_SearchIO Under "Using SearchIO" you will see the generic script for getting information out of BLAST reports. You will see how to get each alignment shown in the report and access the information associated with it. Printing FASTA is very easy, as you only need something like print ">$id\n$sequence\n"; but you can also look at the HOWTO for Bio::SeqIO (http://www.bioperl.org/wiki/HOWTO:SeqIO) to do it the BioPerl way.
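For the plain-Perl route, a minimal sketch of the print ">$id\n$sequence\n" idea without BioPerl. It assumes a standard NCBI text report in which each hit header line starts with ">" and subject alignment lines start with "Sbjct" (with or without a colon); blast_text_to_fasta is a hypothetical helper. A BLAST report only contains the aligned part of each hit, so this writes the subject segment of each hit's first HSP with gaps removed, not the full-length protein; full sequences would have to be fetched from the database that was searched. Bio::SearchIO copes with report variants far more robustly than these regexes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert a plain-text BLAST report (read from $fh) into
# FASTA text, one record per hit. The sequence is the subject ("Sbjct") part
# of each hit's FIRST HSP with gap characters removed; later HSPs are ignored.
sub blast_text_to_fasta {
    my ($fh) = @_;
    my ($id, $seq, $hsp_count);
    my @records;
    my $flush = sub {                      # store the hit collected so far
        push @records, ">$id\n$seq\n" if defined $id && length $seq;
    };
    while (my $line = <$fh>) {
        if ($line =~ /^>\s*(\S+)/) {       # new hit header, e.g. ">sp|P12345|NAME ..."
            my $new_id = $1;
            $flush->();
            ($id, $seq, $hsp_count) = ($new_id, '', 0);
        }
        elsif (defined $id && $line =~ /^\s*Score\s*=/) {
            $hsp_count++;                  # each HSP starts with a "Score =" line
        }
        elsif (defined $id && $hsp_count == 1
               && $line =~ /^Sbjct:?\s+\d+\s+(\S+)\s+\d+/) {
            (my $chunk = $1) =~ tr/-//d;   # drop alignment gaps
            $seq .= $chunk;
        }
    }
    $flush->();                            # don't forget the last hit
    return join '', @records;
}

# Command-line use: perl script.pl blast_output.txt fasta.txt
if (@ARGV == 2) {
    my ($in_file, $out_file) = @ARGV;
    open my $in,  '<', $in_file  or die "Cannot read $in_file: $!";
    open my $out, '>', $out_file or die "Cannot write $out_file: $!";
    print {$out} blast_text_to_fasta($in);
    close $out or die "Cannot close $out_file: $!";
}
```

The first token of the hit header is used as the FASTA ID; if the full description line is wanted, the header regex would need to capture the rest of the line as well.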
Frank On 11/01/12 13:04, paulos wrote: > GREETINGS TO ALL, > THIS IS THE BLAST_OUTPUT.TXT FILE: > http://old.nabble.com/file/p33120359/blast_output.txt blast_output.txt > I WANT TO WRITE THE SEQUENCES OF THE PROTEINS MATCHED IN THE BLAST_OUTPUT FILE > TO A NEW FILE NAMED FASTA.TXT, USING A SIMPLE PERL SCRIPT. > > THANKS IN ADVANCE. From cjfields at illinois.edu Wed Jan 11 10:44:06 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 11 Jan 2012 15:44:06 +0000 Subject: [Bioperl-l] Question on SeqFeature_RelationShip In-Reply-To: <4F0D8A18.1080606@inria.fr> References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> Message-ID: Seems like a possible bug with bioperl-db; I believe hierarchical seqfeatures are stored, but it's worth looking into. Do you have some example data (the GenBank file you are using, for instance)? chris On Jan 11, 2012, at 7:09 AM, lajus wrote: > Furthermore, if I look in verbose mode, I can see many of these in the stack: > > no adaptor found for class Bio::Annotation::TypeManager > no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory > > They are just warnings, not errors, but... > Any clues? > > Thanks in advance, > > Florian > > On 11/01/2012 13:43, lajus wrote: >> I have looked at the Unflattener and the magic works quite well. >> The $seq which is modified (as a side effect) by >> $unflattener->unflatten_seq(-seq=>$seq, -use_magic=>1); >> has a good hierarchy for us. >> So why can't I store this Bio::Seq in my database? Now there are explicit parent/child links between the gene and CDS.
>> But when I create a persistent object for $seq and store it: >> my $pseq = $adaptor->create_persistent($seq); >> $pseq->create(); >> In my database, the bioentry and subseqFeatures are written, but there is still no relation in the seqFeature_relationship table. >> >> Do you have an explanation? >> >> Florian >> >> On 10/01/2012 19:45, Fields, Christopher J wrote: >>> On Jan 10, 2012, at 12:18 PM, Peter Cock wrote: >>> >>>> On Tue, Jan 10, 2012 at 5:06 PM, lajus wrote: >>>>> Hello, >>>>> I am currently working on a refactoring of the Genolevures project >>>>> (http://www.genolevures.org/) >>>>> We are trying to better use bioperl and the bioSQL schema on a postgreSQL >>>>> database. >>>>> >>>>> I have loaded an EMBL file into my BioSQL database (postgres). If I look in >>>>> my database, my bioentries have been added and seqFeatures associated too. >>>>> But it seems that my seqfeature_relationship table is empty. >>>>> I find it strange, since there is a relationship between a gene and its >>>>> CDS, right? >>>> No, not explicitly. Unlike GFF3 where there can be (and should be) >>>> explicit parent/child links between the gene and CDS, in GenBank >>>> and EMBL feature tables this is implicit only. I don't know if BioPerl >>>> attempts to infer this kind of relationship, and if it did, whether that would >>>> get recorded in the BioSQL tables.
>>>> >>>> Peter >>> BioPerl does not attempt to infer these by default (too much magic, and too many potential issues), but one can use something like the Unflattener, which does have some magic built-in: >>> >>> https://metacpan.org/module/Bio::SeqFeature::Tools::Unflattener >>> >>> chris >>> >> > From florian.lajus at inria.fr Wed Jan 11 11:25:10 2012 From: florian.lajus at inria.fr (lajus) Date: Wed, 11 Jan 2012 17:25:10 +0100 Subject: [Bioperl-l] Question on SeqFeature_RelationShip In-Reply-To: References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> Message-ID: <4F0DB7E6.5060306@inria.fr> I am using an EMBL file, which you can find here: http://www.ebi.ac.uk/Tools/dbfetch/dbfetch?db=embl&id=CR380947&style=raw I haven't tested with a GenBank file yet, but I used Readseq to convert the EMBL file to GenBank; the resulting file is attached. Florian On 11/01/2012 16:44, Fields, Christopher J wrote: > Seems like a possible bug with bioperl-db; I believe hierarchical seqfeatures are stored, but it's worth looking into. Do you have some example data (the GenBank file you are using, for instance)? > > chris > > On Jan 11, 2012, at 7:09 AM, lajus wrote: > >> Furthermore, if I look in verbose mode, I can see many of these in the stack: >> >> no adaptor found for class Bio::Annotation::TypeManager >> no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory >> >> They are just warnings, not errors, but... >> Any clues? >> >> Thanks in advance, >> >> Florian >> >> On 11/01/2012 13:43, lajus wrote: >>> I have looked at the Unflattener and the magic works quite well. >>> The $seq which is modified (as a side effect) by >>> $unflattener->unflatten_seq(-seq=>$seq, -use_magic=>1); >>> has a good hierarchy for us. >>> So why can't I store this Bio::Seq in my database?
Now there are explicit parent/child links between the gene and CDS. >>> But when I create a persistent object for $seq and store it: >>> my $pseq = $adaptor->create_persistent($seq); >>> $pseq->create(); >>> In my database, the bioentry and subseqFeatures are written, but there is still no relation in the seqFeature_relationship table. >>> >>> Do you have an explanation? >>> >>> Florian >>> >>> On 10/01/2012 19:45, Fields, Christopher J wrote: >>>> On Jan 10, 2012, at 12:18 PM, Peter Cock wrote: >>>> >>>>> On Tue, Jan 10, 2012 at 5:06 PM, lajus wrote: >>>>>> Hello, >>>>>> I am currently working on a refactoring of the Genolevures project >>>>>> (http://www.genolevures.org/) >>>>>> We are trying to better use bioperl and the bioSQL schema on a postgreSQL >>>>>> database. >>>>>> >>>>>> I have loaded an EMBL file into my BioSQL database (postgres). If I look in >>>>>> my database, my bioentries have been added and seqFeatures associated too. >>>>>> But it seems that my seqfeature_relationship table is empty. >>>>>> I find it strange, since there is a relationship between a gene and its >>>>>> CDS, right? >>>>> No, not explicitly. Unlike GFF3 where there can be (and should be) >>>>> explicit parent/child links between the gene and CDS, in GenBank >>>>> and EMBL feature tables this is implicit only. I don't know if BioPerl >>>>> attempts to infer this kind of relationship, and if it did, whether that would >>>>> get recorded in the BioSQL tables.
>>>>> >>>>> Peter >>>> BioPerl does not attempt to infer these by default (too much magic, and too many potential issues), but one can use something like the Unflattener, which does have some magic built-in: >>>> >>>> https://metacpan.org/module/Bio::SeqFeature::Tools::Unflattener >>>> >>>> chris >>>> -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test-seq.gb URL: From paulospapathanasiou at hotmail.gr Wed Jan 11 12:22:24 2012 From: paulospapathanasiou at hotmail.gr (paulos) Date: Wed, 11 Jan 2012 09:22:24 -0800 (PST) Subject: [Bioperl-l] HOW TO PRINT SOME INFORMATION FROM A BLAST_OUTPUT.TXT FILE TO A NEW FASTA.TXT FILE In-Reply-To: <4F0DA199.10202@sanger.ac.uk> References: <33120359.post@talk.nabble.com> <4F0DA199.10202@sanger.ac.uk> Message-ID: <33122670.post@talk.nabble.com> HI Frank Schwach, THANK YOU FOR HELPING ME. I KNOW THAT BIOPERL IS VERY USEFUL FOR SOLVING MY QUESTION AND THAT THIS FORUM IS ONLY ABOUT BIOPERL QUERIES, BUT I WOULD BE GRATEFUL FOR A SCRIPT IN PLAIN PERL, NOT BIOPERL. paulos -- View this message in context: http://old.nabble.com/HOW-TO-PRINT-SOME-INFORMATION-FROM-A-BLAST_OUTPUT.TXT-FILE-TO-A-NEW-FASTA.TXT-FILE-tp33120359p33122670.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From wkretzsch at gmail.com Wed Jan 11 12:44:35 2012 From: wkretzsch at gmail.com (Warren W. Kretzschmar) Date: Wed, 11 Jan 2012 17:44:35 +0000 Subject: [Bioperl-l] HOW TO PRINT SOME INFORMATION FROM A BLAST_OUTPUT.TXT FILE TO A NEW FASTA.TXT FILE In-Reply-To: <33122670.post@talk.nabble.com> References: <33120359.post@talk.nabble.com> <4F0DA199.10202@sanger.ac.uk> <33122670.post@talk.nabble.com> Message-ID: Hey Paulos, > BUT I WOULD BE GRATEFUL FOR A SCRIPT IN PLAIN PERL, NOT BIOPERL.
This sounds a lot like a homework question. However, Frank has essentially given you the heart of the Perl program already: print ">$id\n$sequence\n"; If you don't know what this means, then I suggest trying an introductory Perl tutorial like this one: http://www.tizag.com/perlT/ Regards, Warren -- In God we trust, all others bring data. - William Edwards Deming On Wed, Jan 11, 2012 at 5:22 PM, paulos wrote: > > HI Frank Schwach, > > THANK YOU FOR HELPING ME. > > I KNOW THAT BIOPERL IS VERY USEFUL FOR SOLVING MY QUESTION > AND THAT THIS FORUM IS ONLY ABOUT BIOPERL QUERIES, > BUT I WOULD BE GRATEFUL FOR A SCRIPT IN PLAIN PERL, NOT BIOPERL. > > paulos > -- > View this message in context: > http://old.nabble.com/HOW-TO-PRINT-SOME-INFORMATION-FROM-A-BLAST_OUTPUT.TXT-FILE-TO-A-NEW-FASTA.TXT-FILE-tp33120359p33122670.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > From fs5 at sanger.ac.uk Wed Jan 11 13:16:53 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Wed, 11 Jan 2012 18:16:53 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <4F0C75DA.9050607@gmail.com> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> Message-ID: <4F0DD215.9070100@sanger.ac.uk> Hi Roy and Chris, I have made the changes to the code now. As you suggested, feature ends no longer change type and I insert a note instead to inform about the deletion (or insertion), showing the length and position.
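To make the coordinate rules in this thread concrete, here is a minimal plain-Perl sketch of how a simple, forward-strand feature location could be remapped after a deletion: features before the deletion are unchanged, features after it shift left by its length, features inside it are lost, spanning features become an exact join(), and overlapping ends are truncated with a note. This is not the Bio::SeqUtils code; adjust_for_deletion and the wording of the 5'-side note are hypothetical, and the strand-aware 5'/3' wording used for complement() features is not modelled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch (NOT the Bio::SeqUtils implementation): remap a simple,
# forward-strand feature $start..$end after deleting $del_start..$del_end
# (1-based, inclusive). Returns (GenBank-style location, note or undef),
# or an empty list if the feature lay entirely inside the deletion.
sub adjust_for_deletion {
    my ($start, $end, $del_start, $del_end) = @_;
    my $len  = $del_end - $del_start + 1;     # number of bases removed
    my $left = $del_start - 1;                # last base before the junction

    # Entirely upstream of the deletion: unchanged.
    return ("$start..$end", undef) if $end < $del_start;

    # Entirely downstream: shifted left by the deletion length.
    return (($start - $len) . '..' . ($end - $len), undef) if $start > $del_end;

    # Entirely inside the deleted region: lost.
    return () if $start >= $del_start && $end <= $del_end;

    # Spans the deletion: two exact parts joined at the junction.
    if ($start < $del_start && $end > $del_end) {
        return ("join($start..$left,$del_start.." . ($end - $len) . ')',
                "${len}bp internal deletion between pos $left and $del_start");
    }

    # 3' end runs into the deletion: truncated, but the new end is exact.
    if ($start < $del_start) {
        return ("$start..$left", ($end - $del_start + 1) . 'bp deleted from feature end');
    }

    # 5' end lies inside the deletion: new start is the first base after the junction.
    return ("$del_start.." . ($end - $len),
            ($del_end - $start + 1) . 'bp deleted from feature start');
}

unless (caller) {
    # Roy's test case: CDS 2..9 and CDS 2..5, with bases 4..6 deleted.
    for my $feature ([2, 9], [2, 5]) {
        my ($loc, $note) = adjust_for_deletion(@$feature, 4, 6);
        print "$loc\t", ($note // ''), "\n";
    }
}
```

For Roy's two CDS features this gives join(2..3,4..6) with "3bp internal deletion between pos 3 and 4", and 2..3 with "2bp deleted from feature end", the same locations and notes as in Frank's output for Roy's test script.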
I have also added a feature to annotate deletion sites themselves (with IN-BETWEEN locations). Roy's test script now prints: LOCUS seq-accession_number 7 bp dna linear UNK ACCESSION unknown FEATURES Location/Qualifiers CDS join(2..3,4..6) /note="3bp internal deletion between pos 3 and 4" CDS 2..3 /note="2bp deleted from feature end" misc_feature 3^4 /note="deletion of 3bp" ORIGIN 1 aaaaaaa // or, if you add strand information (-1 in this case) to the second feature: LOCUS seq-accession_number 7 bp dna linear UNK ACCESSION unknown FEATURES Location/Qualifiers CDS join(2..3,4..6) /note="3bp internal deletion between pos 3 and 4" CDS complement(2..3) /note="2bp deleted from feature 5' end" misc_feature 3^4 /note="deletion of 3bp" ORIGIN 1 aaaaaaa // I have committed this along with some bugfixes to my master branch on GitHub https://github.com/fschwach/bioperl-live so it's now also in my existing pull request. I'm still wondering if cloning the sequence objects rather than calling 'new' on their respective classes would be an option inside 'delete' and 'insert'? I'm experimenting with this for my own purposes because I have to work with custom sub-classes of Bio::Seq which have additional attributes and therefore set 'can_call_new' to false. Without cloning the objects, I first have to convert the custom Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid. Is there any reason why something like Clone::Fast should not be used in this case? It seems to work for me but there may be situations where this is going to blow up which I am not aware of. Cloning rather than calling new could be made an option in Bio::SeqUtils. I have most of the code for that already.
Frank On 10/01/12 17:31, Roy Chaudhuri wrote: > Or without the typo: > > CDS join(2..3,4..6) > /note="3 bp internal deletion" > CDS 2..3 > /note="2 bp deleted from 3' end" > > On 10/01/2012 17:27, Roy Chaudhuri wrote: >> I think it's me that didn't explain very well - I was talking about >> overlapping (rather than spanning) a deletion, although I think the same >> principle applies to the spanning example you gave. Here's some test >> code: >> >> #!/usr/bin/perl >> use warnings FATAL=>qw(all); >> use strict; >> use Bio::Seq; >> use Bio::SeqIO; >> use Bio::SeqUtils; >> use Bio::SeqFeature::Generic; >> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA'); >> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >> -start=>2, >> -end=>9)); >> >> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >> -start=>2, >> -end=>5)); >> my $out=Bio::SeqIO->newFh(-format=>'genbank'); >> my $trunc=Bio::SeqUtils->delete($seq, 4, 6); >> print $out $trunc; >> >> >> This currently outputs: >> LOCUS seq-accession_number 7 bp dna linear UNK >> ACCESSION unknown >> FEATURES Location/Qualifiers >> CDS join(2..>3,<4..6) >> CDS 2..>3 >> ORIGIN >> 1 aaaaaaa >> // >> >> However, I was suggesting that the feature table should be something >> like: >> CDS join(2..3,4..6) >> /note="3 bp internal deletion" >> CDS join(2..3) >> /note="2 bp deleted from 3' end" >> >> Fuzzy locations are intended to represent features which have boundaries >> spanning outside of the sequence. For a defined deletion that's not the >> case, the boundaries of the feature aren't unknown, they have been >> specifically altered. >> >> Hope this is clearer. >> Cheers, >> Roy. 
>> >> On 10/01/2012 16:47, Frank Schwach wrote: >>> Hi Roy, >>> >>> Sorry, I hadn't explained that very well: it's not the outer boundaries >>> of the feature that become fuzzy but the "inner" ones of the split >>> locations: >>> >>> -------------------- a feature's location >>> ==========xxxx================= sequence >>> >>> >>> --------- sublocation 1 >>> -------- sublocation 2 >>> =============================== >>> >>> x= sequence to delete >>> The feature's location has changed from Simple to Split. >>> >>> Sublocation 1: >>> start is still EXACT and has not changed >>> end is now AFTER because this is not a true end of the feature >>> >>> Sublocation 2: >>> start is BEFORE >>> end is EXACT (but shifted) >>> >>> I hope this makes more sense(?) >>> >>> Cheers, >>> >>> Frank >>> >>> >>> >>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote: >>>> Hi Frank, >>>> >>>> Looks good to me. One thing I'm not sure about - why do features >>>> overlapping a deletion become fuzzy? That behaviour is in >>>> trunc_with_features because it's intended to represent taking a >>>> subregion of a larger sequence, but if you're representing an internal >>>> deletion then the boundaries of the overlapping feature aren't >>>> unknown, >>>> they have been specifically altered. Maybe you could give absolute >>>> coordinates, but add a note indicating that the 5' or 3' end has been >>>> truncated by however many bases. >>>> >>>> Cheers, >>>> Roy. >>>> >>>> On 10/01/2012 13:10, Frank Schwach wrote: >>>>> Hi Chris, >>>>> >>>>> I have made the changes in a Git fork and made the pull request now. >>>>> If this is accepted into BioPerl I can also write a little SeqUtils >>>>> HOWTO for the BioPerl wiki. >>>>> >>>>> Frank >>>>> >>>>> >>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote: >>>>>> Sounds very promising!
The easiest way to contribute is via a >>>>>> fork of the code on Github with a pull request (as you already >>>>>> know, being a contributor to the Primer3 modules). >>>>>> >>>>>> chris >>>>>> >>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote: >>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> I needed to manipulate Bio::Seq objects with annotations and >>>>>>> sequence >>>>>>> features to simulate molecular cloning techniques, e.g. to cut a >>>>>>> vector >>>>>>> and insert a fragment into it while preserving all the >>>>>>> annotations and >>>>>>> moving the features accordingly. >>>>>>> My main aim was to split features that span deletion/insertion >>>>>>> sites in >>>>>>> a meaningful way, which cannot be done with the currently available >>>>>>> methods. >>>>>>> I have modified Bio::SeqUtils so that I have the following new >>>>>>> methods: >>>>>>> >>>>>>> delete >>>>>>> ====== >>>>>>> removes a segment from a sequence object and adjusts positions >>>>>>> and types >>>>>>> of locations of sequence features: >>>>>>> - locations of features that span the deletion sites are turned >>>>>>> into >>>>>>> Splits. >>>>>>> - locations that extend into the deleted region are turned to >>>>>>> Fuzzy to >>>>>>> indicate that their true start/end was lost. >>>>>>> - locations contained inside the deleted regions are lost. >>>>>>> - other features are shifted according to the length of the >>>>>>> deletion. >>>>>>> >>>>>>> insert >>>>>>> ====== >>>>>>> adds a Bio::Seq object into another one between specified insertion >>>>>>> sites. This also affects the features on the recipient sequence: >>>>>>> - locations of features that span the insertion site are split but >>>>>>> position types are not turned to Fuzzy because no part of the >>>>>>> original >>>>>>> feature is lost. >>>>>>> - other features are shifted according to the length of the >>>>>>> insertion. >>>>>>> >>>>>>> ligate >>>>>>> ====== >>>>>>> just for convenience.
Supply a recipient, a fragment and one or two >>>>>>> sites to cut the recipient. Can also flip the fragment if required. >>>>>>> Simply calls delete [, reverse_complement_with_features] and >>>>>>> insert in >>>>>>> turn. >>>>>>> >>>>>>> >>>>>>> One situation I haven't handled yet is a deletion that spans the >>>>>>> origin >>>>>>> of a circular molecule but that should be a rare thing to do >>>>>>> anyway. The >>>>>>> code currently throws an error if this is attempted. >>>>>>> >>>>>>> I'm happy to contribute the code on Github if there is interest? >>>>>>> Comments on the handling of feature locations highly welcome! >>>>>>> >>>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>> >>> >>> >> > From roy.chaudhuri at gmail.com Wed Jan 11 13:38:34 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 11 Jan 2012 18:38:34 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <4F0DD215.9070100@sanger.ac.uk> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> Message-ID: <4F0DD72A.90309@gmail.com> Hi Frank, Looks great, I like the use of between locations, didn't think of that. It was suggested that I avoid using Clone for cat, trunc_with_features etc. to avoid adding a dependency (which may no longer be an issue) and because it would cause problems for Bio::Seq implementations that use a database as the back-end.
Maybe you could add it as an option, but keep the default as is? Cheers, Roy. On 11/01/2012 18:16, Frank Schwach wrote: > Hi Roy and Chris, > > I have made the changes to the code now. As you suggested, feature ends > no longer change type and I insert a note instead to inform about the > deletion (or insertion), showing the length and position. > I have also added a feature to annotate deletion sites themselves (with > IN-BETWEEN locations). > > Roy's test script now prints: > > LOCUS seq-accession_number 7 bp dna linear UNK > ACCESSION unknown > FEATURES Location/Qualifiers > CDS join(2..3,4..6) > /note="3bp internal deletion between pos 3 and 4" > CDS 2..3 > /note="2bp deleted from feature end" > misc_feature 3^4 > /note="deletion of 3bp" > ORIGIN > 1 aaaaaaa > // > > > or, if you add strand information (-1 in this case) to the second feature: > > LOCUS seq-accession_number 7 bp dna linear UNK > ACCESSION unknown > FEATURES Location/Qualifiers > CDS join(2..3,4..6) > /note="3bp internal deletion between pos 3 and 4" > CDS complement(2..3) > /note="2bp deleted from feature 5' end" > misc_feature 3^4 > /note="deletion of 3bp" > ORIGIN > 1 aaaaaaa > // > > I have committed this along with some bugfixes to my master branch on GitHub > https://github.com/fschwach/bioperl-live > so it's now also in my existing pull request. > > I'm still wondering if cloning the sequence objects rather than calling > 'new' on their respective classes would be an option inside 'delete' and > 'insert'? > I'm experimenting with this for my own purposes because I have to work > with custom sub-classes of Bio::Seq which have additional attributes and > therefore set 'can_call_new' to false. > Without cloning the objects, I first have to convert the custom > Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid. > Is there any reason why something like Clone::Fast should not be used in > this case? It seems to work for me but there may be situations where > this is going to blow up which I am not aware of. > Cloning rather than calling new could be made an option in > Bio::SeqUtils. I have most of the code for that already. > > Frank From cjfields at illinois.edu Wed Jan 11 13:42:46 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 11 Jan 2012 18:42:46 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <4F0DD72A.90309@gmail.com> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> Message-ID: Note that Bio::Root::Root now has a clone() method that one can take advantage of for this purpose; if Storable or Clone is available, it will pick one of the two, preferably Clone over Storable. It's fairly untested, but we haven't run into problems with it yet (I think it was in the last CPAN release). chris On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote: > Hi Frank, > > Looks great, I like the use of between locations, didn't think of that. > > It was suggested that I avoid using Clone for cat, trunc_with_features etc.
to avoid adding a dependency (which may no longer be an issue) and because it would cause problems for Bio::Seq implementations that use a database as the back-end. Maybe you could add it as an option, but keep the default as is? > > Cheers, > Roy. From fs5 at sanger.ac.uk Wed Jan 11 16:03:55 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Wed, 11 Jan 2012 21:03:55 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> Message-ID: <4F0DF93B.5020505@sanger.ac.uk> Great, I'll work on a branch that gives the user the option to use clone instead of new, and then we can see if we want to use that in the end. In the meantime, what do you think about pulling this into bioperl-live? When I have some time again, I can work on the HOWTO for these new features for the BioPerl wiki. Frank On 11/01/12 18:42, Fields, Christopher J wrote: > Note that Bio::Root::Root now has a clone() method that one can take advantage of for this purpose; if Storable or Clone is available, it will pick one of the two, preferably Clone over Storable. It's fairly untested, but we haven't run into problems with it yet (I think it was in the last CPAN release). > > chris From paulospapathanasiou at hotmail.gr Wed Jan 11 17:10:17 2012 From: paulospapathanasiou at hotmail.gr (paulos) Date: Wed, 11 Jan 2012 14:10:17 -0800 (PST) Subject: [Bioperl-l] HOW TO PRINT SOME INFORMATION FROM A BLAST_OUTPUT.TXT FILE TO A NEW FASTA.TXT FILE In-Reply-To: References: <33120359.post@talk.nabble.com> <4F0DA199.10202@sanger.ac.uk> <33122670.post@talk.nabble.com> Message-ID: <33124531.post@talk.nabble.com> Hey Warren, u are right that it sounds like a homework stuff cause it is actually.
I am a student, biologist one, and i have to do this. I am glad that u and Frank help me with the print statement but i'd like to have the whole script. I know that i am demanding. But i am really thankful that u and Frank tried to help me so far. Also, i checked the link u gave me and didn't help me to write the desirable script... paulos -- View this message in context: http://old.nabble.com/HOW-TO-PRINT-SOME-INFORMATION-FROM-A-BLAST_OUTPUT.TXT-FILE-TO-A-NEW-FASTA.TXT-FILE-tp33120359p33124531.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cjfields at illinois.edu Wed Jan 11 17:27:19 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 11 Jan 2012 22:27:19 +0000 Subject: [Bioperl-l] HOW TO PRINT SOME INFORMATION FROM A BLAST_OUTPUT.TXT FILE TO A NEW FASTA.TXT FILE In-Reply-To: <33124531.post@talk.nabble.com> References: <33120359.post@talk.nabble.com> <4F0DA199.10202@sanger.ac.uk> <33122670.post@talk.nabble.com> <33124531.post@talk.nabble.com> Message-ID: <2CBB8ABA-37BB-4D5C-927F-ED0F5201A601@illinois.edu> We can give you the script if you tell us your prof's name, so we can pass it on directly to him. Can't have you taking credit for everything (of course, as a bonus, we'll also send along links to this very public email thread :) chris On Jan 11, 2012, at 4:10 PM, paulos wrote: > > > Hey Warren, > > u are right that it sounds like a homework stuff cause it is actually. I am > a student, biologist one, and i have to do this. I am glad that u and Frank > help me with the print statement but i'd like to have the whole script. I > know that i am demanding. But i am really thankful that u and Frank tried > to help me so far. Also, i checked the link u gave me and didn't help me to > write the desirable script... > > paulos From maquino at knome.com Wed Jan 11 17:34:54 2012 From: maquino at knome.com (Mark Aquino) Date: Wed, 11 Jan 2012 17:34:54 -0500 Subject: [Bioperl-l] HOW TO PRINT SOME INFORMATION FROM A BLAST_OUTPUT.TXT FILE TO A NEW FASTA.TXT FILE In-Reply-To: <33124531.post@talk.nabble.com> References: <33120359.post@talk.nabble.com> <4F0DA199.10202@sanger.ac.uk> <33122670.post@talk.nabble.com> <33124531.post@talk.nabble.com> Message-ID: <05B20EDB-5EE0-4C9F-B82A-C48991A6F3C1@knome.com> Dear Paulos, The bioperl mailing list is not a homework or perl tutorial list; its purpose is the development of, and issues concerning the use of, the bioperl module. If you need help with basic perl, check www.perlmonks.org or, better yet, try to learn it on your own; otherwise you will end up unable to program anything by yourself. Sincerely, Mark On Jan 11, 2012, at 5:10 PM, paulos wrote: > > > Hey Warren, > > u are right that it sounds like a homework stuff cause it is actually. I am > a student, biologist one, and i have to do this. I am glad that u and Frank > help me with the print statement but i'd like to have the whole script. I > know that i am demanding. But i am really thankful that u and Frank tried > to help me so far. Also, i checked the link u gave me and didn't help me to > write the desirable script... > > paulos From florent.angly at gmail.com Wed Jan 11 17:40:12 2012 From: florent.angly at gmail.com (Florent Angly) Date: Thu, 12 Jan 2012 08:40:12 +1000 Subject: [Bioperl-l] Split ACE file by AGP scaffolds In-Reply-To: References: Message-ID: <4F0E0FCC.2030206@gmail.com> Hi Nathan, I am not familiar with AGP files, but I do not think that Bioperl can read them. However, to process the contigs in your ACE file, you can use Bio::Assembly::IO::ace : http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/Assembly/IO/ace.html The synopsis has snippets of code to get you started. Since you want to read and write contigs individually, you probably want to use these methods: next_contig, write_header, write_contig, write_footer. What do you plan on doing with Consed? There are alternatives you could try to visualize your assembly, e.g. Hawkeye, from the AMOS package. Best, Florent On 11/01/12 16:42, Nathan Watson-Haigh wrote: > I have an ACE file (500k contigs > 500 bp) generated by Newbler for a 450 Mb genome which I'd like to open in consed. However, due to its size and my computer's memory limits, it only opens 10-15%. > > I'm trying to split the ACE file into smaller subsets of contigs which I can handle in consed. > > I think a valid approach is to generate an ACE file per scaffold and work in consed on each scaffold in turn. Does this sound valid? > > If I take the AGP file that Newbler generated, I should be able to take the monolithic ACE file and split it into 20k ACE files, one per scaffold. > > Does anyone have thoughts on whether this is doable with BioPerl's Bio::Assembly::IO::ace module? If so, could you give me a couple of quick pointers? > > Cheers, Nath > > Sent from my Android phone.
> > > Nathan Watson-Haigh > Senior Bioinformatician | The Australian Wine Research Institute > Waite Precinct, Hartley Grove cnr Paratoo Road, Urrbrae (Adelaide) SA 5064 | Map > PO Box 197, Glen Osmond SA 5064, Australia > T: +61 8 83136836 (direct) | F: +61 8 83136601 | > www: www.awri.com.au | AWRI Events > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From paulospapathanasiou at hotmail.gr Wed Jan 11 17:45:42 2012 From: paulospapathanasiou at hotmail.gr (paulos) Date: Wed, 11 Jan 2012 14:45:42 -0800 (PST) Subject: [Bioperl-l] HOW TO PRINT SOME INFORMATION FROM A BLAST_OUTPUT.TXT FILE TO A NEW FASTA.TXT FILE In-Reply-To: <2CBB8ABA-37BB-4D5C-927F-ED0F5201A601@illinois.edu> References: <33120359.post@talk.nabble.com> <4F0DA199.10202@sanger.ac.uk> <33122670.post@talk.nabble.com> <33124531.post@talk.nabble.com> <2CBB8ABA-37BB-4D5C-927F-ED0F5201A601@illinois.edu> Message-ID: <33124735.post@talk.nabble.com> Dear Chris, thanks for ur advice when u said i can't take credit for everything.
I could say that i could be something better than a student and i don't feel sorry telling the truth. I asked some help (that i had it at least...not from u obviously...) and not irony. paulos -- View this message in context: http://old.nabble.com/HOW-TO-PRINT-SOME-INFORMATION-FROM-A-BLAST_OUTPUT.TXT-FILE-TO-A-NEW-FASTA.TXT-FILE-tp33120359p33124735.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From paulospapathanasiou at hotmail.gr Wed Jan 11 17:58:02 2012 From: paulospapathanasiou at hotmail.gr (paulos) Date: Wed, 11 Jan 2012 14:58:02 -0800 (PST) Subject: [Bioperl-l] HOW TO PRINT SOME INFORMATION FROM A BLAST_OUTPUT.TXT FILE TO A NEW FASTA.TXT FILE In-Reply-To: <05B20EDB-5EE0-4C9F-B82A-C48991A6F3C1@knome.com> References: <33120359.post@talk.nabble.com> <4F0DA199.10202@sanger.ac.uk> <33122670.post@talk.nabble.com> <33124531.post@talk.nabble.com> <05B20EDB-5EE0-4C9F-B82A-C48991A6F3C1@knome.com> Message-ID: <33124798.post@talk.nabble.com> Dear Mark, thank u for helping me and ur advices. I thought that it was also and forum about perl. I'll be more careful in the future.... Sincerely, paulos -- View this message in context: http://old.nabble.com/HOW-TO-PRINT-SOME-INFORMATION-FROM-A-BLAST_OUTPUT.TXT-FILE-TO-A-NEW-FASTA.TXT-FILE-tp33120359p33124798.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
From cjfields at illinois.edu Wed Jan 11 21:11:25 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 12 Jan 2012 02:11:25 +0000 Subject: [Bioperl-l] HOW TO PRINT SOME INFORMATION FROM A BLAST_OUTPUT.TXT FILE TO A NEW FASTA.TXT FILE In-Reply-To: <33124735.post@talk.nabble.com> References: <33120359.post@talk.nabble.com> <4F0DA199.10202@sanger.ac.uk> <33122670.post@talk.nabble.com> <33124531.post@talk.nabble.com> <2CBB8ABA-37BB-4D5C-927F-ED0F5201A601@illinois.edu> <33124735.post@talk.nabble.com> Message-ID: <5DB180EB-F6E1-4274-B015-9E3076DCC593@illinois.edu> Paulos, I think it's almost the perfect definition of irony if, for instance, one states they are honest while trying to have others do their homework and pass off other's work as their own. As for my (snarky) response, if you had done this on just about any other programming forum you would have had much worse. Please see this well-known FAQ: http://catb.org/~esr/faqs/smart-questions.html chris On Jan 11, 2012, at 4:45 PM, paulos wrote: > > Dear Chris, > > thanks for ur advice when u said i can't take credit for everything. I could > say that i could be something better than a student and i don't feel sorry > telling the truth. I asked some help (that i had it at least...not from u > obviously...) and not irony. > > paulos > > > > -- > View this message in context: http://old.nabble.com/HOW-TO-PRINT-SOME-INFORMATION-FROM-A-BLAST_OUTPUT.TXT-FILE-TO-A-NEW-FASTA.TXT-FILE-tp33120359p33124735.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Jan 12 08:39:02 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 12 Jan 2012 13:39:02 +0000 Subject: [Bioperl-l] Question on SeqFeature_RelationShip In-Reply-To: <4F0EA69E.8040203@inria.fr> References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> <4F0EA69E.8040203@inria.fr> Message-ID: <01614F87-8CBE-4BDB-AD35-26ECFA748BD6@illinois.edu> You can work on that, sure. The other option (really more a workaround) is to store the generic (flattened) feature, then unflatten it on the fly after retrieving the data. chris On Jan 12, 2012, at 3:23 AM, lajus wrote: > Ok, I have looked in BioPerl code and it appears that subSeqFeature are not handled yet: > comment in SeqFeatureAdaptor.pm for store children function (and attach childrenn too): > "Bio::SeqFeatureI has a location, annotation, and possibly sub-seqfeatures as children. The latter is not implemented yet." > > So it's totally normal, if it doesn't work. > Have you started to implement this stuff, or should I rewrite another SeqFeatureAdaptor which handle this ? > > Florian > > On 11/01/2012 16:44, Fields, Christopher J wrote: >> Seems like a possible bug with bioperl-db, I believe hierarchal seqfeatures are stored, but it's worth looking into. Do you have some example data (genbank file you are using, for instance)? >> >> chris >> >> On Jan 11, 2012, at 7:09 AM, lajus wrote: >> >>> Therefore, if I look in verbose mode, I can see that in the stack I have many : >>> >>> no adaptor found for class Bio::Annotation::TypeManager >>> no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory >>> >>> Just warning, no errors but... >>> Any clues?
>>> >>> Thanks by advance, >>> >>> Florian >>> >>> On 11/01/2012 13:43, lajus wrote: >>>> I have looked to the Unflattener and the magic works quite fine. >>>> Then, the $seq which is given (by side-effect) by >>>> $unflattener->unflatten_seq(-seq=>$seq, -use_magic=>1); >>>> has a good hierarchy for us. >>>> So I'm asking why can't I store this Bio::Seq in my database? Now there is an explicit parent/child links between the gene and CDS. >>>> But when I create a persitent object for $seq and if I create it: >>>> $adaptor->create_persistent($seq); >>>> $pseq->create(); >>>> In my database, the bioentry and subseqFeatures are written but still no relation in the seqFeature_relationship table. >>>> >>>> Do you have an explanation? >>>> >>>> Florian >>>> >>>> On 10/01/2012 19:45, Fields, Christopher J wrote: >>>>> On Jan 10, 2012, at 12:18 PM, Peter Cock wrote: >>>>> >>>>>> On Tue, Jan 10, 2012 at 5:06 PM, lajus wrote: >>>>>>> Hello, >>>>>>> I am currently working on a refactoring of the Genolevures project >>>>>>> (http://www.genolevures.org/) >>>>>>> We are trying to better use bioperl and the bioSQL shema on a postgreSQL >>>>>>> database. >>>>>>> >>>>>>> I have loaded an EMBL file into my BioSQL database (postgres). If I look in >>>>>>> my database, my bioentry have been added and seqFeatures associated too. >>>>>>> But it seems that my seqfeature_relationship table is empty. >>>>>>> I find it strange in so far as there is a relationship between gene and its >>>>>>> CDS. right? >>>>>> No, not explicitly. Unlike GFF3 where there can be (and should be) >>>>>> explicit parent/child links between the gene and CDS, in GenBank >>>>>> and EMBL feature tables this is implicit only. I don't know if BioPerl >>>>>> attempts to infer this kind of relationship, and if it did, if that would >>>>>> get record in the BioSQL tables.
>>>>>> >>>>>> Peter >>>>> BioPerl does not attempt to infer these by default (too much magic, and too many potential issues), but one can use something like the Unflattener, which does have some magic built-in: >>>>> >>>>> https://metacpan.org/module/Bio::SeqFeature::Tools::Unflattener >>>>> >>>>> chris >>>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From fs5 at sanger.ac.uk Thu Jan 12 09:13:17 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Thu, 12 Jan 2012 14:13:17 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <4F0DF93B.5020505@sanger.ac.uk> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> Message-ID: <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> I have now created a version that gives the option to create the products of 'delete' and 'insert' via Bio::Root::Root:clone instead of calling 'new' on the input seq object class. Seems to be working fine for me so far. 'delete' and 'insert' can now take a hashref of options. The only option so far is to set 'clone_obj to true, to use cloning instead of creating objects via 'new'. Setting this parameter to false or not supplying the options hashref at all will give you the old behaviour (call 'new'). 
Example: my $product = Bio::SeqUtils->delete( $seq_obj, 11, 20, { clone_obj => 1} ); The ligate method takes clone_obj as a named parameter: my $new_molecule = Bio::Sequtils::Pbrtools->ligate( -recipient => $vector, -fragment => $fragment, -left => 1000, -right => 1100, -flip => 1, -clone_obj => 1 ); This is in a branch of my GitHub repo if you would like to have a look: https://github.com/fschwach/bioperl-live/tree/sequtils_clone Unfortunately, I can't add this option to trunc_with_features because the creation of the new object is delegated to 'trunc'. I guess I could implement 'trunc' in Bio::SeqUtils itself(?) What do you think, could this be merged into bioperl-live? Frank On Wed, 2012-01-11 at 21:03 +0000, Frank Schwach wrote: > Great, I'll work on a branch that gives the user the option to use clone > instead of new and then we can see if we want to use that in the end. In > the meantime, what do you think about pulling this into bioperl-live? > When I have some time again I can work on the HOWTO for these new > features for the BioPerl wiki > > Frank > > > On 11/01/12 18:42, Fields, Christopher J wrote: > > Note that Bio::Root::Root now has a clone() method that one can take advantage of for this purpose; if Storable or Clone is available, it will pick one of the two, preferably Clone over Storable. It's fairly untested, but we haven't run into problems with it yet (I think it was in the last CPAN release). > > > > chris > > > > On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote: > > > >> Hi Frank, > >> > >> Looks great, I like the use of between locations, didn't think of that. > >> > >> It was suggested that I avoid using Clone for cat, trunc_with_features etc. to avoid adding a dependency (which may no longer be an issue) and because it would cause problems for Bio::Seq implementations that use a database as the back-end. Maybe you could add it as an option, but keep the default as is? > >> > >> Cheers, > >> Roy. 
> >> > >> On 11/01/2012 18:16, Frank Schwach wrote: > >>> Hi Roy and Chris, > >>> > >>> I have made the changes to the code now. As you suggested, feature ends > >>> no longer change type and I insert a note instead to inform about the > >>> deletion (or insertion), showing the length and position. > >>> I have also added a feature to annotate deletion sites themselves (with > >>> IN-BETWEEN locations). > >>> > >>> Roy's test script now prints: > >>> > >>> LOCUS seq-accession_number 7 bp dna linear UNK > >>> ACCESSION unknown > >>> FEATURES Location/Qualifiers > >>> CDS join(2..3,4..6) > >>> /note="3bp internal deletion between pos 3 and 4" > >>> CDS 2..3 > >>> /note="2bp deleted from feature end" > >>> misc_feature 3^4 > >>> /note="deletion of 3bp" > >>> ORIGIN > >>> 1 aaaaaaa > >>> // > >>> > >>> > >>> or, if you add strand information (-1 in this case) to the second feature: > >>> > >>> LOCUS seq-accession_number 7 bp dna linear UNK > >>> ACCESSION unknown > >>> FEATURES Location/Qualifiers > >>> CDS join(2..3,4..6) > >>> /note="3bp internal deletion between pos 3 and 4" > >>> CDS complement(2..3) > >>> /note="2bp deleted from feature 5' end" > >>> misc_feature 3^4 > >>> /note="deletion of 3bp" > >>> ORIGIN > >>> 1 aaaaaaa > >>> // > >>> > >>> I have comitted this along with some bugfixes to my master branch on GitHub > >>> https://github.com/fschwach/bioperl-live > >>> so it's now also in my existing pull request. > >>> > >>> I'm still wondering if cloning the sequence objects rather than calling > >>> 'new' on their respective classes would be an option inside 'delete' and > >>> 'insert'? > >>> I'm experimenting with this for my own purposes because I have to work > >>> with custom sub-classes of Bio::Seq which have additional attributes and > >>> therefore set 'can_call_new' to false. > >>> Without cloning the objects, I first have to convert the custom > >>> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid. 
> >>> Is there any reason why something like Clone::Fast should not be used in > >>> this case? It seems to work for me but there may be situations where > >>> this is going to blow up which I am not aware of. > >>> Cloning rather than calling new could be made an option in > >>> Bio::SeqUtils. I have most of the code for that already. > >>> > >>> Frank > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> On 10/01/12 17:31, Roy Chaudhuri wrote: > >>>> Or without the typo: > >>>> > >>>> CDS join(2..3,4..6) > >>>> /note="3 bp internal deletion" > >>>> CDS 2..3 > >>>> /note="2 bp deleted from 3' end" > >>>> > >>>> On 10/01/2012 17:27, Roy Chaudhuri wrote: > >>>>> I think it's me that didn't explain very well - I was talking about > >>>>> overlapping (rather than spanning) a deletion, although I think the same > >>>>> principle applies to the spanning example you gave. Here's some test > >>>>> code: > >>>>> > >>>>> #!/usr/bin/perl > >>>>> use warnings FATAL=>qw(all); > >>>>> use strict; > >>>>> use Bio::Seq; > >>>>> use Bio::SeqIO; > >>>>> use Bio::SeqUtils; > >>>>> use Bio::SeqFeature::Generic; > >>>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA'); > >>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', > >>>>> -start=>2, > >>>>> -end=>9)); > >>>>> > >>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', > >>>>> -start=>2, > >>>>> -end=>5)); > >>>>> my $out=Bio::SeqIO->newFh(-format=>'genbank'); > >>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6); > >>>>> print $out $trunc; > >>>>> > >>>>> > >>>>> This currently outputs: > >>>>> LOCUS seq-accession_number 7 bp dna linear UNK > >>>>> ACCESSION unknown > >>>>> FEATURES Location/Qualifiers > >>>>> CDS join(2..>3,<4..6) > >>>>> CDS 2..>3 > >>>>> ORIGIN > >>>>> 1 aaaaaaa > >>>>> // > >>>>> > >>>>> However, I was suggesting that the feature table should be something > >>>>> like: > >>>>> CDS join(2..3,4..6) > >>>>> /note="3 bp internal deletion" > 
>>>>> CDS join(2..3) > >>>>> /note="2 bp deleted from 3' end" > >>>>> > >>>>> Fuzzy locations are intended to represent features which have boundaries > >>>>> spanning outside of the sequence. For a defined deletion that's not the > >>>>> case, the boundaries of the feature aren't unknown, they have been > >>>>> specifically altered. > >>>>> > >>>>> Hope this is clearer. > >>>>> Cheers, > >>>>> Roy. > >>>>> > >>>>> On 10/01/2012 16:47, Frank Schwach wrote: > >>>>>> Hi Roy, > >>>>>> > >>>>>> Sorry, I hadn't explained that very well: it's not the outer boundaries > >>>>>> of the feature that become fuzzy but the "inner" ones of the split > >>>>>> locations: > >>>>>> > >>>>>> -------------------- a feature's location > >>>>>> ==========xxxx================= sequence > >>>>>> > >>>>>> > >>>>>> --------- sublocation 1 > >>>>>> -------- sublocation 2 > >>>>>> =============================== > >>>>>> > >>>>>> x= sequence to delete > >>>>>> The feature's location has changed from Simple to Split. > >>>>>> > >>>>>> Sublocation 1: > >>>>>> start is still EXACT and has not changed > >>>>>> end is now AFTER because this is not a true end of the feature > >>>>>> > >>>>>> Sublocation 2: > >>>>>> start is BEFORE > >>>>>> end is EXACT (but shifted) > >>>>>> > >>>>>> I hope this makes more sense(?) > >>>>>> > >>>>>> Cheers, > >>>>>> > >>>>>> Frank > >>>>>> > >>>>>> > >>>>>> > >>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote: > >>>>>>> Hi Frank, > >>>>>>> > >>>>>>> Looks good to me. One thing I'm not sure about - why do features > >>>>>>> overlapping a deletion become fuzzy? That behaviour is in > >>>>>>> trunc_with_features because it's intended to represent a taking a > >>>>>>> subregion of a larger sequence, but if you're representing an internal > >>>>>>> deletion then the boundaries of the overlapping feature aren't > >>>>>>> unknown, > >>>>>>> they have been specifically altered. 
Maybe you could give absolute > >>>>>>> coordinates, but add a note indicating that the 5' or 3' end has been > >>>>>>> truncated by however many bases. > >>>>>>> > >>>>>>> Cheers, > >>>>>>> Roy. > >>>>>>> > >>>>>>> On 10/01/2012 13:10, Frank Schwach wrote: > >>>>>>>> Hi Chris, > >>>>>>>> > >>>>>>>> I have made the changes in a Git fork and made the pull request now. > >>>>>>>> If this is accepted into BioPerl I can also write a little SeqUtils > >>>>>>>> HOWTO for the BioPerl wiki. > >>>>>>>> > >>>>>>>> Frank > >>>>>>>> > >>>>>>>> > >>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote: > >>>>>>>>> Sounds very promising! The easiest way to contribute is via a > >>>>>>>>> fork of the code on Github with a pull request (as you already > >>>>>>>>> know, being a contributor to the Primer3 modules). > >>>>>>>>> > >>>>>>>>> chris > >>>>>>>>> > >>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote: > >>>>>>>>> > >>>>>>>>>> Hi all, > >>>>>>>>>> > >>>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and > >>>>>>>>>> sequence > >>>>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a > >>>>>>>>>> vector > >>>>>>>>>> and insert a fragment into it while preserving all the > >>>>>>>>>> annotations and > >>>>>>>>>> moving the features accordingly. > >>>>>>>>>> My main aim was to split features that span deletion/insertion > >>>>>>>>>> sites in > >>>>>>>>>> a meaningful way, which can not be done with the currently availble > >>>>>>>>>> methods. > >>>>>>>>>> I have modified Bio::SeqUtils so that I have the following new > >>>>>>>>>> methods: > >>>>>>>>>> > >>>>>>>>>> delete > >>>>>>>>>> ====== > >>>>>>>>>> removes a segment from a sequence object and adjusts positions > >>>>>>>>>> and types > >>>>>>>>>> of locations of sequence features: > >>>>>>>>>> - locations of features that span the deletion sites are turned > >>>>>>>>>> into > >>>>>>>>>> Splits. 
> >>>>>>>>>> - locations that extend into the deleted region are turned to > >>>>>>>>>> Fuzzy to > >>>>>>>>>> indicate that their true start/end was lost. > >>>>>>>>>> - locations contained inside the deleted regions are lost. > >>>>>>>>>> - other features are shifted according to the length of the > >>>>>>>>>> deletion. > >>>>>>>>>> > >>>>>>>>>> insert > >>>>>>>>>> ====== > >>>>>>>>>> adds a Bio::Seq object into another one between specified insertion > >>>>>>>>>> sites. This also affects the features on the recipient sequence: > >>>>>>>>>> - locations of features that span the insertion site are split but > >>>>>>>>>> position types are not turned to Fuzzy because no part of the > >>>>>>>>>> original > >>>>>>>>>> feature is lost. > >>>>>>>>>> - other features are shifted according to the length of the > >>>>>>>>>> insertion. > >>>>>>>>>> > >>>>>>>>>> ligate > >>>>>>>>>> ====== > >>>>>>>>>> just for convenience. Supply a recipient, a fragment and one or two > >>>>>>>>>> sites to cut the recipient. Can also flip the fragment if required. > >>>>>>>>>> Simply calls delete [, reverse_complement_with_features] and > >>>>>>>>>> insert in > >>>>>>>>>> turn. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> One situation I haven't handled yet is a deletion that spans the > >>>>>>>>>> origin > >>>>>>>>>> of a circular molecule but that should be a rare thing to do > >>>>>>>>>> anyway. The > >>>>>>>>>> code currently throws an error if this is attempted. > >>>>>>>>>> > >>>>>>>>>> I'm happy to contribute the code on Github if there is interest? > >>>>>>>>>> Comments on the handling of feature locations highly welcome! > >>>>>>>>>> > >>>>>>>>>> Frank > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>> > >>>>>> > >>> > >
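A minimal sketch of the delete behaviour discussed in this thread, adapted from Roy's test script quoted above in the thread: it removes bases 4..6 from a 10-bp toy sequence carrying two CDS features and prints each surviving feature's location in feature-table syntax, so the split (spanning) and shortened (overlapping) cases can be compared. It assumes a BioPerl installation whose Bio::SeqUtils already includes the delete method from Frank's contribution; the exact locations and notes you see depend on which version of that code is installed, so treat this as an illustration rather than reference output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqUtils;
use Bio::SeqFeature::Generic;

# Toy data taken from Roy's test script: a 10-bp sequence with two CDS features,
# one spanning the deletion (2..9) and one overlapping its left edge (2..5).
my $seq = Bio::Seq->new( -id => 'seq', -seq => 'AAAAAAAAAA' );
for my $range ( [ 2, 9 ], [ 2, 5 ] ) {
    $seq->add_SeqFeature(
        Bio::SeqFeature::Generic->new(
            -primary_tag => 'CDS',
            -start       => $range->[0],
            -end         => $range->[1],
        )
    );
}

# Delete positions 4..6 (inclusive); the product should be 10 - 3 = 7 bp long.
my $trunc = Bio::SeqUtils->delete( $seq, 4, 6 );

# Report the new length and each feature's location in feature-table syntax.
printf "length: %d\n", $trunc->length;
for my $feat ( $trunc->get_SeqFeatures ) {
    printf "%s\t%s\n", $feat->primary_tag, $feat->location->to_FTstring;
}
```

On Frank's sequtils_clone branch, passing { clone_obj => 1 } as a fourth argument to delete() should build the product with clone() instead of new(), which is the case that matters for Bio::Seq subclasses that set can_call_new to false.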
From florian.lajus at inria.fr Thu Jan 12 04:23:42 2012 From: florian.lajus at inria.fr (lajus) Date: Thu, 12 Jan 2012 10:23:42 +0100 Subject: [Bioperl-l] Question on SeqFeature_RelationShip In-Reply-To: References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> Message-ID: <4F0EA69E.8040203@inria.fr> Ok, I have looked in BioPerl code and it appears that subSeqFeature are not handled yet: comment in SeqFeatureAdaptor.pm for store children function (and attach childrenn too): "Bio::SeqFeatureI has a location, annotation, and possibly sub-seqfeatures as children. The latter is not implemented yet." So it's totally normal, if it doesn't work. Have you started to implement this stuff, or should I rewrite another SeqFeatureAdaptor which handle this ? Florian On 11/01/2012 16:44, Fields, Christopher J wrote: > Seems like a possible bug with bioperl-db, I believe hierarchal seqfeatures are stored, but it's worth looking into. Do you have some example data (genbank file you are using, for instance)? > > chris > > On Jan 11, 2012, at 7:09 AM, lajus wrote: > >> Therefore, if I look in verbose mode, I can see that in the stack I have many : >> >> no adaptor found for class Bio::Annotation::TypeManager >> no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory >> >> Just warning, no errors but... >> Any clues? >> >> Thanks by advance, >> >> Florian >> >> On 11/01/2012 13:43, lajus wrote: >>> I have looked to the Unflattener and the magic works quite fine. >>> Then, the $seq which is given (by side-effect) by >>> $unflattener->unflatten_seq(-seq=>$seq, -use_magic=>1); >>> has a good hierarchy for us. >>> So I'm asking why can't I store this Bio::Seq in my database? Now there is an explicit parent/child links between the gene and CDS.
>>> But when I create a persitent object for $seq and if I create it: >>> $adaptor->create_persistent($seq); >>> $pseq->create(); >>> In my database, the bioentry and subseqFeatures are written but still no relation in the seqFeature_relationship table. >>> >>> Do you have an explanation? >>> >>> Florian >>> >>> On 10/01/2012 19:45, Fields, Christopher J wrote: >>>> On Jan 10, 2012, at 12:18 PM, Peter Cock wrote: >>>> >>>>> On Tue, Jan 10, 2012 at 5:06 PM, lajus wrote: >>>>>> Hello, >>>>>> I am currently working on a refactoring of the Genolevures project >>>>>> (http://www.genolevures.org/) >>>>>> We are trying to better use bioperl and the bioSQL shema on a postgreSQL >>>>>> database. >>>>>> >>>>>> I have loaded an EMBL file into my BioSQL database (postgres). If I look in >>>>>> my database, my bioentry have been added and seqFeatures associated too. >>>>>> But it seems that my seqfeature_relationship table is empty. >>>>>> I find it strange in so far as there is a relationship between gene and its >>>>>> CDS. right? >>>>> No, not explicitly. Unlike GFF3 where there can be (and should be) >>>>> explicit parent/child links between the gene and CDS, in GenBank >>>>> and EMBL feature tables this is implicit only. I don't know if BioPerl >>>>> attempts to infer this kind of relationship, and if it did, if that would >>>>> get record in the BioSQL tables.
>>>>> >>>>> Peter >>>> BioPerl does not attempt to infer these by default (too much magic, and too many potential issues), but one can use something like the Unflattener, which does have some magic built-in: >>>> >>>> https://metacpan.org/module/Bio::SeqFeature::Tools::Unflattener >>>> >>>> chris >>>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at drycafe.net Thu Jan 12 12:49:10 2012 From: hlapp at drycafe.net (Hilmar Lapp) Date: Thu, 12 Jan 2012 12:49:10 -0500 Subject: [Bioperl-l] Question on SeqFeature_RelationShip In-Reply-To: <4F0EA69E.8040203@inria.fr> References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> <4F0EA69E.8040203@inria.fr> Message-ID: <49CC4A91-5E47-4866-AD63-62DC7F649CE6@drycafe.net> Hi Florian, Thanks for digging this up - this is what I had in memory, but I ran out of time last night in ascertaining that it is indeed still true. It'd be awesome if you can add the code to SeqFeatureAdaptor to also persist and retrieve sub-features. I think the object-relational mappings are all there already (in BaseDriver.pm). You could use the handling of bioentry-to-bioentry relationships (or term-to-term relationships) as a template for how to implement this. -hilmar On Jan 12, 2012, at 4:23 AM, lajus wrote: > Ok, I have looked in BioPerl code and it appears that subSeqFeature are not handled yet: > comment in SeqFeatureAdaptor.pm for store children function (and attach childrenn too): > "Bio::SeqFeatureI has a location, annotation, and possibly sub-seqfeatures as children. The latter is not implemented yet." > > So it's totally normal, if it doesn't work. > Have you started to implement this stuff, or should I rewrite another SeqFeatureAdaptor which handle this ? 
> > Florian > > On 11/01/2012 16:44, Fields, Christopher J wrote: >> Seems like a possible bug with bioperl-db, I believe hierarchal seqfeatures are stored, but it's worth looking into. Do you have some example data (genbank file you are using, for instance)? >> >> chris >> >> On Jan 11, 2012, at 7:09 AM, lajus wrote: >> >>> Therefore, if I look in verbose mode, I can see that in the stack I have many : >>> >>> no adaptor found for class Bio::Annotation::TypeManager >>> no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory >>> >>> Just warning, no errors but... >>> Any clues? >>> >>> Thanks by advance, >>> >>> Florian >>> >>> On 11/01/2012 13:43, lajus wrote: >>>> I have looked to the Unflattener and the magic works quite fine. >>>> Then, the $seq which is given (by side-effect) by >>>> $unflattener->unflatten_seq(-seq=>$seq, -use_magic=>1); >>>> has a good hierarchy for us. >>>> So I'm asking why can't I store this Bio::Seq in my database? Now there is an explicit parent/child links between the gene and CDS. >>>> But when I create a persitent object for $seq and if I create it: >>>> $adaptor->create_persistent($seq); >>>> $pseq->create(); >>>> In my database, the bioentry and subseqFeatures are written but still no relation in the seqFeature_relationship table. >>>> >>>> Do you have an explanation? >>>> >>>> Florian >>>> >>>> On 10/01/2012 19:45, Fields, Christopher J wrote: >>>>> On Jan 10, 2012, at 12:18 PM, Peter Cock wrote: >>>>> >>>>>> On Tue, Jan 10, 2012 at 5:06 PM, lajus wrote: >>>>>>> Hello, >>>>>>> I am currently working on a refactoring of the Genolevures project >>>>>>> (http://www.genolevures.org/) >>>>>>> We are trying to better use bioperl and the bioSQL shema on a postgreSQL >>>>>>> database. >>>>>>> >>>>>>> I have loaded an EMBL file into my BioSQL database (postgres). If I look in >>>>>>> my database, my bioentry have been added and seqFeatures associated too.
>>>>>>> But it seems that my seqfeature_relationship table is empty. >>>>>>> I find this strange, since there is a relationship between a gene and its >>>>>>> CDS, right? >>>>>> No, not explicitly. Unlike GFF3 where there can be (and should be) >>>>>> explicit parent/child links between the gene and CDS, in GenBank >>>>>> and EMBL feature tables this is implicit only. I don't know if BioPerl >>>>>> attempts to infer this kind of relationship, and if it did, if that would >>>>>> get recorded in the BioSQL tables. >>>>>> >>>>>> Peter >>>>> BioPerl does not attempt to infer these by default (too much magic, and too many potential issues), but one can use something like the Unflattener, which does have some magic built-in: >>>>> >>>>> https://metacpan.org/module/Bio::SeqFeature::Tools::Unflattener >>>>> >>>>> chris >>>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From florian.lajus at inria.fr Fri Jan 13 04:25:23 2012 From: florian.lajus at inria.fr (lajus) Date: Fri, 13 Jan 2012 10:25:23 +0100 Subject: [Bioperl-l] Question on SeqFeature_RelationShip In-Reply-To: <49CC4A91-5E47-4866-AD63-62DC7F649CE6@drycafe.net> References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> <4F0EA69E.8040203@inria.fr> <49CC4A91-5E47-4866-AD63-62DC7F649CE6@drycafe.net> Message-ID: <4F0FF883.80109@inria.fr> Hi Hilmar, Thanks for your hint, but I'm quite lost in the BioPerl architecture (and quite new to Perl programming).
I'd like to use the handling of term-to-term relationships as a template, but I can't find which files are involved. As far as I understand, I should: - create a new adaptor called SeqFeatureRelationshipAdaptor in Bio/DB - create a new object SeqFeatureRelationship (and its interface) in Bio/SeqFeature - modify SeqFeatureAdaptor to store children (just with a call to subSeqFeature in the store_children sub, using my SeqFeatureRelationshipAdaptor to create new relationships) - modify SeqFeatureAdaptor to retrieve children (using my SeqFeatureRelationshipAdaptor to create new relationships) Is this the right approach? Florian On 12/01/2012 18:49, Hilmar Lapp wrote: > Hi Florian, > > Thanks for digging this up - this is what I had in memory, but I ran out of time last night to confirm that it is indeed still true. > > It'd be awesome if you could add the code to SeqFeatureAdaptor to also persist and retrieve sub-features. I think the object-relational mappings are all there already (in BaseDriver.pm). You could use the handling of bioentry-to-bioentry relationships (or term-to-term relationships) as a template for how to implement this. > > -hilmar > > On Jan 12, 2012, at 4:23 AM, lajus wrote: > >> OK, I have looked in the BioPerl code and it appears that sub-SeqFeatures are not handled yet: >> comment in SeqFeatureAdaptor.pm for the store_children function (and attach_children too): >> "Bio::SeqFeatureI has a location, annotation, and possibly sub-seqfeatures as children. The latter is not implemented yet." >> >> So it's expected that it doesn't work. >> Have you started to implement this, or should I write another SeqFeatureAdaptor that handles it? >> >> Florian >> >> On 11/01/2012 16:44, Fields, Christopher J wrote: >>> Seems like a possible bug with bioperl-db; I believe hierarchical seqfeatures are stored, but it's worth looking into. Do you have some example data (the GenBank file you are using, for instance)?
>>> >>> chris >>> >>> On Jan 11, 2012, at 7:09 AM, lajus wrote: >>> >>>> Also, if I look in verbose mode, I can see many of these in the stack: >>>> >>>> no adaptor found for class Bio::Annotation::TypeManager >>>> no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory >>>> >>>> Just warnings, no errors, but... >>>> Any clues? >>>> >>>> Thanks in advance, >>>> >>>> Florian >>>> >>>> On 11/01/2012 13:43, lajus wrote: >>>>> I have looked at the Unflattener and the magic works fine. >>>>> Then, the $seq that is modified (as a side effect) by >>>>> $unflattener->unflatten_seq(-seq=>$seq, -use_magic=>1); >>>>> has a good hierarchy for us. >>>>> So why can't I store this Bio::Seq in my database? Now there are explicit parent/child links between the gene and CDS. >>>>> But when I create a persistent object for $seq and then store it: >>>>> my $pseq = $adaptor->create_persistent($seq); >>>>> $pseq->create(); >>>>> In my database, the bioentry and sub-seqFeatures are written, but there is still no relation in the seqfeature_relationship table. >>>>> >>>>> Do you have an explanation? >>>>> >>>>> Florian >>>>> >>>>> On 10/01/2012 19:45, Fields, Christopher J wrote: >>>>>> On Jan 10, 2012, at 12:18 PM, Peter Cock wrote: >>>>>> >>>>>>> On Tue, Jan 10, 2012 at 5:06 PM, lajus wrote: >>>>>>>> Hello, >>>>>>>> I am currently working on a refactoring of the Genolevures project >>>>>>>> (http://www.genolevures.org/) >>>>>>>> We are trying to make better use of BioPerl and the BioSQL schema on a PostgreSQL >>>>>>>> database. >>>>>>>> >>>>>>>> I have loaded an EMBL file into my BioSQL database (postgres). If I look in >>>>>>>> my database, my bioentries have been added, along with their seqFeatures. >>>>>>>> But it seems that my seqfeature_relationship table is empty. >>>>>>>> I find this strange, since there is a relationship between a gene and its >>>>>>>> CDS, right? >>>>>>> No, not explicitly.
Unlike GFF3 where there can be (and should be) >>>>>>> explicit parent/child links between the gene and CDS, in GenBank >>>>>>> and EMBL feature tables this is implicit only. I don't know if BioPerl >>>>>>> attempts to infer this kind of relationship, and if it did, if that would >>>>>>> get recorded in the BioSQL tables. >>>>>>> >>>>>>> Peter >>>>>> BioPerl does not attempt to infer these by default (too much magic, and too many potential issues), but one can use something like the Unflattener, which does have some magic built-in: >>>>>> >>>>>> https://metacpan.org/module/Bio::SeqFeature::Tools::Unflattener >>>>>> >>>>>> chris >>>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l From florian.lajus at inria.fr Fri Jan 13 04:27:27 2012 From: florian.lajus at inria.fr (lajus) Date: Fri, 13 Jan 2012 10:27:27 +0100 Subject: [Bioperl-l] Question on SeqFeature_RelationShip In-Reply-To: <4F0FF883.80109@inria.fr> References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> <4F0EA69E.8040203@inria.fr> <49CC4A91-5E47-4866-AD63-62DC7F649CE6@drycafe.net> <4F0FF883.80109@inria.fr> Message-ID: <4F0FF8FF.6070706@inria.fr> I meant: - a new adaptor called SeqFeatureRelationshipAdaptor in Bio/DB/BioSQL, of course. On 13/01/2012 10:25, lajus wrote: > Hi Hilmar, > > Thanks for your hint, but I'm quite lost in the BioPerl architecture > (and quite new to Perl programming). I'd like to use the handling of > term-to-term relationships as a template, but I can't find which files > are involved.
> > As far as I understand, I should: > - create a new adaptor called SeqFeatureRelationshipAdaptor in Bio/DB > - create a new object SeqFeatureRelationship (and its interface) in > Bio/SeqFeature > - modify SeqFeatureAdaptor to store children (just with a call to > subSeqFeature in the store_children sub, using my > SeqFeatureRelationshipAdaptor to create new relationships) > - modify SeqFeatureAdaptor to retrieve children (using my > SeqFeatureRelationshipAdaptor to create new relationships) > > Is this the right approach? > > Florian > > On 12/01/2012 18:49, Hilmar Lapp wrote: >> Hi Florian, >> >> Thanks for digging this up - this is what I had in memory, but I ran >> out of time last night to confirm that it is indeed still true. >> >> It'd be awesome if you could add the code to SeqFeatureAdaptor to also >> persist and retrieve sub-features. I think the object-relational >> mappings are all there already (in BaseDriver.pm). You could use the >> handling of bioentry-to-bioentry relationships (or term-to-term >> relationships) as a template for how to implement this. >> >> -hilmar >> >> On Jan 12, 2012, at 4:23 AM, lajus wrote: >> >>> OK, I have looked in the BioPerl code and it appears that sub-SeqFeatures >>> are not handled yet: >>> comment in SeqFeatureAdaptor.pm for the store_children function (and >>> attach_children too): >>> "Bio::SeqFeatureI has a location, annotation, and possibly >>> sub-seqfeatures as children. The latter is not implemented yet." >>> >>> So it's expected that it doesn't work. >>> Have you started to implement this, or should I write >>> another SeqFeatureAdaptor that handles it? >>> >>> Florian >>> >>> On 11/01/2012 16:44, Fields, Christopher J wrote: >>>> Seems like a possible bug with bioperl-db; I believe hierarchical >>>> seqfeatures are stored, but it's worth looking into. Do you have >>>> some example data (the GenBank file you are using, for instance)?
>>>> >>>> chris >>>> >>>> On Jan 11, 2012, at 7:09 AM, lajus wrote: >>>> >>>>> Also, if I look in verbose mode, I can see many of these in the stack: >>>>> >>>>> no adaptor found for class Bio::Annotation::TypeManager >>>>> no adaptor found for class >>>>> Bio::DB::Persistent::PersistentObjectFactory >>>>> >>>>> Just warnings, no errors, but... >>>>> Any clues? >>>>> >>>>> Thanks in advance, >>>>> >>>>> Florian >>>>> >>>>> On 11/01/2012 13:43, lajus wrote: >>>>>> I have looked at the Unflattener and the magic works fine. >>>>>> Then, the $seq that is modified (as a side effect) by >>>>>> $unflattener->unflatten_seq(-seq=>$seq, -use_magic=>1); >>>>>> has a good hierarchy for us. >>>>>> So why can't I store this Bio::Seq in my database? Now >>>>>> there are explicit parent/child links between the gene and CDS. >>>>>> But when I create a persistent object for $seq and then store it: >>>>>> my $pseq = $adaptor->create_persistent($seq); >>>>>> $pseq->create(); >>>>>> In my database, the bioentry and sub-seqFeatures are written, but >>>>>> there is still no relation in the seqfeature_relationship table. >>>>>> >>>>>> Do you have an explanation? >>>>>> >>>>>> Florian >>>>>> >>>>>> On 10/01/2012 19:45, Fields, Christopher J wrote: >>>>>>> On Jan 10, 2012, at 12:18 PM, Peter Cock wrote: >>>>>>> >>>>>>>> On Tue, Jan 10, 2012 at 5:06 PM, >>>>>>>> lajus wrote: >>>>>>>>> Hello, >>>>>>>>> I am currently working on a refactoring of the Genolevures >>>>>>>>> project >>>>>>>>> (http://www.genolevures.org/) >>>>>>>>> We are trying to make better use of BioPerl and the BioSQL schema on a >>>>>>>>> PostgreSQL >>>>>>>>> database. >>>>>>>>> >>>>>>>>> I have loaded an EMBL file into my BioSQL database (postgres). >>>>>>>>> If I look in >>>>>>>>> my database, my bioentries have been added, along with their >>>>>>>>> seqFeatures. >>>>>>>>> But it seems that my seqfeature_relationship table is empty.
>>>>>>>>> I find this strange, since there is a relationship between >>>>>>>>> a gene and its >>>>>>>>> CDS, right? >>>>>>>> No, not explicitly. Unlike GFF3 where there can be (and should be) >>>>>>>> explicit parent/child links between the gene and CDS, in GenBank >>>>>>>> and EMBL feature tables this is implicit only. I don't know if >>>>>>>> BioPerl >>>>>>>> attempts to infer this kind of relationship, and if it did, if >>>>>>>> that would >>>>>>>> get recorded in the BioSQL tables. >>>>>>>> >>>>>>>> Peter >>>>>>> BioPerl does not attempt to infer these by default (too much >>>>>>> magic, and too many potential issues), but one can use something >>>>>>> like the Unflattener, which does have some magic built-in: >>>>>>> >>>>>>> https://metacpan.org/module/Bio::SeqFeature::Tools::Unflattener >>>>>>> >>>>>>> chris >>>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Fri Jan 13 11:18:22 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 13 Jan 2012 16:18:22 +0000 Subject: [Bioperl-l] Bio-EUtilities released Message-ID: <71E7C798-F2FF-44F5-BB1B-C0539FF361EB@illinois.edu> All, I have released the (now separate) Bio-EUtilities on CPAN. For this release I switched to using an independent version number (1.71) that is simpler and evaluates as newer than the current BioPerl version (1.006902). http://search.cpan.org/dist/Bio-EUtilities/ This release doesn't seem to interfere with the current BioPerl CPAN releases: one should be able to install this, which will then install all prerequisites (including BioPerl and older EUtilities code).
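As a quick check of the version claim above, here is a sketch that assumes only Perl's core version.pm module. The helper name `is_newer` is hypothetical. The release 1.006902 is the decimal form of BioPerl 1.6.902, and version.pm normalizes 1.71 to v1.710.0, so CPAN tooling should treat 1.71 as the newer release.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper: true when $candidate is a strictly newer CPAN
# version than $installed, using the core version.pm comparison rules.
sub is_newer {
    my ($candidate, $installed) = @_;
    return version->parse($candidate) > version->parse($installed);
}

# Bio-EUtilities 1.71 compared with the BioPerl 1.006902 release.
print is_newer('1.71', '1.006902') ? "newer\n" : "not newer\n";

# Show how version.pm normalizes the two decimal version strings.
for my $v ('1.71', '1.006902') {
    my $parsed = version->parse($v);
    print "$v -> ", $parsed->normal, "\n";
}
```

Run as-is, it should print `newer`, then `1.71 -> v1.710.0` and `1.006902 -> v1.6.902`.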
The code can be retrieved here: chris PS: this bodes well for splitting off code from bioperl-live into their own repositories. We have a few distributions that are already on this path (Bio-FeatureIO, Bio-Root, etc.). From pawan.mani2 at gmail.com Fri Jan 13 11:46:55 2012 From: pawan.mani2 at gmail.com (kakchingtabam pawankumar sharma) Date: Fri, 13 Jan 2012 22:16:55 +0530 Subject: [Bioperl-l] how to get the information of Strand = Plus / Plus from blastn report by bioperl. Message-ID: Hi, Using the Bio::SearchIO module, I am parsing the following BLAST result. I have used the method $hsp->strand('query'), but I cannot get the strand details of the alignment. I need to know if my hit is forward (Strand = Plus / Plus) or reverse (Strand = Plus / Minus). Can anyone help me report Plus or Minus for the query or hit? Thanks in advance. With regards, Pawan BLASTN 2.2.18 [Dec-23-2011] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Query= 000013_c10079-9984 (50 letters) Database: Cyano_Probe.fasta 4760 sequences; 238,000 total letters Searching..................................................done Score E Sequences producing significant alignments: (bits) Value 000013_c10079-9984 100 7e-024 002619_2689273-2690037 24 0.36 001126_c1123720-1123385 24 0.36 003211_c3326737-3326480 22 1.4 002415_2471082-2471420 22 1.4 002269_2321276-2322463 22 1.4 001328_c1326535-1326164 22 1.4 >000013_c10079-9984 Length = 50 Score = 99.6 bits (50), Expect = 7e-024 Identities = 50/50 (100%) Strand = Plus / Plus Query: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50
|||||||||||||||||||||||||||||||||||||||||||||||||| Sbjct: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 From David.Messina at sbc.su.se Fri Jan 13 11:44:06 2012 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 13 Jan 2012 17:44:06 +0100 Subject: [Bioperl-l] Bio-EUtilities released In-Reply-To: <71E7C798-F2FF-44F5-BB1B-C0539FF361EB@illinois.edu> References: <71E7C798-F2FF-44F5-BB1B-C0539FF361EB@illinois.edu> Message-ID: Wonderful, Chris, and congratulations! Looking forward to taking it out for a spin... Best, Dave From fs5 at sanger.ac.uk Fri Jan 13 16:43:58 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Fri, 13 Jan 2012 21:43:58 +0000 Subject: [Bioperl-l] how to get the information of Strand = Plus / Plus from blastn report by bioperl. In-Reply-To: References: Message-ID: <4F10A59E.5040807@sanger.ac.uk> Hi Pawan, Can you show your code? Is it basically following the structure shown in http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO ? If so, $hsp->strand('query') is exactly what you need. To check whether the hit and query are on opposite strands, you can do: if ( $hsp->strand('query') * $hsp->strand('hit') == -1 ) { # do whatever you need to do if they are on opposite strands } Hope that helps, Frank On 13/01/12 16:46, kakchingtabam pawankumar sharma wrote: > Hi, > Using the Bio::SearchIO module, I am parsing the following BLAST result. > I have used the method $hsp->strand('query'), but I cannot get the strand > details of the alignment. > > I need to know if my hit is forward (Strand = Plus / Plus) > or reverse (Strand = Plus / Minus). > Can anyone help me report Plus or Minus for the query or hit? > > Thanks in advance. > > With regards, > Pawan > > > > BLASTN 2.2.18 [Dec-23-2011] > > > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), > "Gapped BLAST and PSI-BLAST: a new generation of protein database search > programs", Nucleic Acids Res. 25:3389-3402. > > Query= 000013_c10079-9984 > (50 letters) > > Database: Cyano_Probe.fasta > 4760 sequences; 238,000 total letters > > Searching..................................................done > > > > Score E > Sequences producing significant alignments: (bits) Value > > 000013_c10079-9984 > 100 7e-024 > 002619_2689273-2690037 > 24 0.36 > 001126_c1123720-1123385 > 24 0.36 > 003211_c3326737-3326480 > 22 1.4 > 002415_2471082-2471420 > 22 1.4 > 002269_2321276-2322463 > 22 1.4 > 001328_c1326535-1326164 > 22 1.4 > >> 000013_c10079-9984 > Length = 50 > > Score = 99.6 bits (50), Expect = 7e-024 > Identities = 50/50 (100%) > Strand = Plus / Plus > > > Query: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 > |||||||||||||||||||||||||||||||||||||||||||||||||| > Sbjct: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
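Frank's sign test can be wrapped in small helpers that also reproduce BLAST's "Plus / Minus" wording. This is a sketch, not BioPerl API. The names `strand_word`, `strand_label`, `opposite_strands`, and `report_strands` are hypothetical. The sketch assumes that `$hsp->strand('query')` and `$hsp->strand('hit')` return 1 or -1 for blastn HSPs; any other value is labelled "n/a". Bio::SearchIO is loaded with `require` only when `report_strands` is called, so the helpers themselves run without BioPerl installed.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a strand integer (1 or -1) into BLAST's wording.
sub strand_word {
    my ($strand) = @_;
    return 'Plus'  if defined $strand && $strand == 1;
    return 'Minus' if defined $strand && $strand == -1;
    return 'n/a';    # e.g. 0 or undef, as for protein searches
}

# Hypothetical helper: "Plus / Minus"-style label for a query/hit strand pair.
sub strand_label {
    my ($query_strand, $hit_strand) = @_;
    return strand_word($query_strand) . ' / ' . strand_word($hit_strand);
}

# Frank's test: the product of the two strands is -1 only for opposite strands.
sub opposite_strands {
    my ($query_strand, $hit_strand) = @_;
    return ($query_strand * $hit_strand) == -1;
}

# Hypothetical driver: print one line per HSP of a BLAST text report.
sub report_strands {
    my ($file) = @_;
    require Bio::SearchIO;    # BioPerl is needed only for this part
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                my ($q, $h) = ($hsp->strand('query'), $hsp->strand('hit'));
                printf "%s\t%s\t%s\t%s\n", $result->query_name, $hit->name,
                    strand_label($q, $h),
                    opposite_strands($q, $h) ? 'reverse' : 'forward';
            }
        }
    }
}

print strand_label(1, -1), "\n";    # prints "Plus / Minus"
```

With BioPerl installed, a call such as `report_strands('my_blastn_report.txt')` (file name hypothetical) would print the query name, hit name, strand label, and a forward/reverse call for each HSP.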
From fs5 at sanger.ac.uk Fri Jan 13 16:47:50 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Fri, 13 Jan 2012 21:47:50 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <4F0DF93B.5020505@sanger.ac.uk> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> Message-ID: <4F10A686.9010905@sanger.ac.uk> Hi Chris, Thanks for merging my pull request. I now have a version that allows using clone as an option. Would that be of interest too? Frank On 11/01/12 21:03, Frank Schwach wrote: > Great, I'll work on a branch that gives the user the option to use > clone instead of new and then we can see if we want to use that in the > end. In the meantime, what do you think about pulling this into > bioperl-live? When I have some time again I can work on the HOWTO for > these new features for the BioPerl wiki. > > Frank > > > On 11/01/12 18:42, Fields, Christopher J wrote: >> Note that Bio::Root::Root now has a clone() method that one can take >> advantage of for this purpose; if Storable or Clone is available, it >> will pick one of the two, preferably Clone over Storable. It's >> fairly untested, but we haven't run into problems with it yet (I >> think it was in the last CPAN release). >> >> chris >> >> On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote: >> >>> Hi Frank, >>> >>> Looks great, I like the use of between locations, didn't think of that. >>> >>> It was suggested that I avoid using Clone for cat, >>> trunc_with_features etc.
to avoid adding a dependency (which may no >>> longer be an issue) and because it would cause problems for Bio::Seq >>> implementations that use a database as the back-end. Maybe you could >>> add it as an option, but keep the default as is? >>> >>> Cheers, >>> Roy. >>> >>> On 11/01/2012 18:16, Frank Schwach wrote: >>>> Hi Roy and Chris, >>>> >>>> I have made the changes to the code now. As you suggested, feature >>>> ends >>>> no longer change type and I insert a note instead to inform about the >>>> deletion (or insertion), showing the length and position. >>>> I have also added a feature to annotate deletion sites themselves >>>> (with >>>> IN-BETWEEN locations). >>>> >>>> Roy's test script now prints: >>>> >>>> LOCUS seq-accession_number 7 bp dna >>>> linear UNK >>>> ACCESSION unknown >>>> FEATURES Location/Qualifiers >>>> CDS join(2..3,4..6) >>>> /note="3bp internal deletion between pos 3 >>>> and 4" >>>> CDS 2..3 >>>> /note="2bp deleted from feature end" >>>> misc_feature 3^4 >>>> /note="deletion of 3bp" >>>> ORIGIN >>>> 1 aaaaaaa >>>> // >>>> >>>> >>>> or, if you add strand information (-1 in this case) to the second >>>> feature: >>>> >>>> LOCUS seq-accession_number 7 bp dna >>>> linear UNK >>>> ACCESSION unknown >>>> FEATURES Location/Qualifiers >>>> CDS join(2..3,4..6) >>>> /note="3bp internal deletion between pos 3 >>>> and 4" >>>> CDS complement(2..3) >>>> /note="2bp deleted from feature 5' end" >>>> misc_feature 3^4 >>>> /note="deletion of 3bp" >>>> ORIGIN >>>> 1 aaaaaaa >>>> // >>>> >>>> I have committed this along with some bugfixes to my master branch >>>> on GitHub >>>> https://github.com/fschwach/bioperl-live >>>> so it's now also in my existing pull request.
>>>> I'm experimenting with this for my own purposes because I have to work >>>> with custom sub-classes of Bio::Seq which have additional >>>> attributes and >>>> therefore set 'can_call_new' to false. >>>> Without cloning the objects, I first have to convert the custom >>>> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to >>>> avoid. >>>> Is there any reason why something like Clone::Fast should not be >>>> used in >>>> this case? It seems to work for me but there may be situations where >>>> this is going to blow up which I am not aware of. >>>> Cloning rather than calling new could be made an option in >>>> Bio::SeqUtils. I have most of the code for that already. >>>> >>>> Frank >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On 10/01/12 17:31, Roy Chaudhuri wrote: >>>>> Or without the typo: >>>>> >>>>> CDS join(2..3,4..6) >>>>> /note="3 bp internal deletion" >>>>> CDS 2..3 >>>>> /note="2 bp deleted from 3' end" >>>>> >>>>> On 10/01/2012 17:27, Roy Chaudhuri wrote: >>>>>> I think it's me that didn't explain very well - I was talking about >>>>>> overlapping (rather than spanning) a deletion, although I think >>>>>> the same >>>>>> principle applies to the spanning example you gave. 
Here's some test >>>>>> code: >>>>>> >>>>>> #!/usr/bin/perl >>>>>> use warnings FATAL=>qw(all); >>>>>> use strict; >>>>>> use Bio::Seq; >>>>>> use Bio::SeqIO; >>>>>> use Bio::SeqUtils; >>>>>> use Bio::SeqFeature::Generic; >>>>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA'); >>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >>>>>> >>>>>> -start=>2, >>>>>> -end=>9)); >>>>>> >>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >>>>>> >>>>>> -start=>2, >>>>>> -end=>5)); >>>>>> my $out=Bio::SeqIO->newFh(-format=>'genbank'); >>>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6); >>>>>> print $out $trunc; >>>>>> >>>>>> >>>>>> This currently outputs: >>>>>> LOCUS seq-accession_number 7 bp dna >>>>>> linear UNK >>>>>> ACCESSION unknown >>>>>> FEATURES Location/Qualifiers >>>>>> CDS join(2..>3,<4..6) >>>>>> CDS 2..>3 >>>>>> ORIGIN >>>>>> 1 aaaaaaa >>>>>> // >>>>>> >>>>>> However, I was suggesting that the feature table should be something >>>>>> like: >>>>>> CDS join(2..3,4..6) >>>>>> /note="3 bp internal deletion" >>>>>> CDS join(2..3) >>>>>> /note="2 bp deleted from 3' end" >>>>>> >>>>>> Fuzzy locations are intended to represent features which have >>>>>> boundaries >>>>>> spanning outside of the sequence. For a defined deletion that's >>>>>> not the >>>>>> case, the boundaries of the feature aren't unknown, they have been >>>>>> specifically altered. >>>>>> >>>>>> Hope this is clearer. >>>>>> Cheers, >>>>>> Roy. 
>>>>>> >>>>>> On 10/01/2012 16:47, Frank Schwach wrote: >>>>>>> Hi Roy, >>>>>>> >>>>>>> Sorry, I hadn't explained that very well: it's not the outer >>>>>>> boundaries >>>>>>> of the feature that become fuzzy but the "inner" ones of the split >>>>>>> locations: >>>>>>> >>>>>>> -------------------- a feature's location >>>>>>> ==========xxxx================= sequence >>>>>>> >>>>>>> >>>>>>> --------- sublocation 1 >>>>>>> -------- sublocation 2 >>>>>>> =============================== >>>>>>> >>>>>>> x= sequence to delete >>>>>>> The feature's location has changed from Simple to Split. >>>>>>> >>>>>>> Sublocation 1: >>>>>>> start is still EXACT and has not changed >>>>>>> end is now AFTER because this is not a true end of the feature >>>>>>> >>>>>>> Sublocation 2: >>>>>>> start is BEFORE >>>>>>> end is EXACT (but shifted) >>>>>>> >>>>>>> I hope this makes more sense(?) >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Frank >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote: >>>>>>>> Hi Frank, >>>>>>>> >>>>>>>> Looks good to me. One thing I'm not sure about - why do features >>>>>>>> overlapping a deletion become fuzzy? That behaviour is in >>>>>>>> trunc_with_features because it's intended to represent taking a >>>>>>>> subregion of a larger sequence, but if you're representing an >>>>>>>> internal >>>>>>>> deletion then the boundaries of the overlapping feature aren't >>>>>>>> unknown, >>>>>>>> they have been specifically altered. Maybe you could give absolute >>>>>>>> coordinates, but add a note indicating that the 5' or 3' end >>>>>>>> has been >>>>>>>> truncated by however many bases. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> Roy. >>>>>>>> >>>>>>>> On 10/01/2012 13:10, Frank Schwach wrote: >>>>>>>>> Hi Chris, >>>>>>>>> >>>>>>>>> I have made the changes in a Git fork and made the pull >>>>>>>>> request now. >>>>>>>>> If this is accepted into BioPerl I can also write a little >>>>>>>>> SeqUtils >>>>>>>>> HOWTO for the BioPerl wiki.
>>>>>>>>> >>>>>>>>> Frank >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote: >>>>>>>>>> Sounds very promising! The easiest way to contribute is via a >>>>>>>>>> fork of the code on Github with a pull request (as you already >>>>>>>>>> know, being a contributor to the Primer3 modules). >>>>>>>>>> >>>>>>>>>> chris >>>>>>>>>> >>>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote: >>>>>>>>>> >>>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and >>>>>>>>>>> sequence >>>>>>>>>>> features to simulate molecular cloning techniques, e.g. to >>>>>>>>>>> cut a >>>>>>>>>>> vector >>>>>>>>>>> and insert a fragment into it while preserving all the >>>>>>>>>>> annotations and >>>>>>>>>>> moving the features accordingly. >>>>>>>>>>> My main aim was to split features that span deletion/insertion >>>>>>>>>>> sites in >>>>>>>>>>> a meaningful way, which cannot be done with the currently >>>>>>>>>>> available >>>>>>>>>>> methods. >>>>>>>>>>> I have modified Bio::SeqUtils so that I have the following new >>>>>>>>>>> methods: >>>>>>>>>>> >>>>>>>>>>> delete >>>>>>>>>>> ====== >>>>>>>>>>> removes a segment from a sequence object and adjusts positions >>>>>>>>>>> and types >>>>>>>>>>> of locations of sequence features: >>>>>>>>>>> - locations of features that span the deletion sites are turned >>>>>>>>>>> into >>>>>>>>>>> Splits. >>>>>>>>>>> - locations that extend into the deleted region are turned to >>>>>>>>>>> Fuzzy to >>>>>>>>>>> indicate that their true start/end was lost. >>>>>>>>>>> - locations contained inside the deleted regions are lost. >>>>>>>>>>> - other features are shifted according to the length of the >>>>>>>>>>> deletion. >>>>>>>>>>> >>>>>>>>>>> insert >>>>>>>>>>> ====== >>>>>>>>>>> adds a Bio::Seq object into another one between specified >>>>>>>>>>> insertion >>>>>>>>>>> sites.
This also affects the features on the recipient >>>>>>>>>>> sequence: >>>>>>>>>>> - locations of features that span the insertion site are >>>>>>>>>>> split but >>>>>>>>>>> position types are not turned to Fuzzy because no part of the >>>>>>>>>>> original >>>>>>>>>>> feature is lost. >>>>>>>>>>> - other features are shifted according to the length of the >>>>>>>>>>> insertion. >>>>>>>>>>> >>>>>>>>>>> ligate >>>>>>>>>>> ====== >>>>>>>>>>> just for convenience. Supply a recipient, a fragment and one >>>>>>>>>>> or two >>>>>>>>>>> sites to cut the recipient. Can also flip the fragment if >>>>>>>>>>> required. >>>>>>>>>>> Simply calls delete [, reverse_complement_with_features] and >>>>>>>>>>> insert in >>>>>>>>>>> turn. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> One situation I haven't handled yet is a deletion that spans >>>>>>>>>>> the >>>>>>>>>>> origin >>>>>>>>>>> of a circular molecule but that should be a rare thing to do >>>>>>>>>>> anyway. The >>>>>>>>>>> code currently throws an error if this is attempted. >>>>>>>>>>> >>>>>>>>>>> I'm happy to contribute the code on Github if there is >>>>>>>>>>> interest? >>>>>>>>>>> Comments on the handling of feature locations highly welcome! >>>>>>>>>>> >>>>>>>>>>> Frank >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>> > >
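The coordinate bookkeeping Frank describes for `delete` and `insert` can be sketched in plain Perl on bare integer spans. This is a hypothetical illustration, not the code in his Bio::SeqUtils branch. The names `remap_for_deletion` and `remap_for_insertion` are made up, and strands, fuzzy positions, notes, and Bio::Location objects are all ignored. The sketch follows the rules in the thread:

- features upstream of a deletion are unchanged;
- features downstream are shifted by the deletion length;
- features inside the deletion are dropped;
- features overlapping one end are truncated;
- features spanning a deletion or insertion site are split.

```perl
use strict;
use warnings;

# Hypothetical helper (not Bio::SeqUtils code): map one feature span
# [$start, $end] (1-based, inclusive) across a deletion of [$del_start, $del_end].
# Returns a list of [start, end] pairs in the new coordinates: none if the
# feature lies inside the deletion, two if the feature spans it.
sub remap_for_deletion {
    my ($start, $end, $del_start, $del_end) = @_;
    my $len = $del_end - $del_start + 1;

    return [$start, $end]               if $end < $del_start;    # upstream: unchanged
    return [$start - $len, $end - $len] if $start > $del_end;    # downstream: shifted
    return () if $start >= $del_start && $end <= $del_end;       # inside: lost

    my @segments;
    # Part of the feature before the deletion keeps its coordinates.
    push @segments, [$start, $del_start - 1] if $start < $del_start;
    # Part after the deletion now starts where the deletion began.
    push @segments, [$del_start, $end - $len] if $end > $del_end;
    return @segments;
}

# Hypothetical helper: map [$start, $end] across an insertion of $ins_len
# bases placed between positions $after and $after + 1. Nothing is lost,
# so a feature spanning the insertion site is split into two segments.
sub remap_for_insertion {
    my ($start, $end, $after, $ins_len) = @_;
    return [$start, $end]                       if $end <= $after;
    return [$start + $ins_len, $end + $ins_len] if $start > $after;
    return ([$start, $after], [$after + $ins_len + 1, $end + $ins_len]);
}

# Roy's example: a 10 bp sequence with CDS 2..9 and CDS 2..5, deleting 4..6.
for my $cds ([2, 9], [2, 5]) {
    my @parts = remap_for_deletion(@$cds, 4, 6);
    print join(',', map { join('..', @$_) } @parts), "\n";
}
```

Run as-is, it should print `2..3,4..6` and `2..3`, the same locations as the CDS features in Roy's test output.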
From cjfields at illinois.edu Fri Jan 13 16:59:00 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 13 Jan 2012 21:59:00 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <4F10A686.9010905@sanger.ac.uk> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> <4F10A686.9010905@sanger.ac.uk> Message-ID: <903C573A-B11B-449D-A062-3E2C0771FAD2@illinois.edu> Yes. I don't see a pull request for it yet, though; let me know when it is ready. chris On Jan 13, 2012, at 3:47 PM, Frank Schwach wrote: > Hi Chris, > > Thanks for merging my pull request. I now have a version that allows using clone as an option. Would that be of interest too? > > Frank > > On 11/01/12 21:03, Frank Schwach wrote: >> Great, I'll work on a branch that gives the user the option to use clone instead of new and then we can see if we want to use that in the end. In the meantime, what do you think about pulling this into bioperl-live? When I have some time again I can work on the HOWTO for these new features for the BioPerl wiki. >> >> Frank >> >> >> On 11/01/12 18:42, Fields, Christopher J wrote: >>> Note that Bio::Root::Root now has a clone() method that one can take advantage of for this purpose; if Storable or Clone is available, it will pick one of the two, preferably Clone over Storable. It's fairly untested, but we haven't run into problems with it yet (I think it was in the last CPAN release).
>>> >>> chris >>> >>> On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote: >>> >>>> Hi Frank, >>>> >>>> Looks great. I like the use of between locations; I didn't think of that. >>>> >>>> It was suggested that I avoid using Clone for cat, trunc_with_features etc. to avoid adding a dependency (which may no longer be an issue) and because it would cause problems for Bio::Seq implementations that use a database as the back-end. Maybe you could add it as an option, but keep the default as is? >>>> >>>> Cheers, >>>> Roy. >>>> >>>> On 11/01/2012 18:16, Frank Schwach wrote: >>>>> Hi Roy and Chris, >>>>> >>>>> I have made the changes to the code now. As you suggested, feature ends >>>>> no longer change type and I insert a note instead to inform about the >>>>> deletion (or insertion), showing the length and position. >>>>> I have also added a feature to annotate deletion sites themselves (with >>>>> IN-BETWEEN locations). >>>>> >>>>> Roy's test script now prints: >>>>> >>>>> LOCUS seq-accession_number 7 bp dna linear UNK >>>>> ACCESSION unknown >>>>> FEATURES Location/Qualifiers >>>>> CDS join(2..3,4..6) >>>>> /note="3bp internal deletion between pos 3 and 4" >>>>> CDS 2..3 >>>>> /note="2bp deleted from feature end" >>>>> misc_feature 3^4 >>>>> /note="deletion of 3bp" >>>>> ORIGIN >>>>> 1 aaaaaaa >>>>> // >>>>> >>>>> >>>>> or, if you add strand information (-1 in this case) to the second feature: >>>>> >>>>> LOCUS seq-accession_number 7 bp dna linear UNK >>>>> ACCESSION unknown >>>>> FEATURES Location/Qualifiers >>>>> CDS join(2..3,4..6) >>>>> /note="3bp internal deletion between pos 3 and 4" >>>>> CDS complement(2..3) >>>>> /note="2bp deleted from feature 5' end" >>>>> misc_feature 3^4 >>>>> /note="deletion of 3bp" >>>>> ORIGIN >>>>> 1 aaaaaaa >>>>> // >>>>> >>>>> I have committed this along with some bug fixes to my master branch on GitHub >>>>> https://github.com/fschwach/bioperl-live >>>>> so it's now also in my existing pull request.
>>>>> >>>>> I'm still wondering if cloning the sequence objects rather than calling >>>>> 'new' on their respective classes would be an option inside 'delete' and >>>>> 'insert'? >>>>> I'm experimenting with this for my own purposes because I have to work >>>>> with custom sub-classes of Bio::Seq which have additional attributes and >>>>> therefore set 'can_call_new' to false. >>>>> Without cloning the objects, I first have to convert the custom >>>>> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid. >>>>> Is there any reason why something like Clone::Fast should not be used in >>>>> this case? It seems to work for me but there may be situations where >>>>> this is going to blow up which I am not aware of. >>>>> Cloning rather than calling new could be made an option in >>>>> Bio::SeqUtils. I have most of the code for that already. >>>>> >>>>> Frank >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 10/01/12 17:31, Roy Chaudhuri wrote: >>>>>> Or without the typo: >>>>>> >>>>>> CDS join(2..3,4..6) >>>>>> /note="3 bp internal deletion" >>>>>> CDS 2..3 >>>>>> /note="2 bp deleted from 3' end" >>>>>> >>>>>> On 10/01/2012 17:27, Roy Chaudhuri wrote: >>>>>>> I think it's me that didn't explain very well - I was talking about >>>>>>> overlapping (rather than spanning) a deletion, although I think the same >>>>>>> principle applies to the spanning example you gave. 
Here's some test >>>>>>> code: >>>>>>> >>>>>>> #!/usr/bin/perl >>>>>>> use warnings FATAL=>qw(all); >>>>>>> use strict; >>>>>>> use Bio::Seq; >>>>>>> use Bio::SeqIO; >>>>>>> use Bio::SeqUtils; >>>>>>> use Bio::SeqFeature::Generic; >>>>>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA'); >>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >>>>>>> -start=>2, >>>>>>> -end=>9)); >>>>>>> >>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >>>>>>> -start=>2, >>>>>>> -end=>5)); >>>>>>> my $out=Bio::SeqIO->newFh(-format=>'genbank'); >>>>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6); >>>>>>> print $out $trunc; >>>>>>> >>>>>>> >>>>>>> This currently outputs: >>>>>>> LOCUS seq-accession_number 7 bp dna linear UNK >>>>>>> ACCESSION unknown >>>>>>> FEATURES Location/Qualifiers >>>>>>> CDS join(2..>3,<4..6) >>>>>>> CDS 2..>3 >>>>>>> ORIGIN >>>>>>> 1 aaaaaaa >>>>>>> // >>>>>>> >>>>>>> However, I was suggesting that the feature table should be something >>>>>>> like: >>>>>>> CDS join(2..3,4..6) >>>>>>> /note="3 bp internal deletion" >>>>>>> CDS join(2..3) >>>>>>> /note="2 bp deleted from 3' end" >>>>>>> >>>>>>> Fuzzy locations are intended to represent features which have boundaries >>>>>>> spanning outside of the sequence. For a defined deletion that's not the >>>>>>> case, the boundaries of the feature aren't unknown, they have been >>>>>>> specifically altered. >>>>>>> >>>>>>> Hope this is clearer. >>>>>>> Cheers, >>>>>>> Roy. 
>>>>>>> >>>>>>> On 10/01/2012 16:47, Frank Schwach wrote: >>>>>>>> Hi Roy, >>>>>>>> >>>>>>>> Sorry, I hadn't explained that very well: it's not the outer boundaries >>>>>>>> of the feature that become fuzzy but the "inner" ones of the split >>>>>>>> locations: >>>>>>>> >>>>>>>> -------------------- a feature's location >>>>>>>> ==========xxxx================= sequence >>>>>>>> >>>>>>>> >>>>>>>> --------- sublocation 1 >>>>>>>> -------- sublocation 2 >>>>>>>> =============================== >>>>>>>> >>>>>>>> x = sequence to delete >>>>>>>> The feature's location has changed from Simple to Split. >>>>>>>> >>>>>>>> Sublocation 1: >>>>>>>> start is still EXACT and has not changed >>>>>>>> end is now AFTER because this is not a true end of the feature >>>>>>>> >>>>>>>> Sublocation 2: >>>>>>>> start is BEFORE >>>>>>>> end is EXACT (but shifted) >>>>>>>> >>>>>>>> I hope this makes more sense(?) >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> Frank >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote: >>>>>>>>> Hi Frank, >>>>>>>>> >>>>>>>>> Looks good to me. One thing I'm not sure about - why do features >>>>>>>>> overlapping a deletion become fuzzy? That behaviour is in >>>>>>>>> trunc_with_features because it's intended to represent taking a >>>>>>>>> subregion of a larger sequence, but if you're representing an internal >>>>>>>>> deletion then the boundaries of the overlapping feature aren't >>>>>>>>> unknown, >>>>>>>>> they have been specifically altered. Maybe you could give absolute >>>>>>>>> coordinates, but add a note indicating that the 5' or 3' end has been >>>>>>>>> truncated by however many bases. >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> Roy. >>>>>>>>> >>>>>>>>> On 10/01/2012 13:10, Frank Schwach wrote: >>>>>>>>>> Hi Chris, >>>>>>>>>> >>>>>>>>>> I have made the changes in a Git fork and made the pull request now. >>>>>>>>>> If this is accepted into BioPerl I can also write a little SeqUtils >>>>>>>>>> HOWTO for the BioPerl wiki.
>>>>>>>>>> >>>>>>>>>> Frank >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote: >>>>>>>>>>> Sounds very promising! The easiest way to contribute is via a >>>>>>>>>>> fork of the code on Github with a pull request (as you already >>>>>>>>>>> know, being a contributor to the Primer3 modules). >>>>>>>>>>> >>>>>>>>>>> chris >>>>>>>>>>> >>>>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi all, >>>>>>>>>>>> >>>>>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and >>>>>>>>>>>> sequence >>>>>>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a >>>>>>>>>>>> vector >>>>>>>>>>>> and insert a fragment into it while preserving all the >>>>>>>>>>>> annotations and >>>>>>>>>>>> moving the features accordingly. >>>>>>>>>>>> My main aim was to split features that span deletion/insertion >>>>>>>>>>>> sites in >>>>>>>>>>>> a meaningful way, which cannot be done with the currently available >>>>>>>>>>>> methods. >>>>>>>>>>>> I have modified Bio::SeqUtils so that I have the following new >>>>>>>>>>>> methods: >>>>>>>>>>>> >>>>>>>>>>>> delete >>>>>>>>>>>> ====== >>>>>>>>>>>> removes a segment from a sequence object and adjusts positions >>>>>>>>>>>> and types >>>>>>>>>>>> of locations of sequence features: >>>>>>>>>>>> - locations of features that span the deletion sites are turned >>>>>>>>>>>> into >>>>>>>>>>>> Splits. >>>>>>>>>>>> - locations that extend into the deleted region are turned to >>>>>>>>>>>> Fuzzy to >>>>>>>>>>>> indicate that their true start/end was lost. >>>>>>>>>>>> - locations contained inside the deleted regions are lost. >>>>>>>>>>>> - other features are shifted according to the length of the >>>>>>>>>>>> deletion. >>>>>>>>>>>> >>>>>>>>>>>> insert >>>>>>>>>>>> ====== >>>>>>>>>>>> adds a Bio::Seq object into another one between specified insertion >>>>>>>>>>>> sites.
This also affects the features on the recipient sequence: >>>>>>>>>>>> - locations of features that span the insertion site are split but >>>>>>>>>>>> position types are not turned to Fuzzy because no part of the >>>>>>>>>>>> original >>>>>>>>>>>> feature is lost. >>>>>>>>>>>> - other features are shifted according to the length of the >>>>>>>>>>>> insertion. >>>>>>>>>>>> >>>>>>>>>>>> ligate >>>>>>>>>>>> ====== >>>>>>>>>>>> just for convenience. Supply a recipient, a fragment and one or two >>>>>>>>>>>> sites to cut the recipient. Can also flip the fragment if required. >>>>>>>>>>>> Simply calls delete [, reverse_complement_with_features] and >>>>>>>>>>>> insert in >>>>>>>>>>>> turn. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> One situation I haven't handled yet is a deletion that spans the >>>>>>>>>>>> origin >>>>>>>>>>>> of a circular molecule but that should be a rare thing to do >>>>>>>>>>>> anyway. The >>>>>>>>>>>> code currently throws an error if this is attempted. >>>>>>>>>>>> >>>>>>>>>>>> I'm happy to contribute the code on Github if there is interest? >>>>>>>>>>>> Comments on the handling of feature locations highly welcome! >>>>>>>>>>>> >>>>>>>>>>>> Frank >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>> >> >> > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
From hlapp at drycafe.net Sat Jan 14 11:12:05 2012 From: hlapp at drycafe.net (Hilmar Lapp) Date: Sat, 14 Jan 2012 11:12:05 -0500 Subject: [Bioperl-l] Question on SeqFeature_RelationShip In-Reply-To: <4F0FF8FF.6070706@inria.fr> References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> <4F0EA69E.8040203@inria.fr> <49CC4A91-5E47-4866-AD63-62DC7F649CE6@drycafe.net> <4F0FF883.80109@inria.fr> <4F0FF8FF.6070706@inria.fr> Message-ID: Hi Florian, You could do that (and it might have advantages in terms of code separation), but you don't have to. In general, adaptor classes get instantiated by the Bioperl-DB framework when a Bioperl class that is mapped to them needs to get serialized or populated. Since there is no class in Bioperl that would correspond to a seqfeature relationship, those situations won't occur. So you could just keep it simple and extend store_children() and, correspondingly, the retrieval of children in the adaptor class for seqfeatures. But as hinted above, you may still prefer a separate adaptor class just to keep the nitty-gritty of storing/loading the relationships out of the main adaptor class. It's really up to you, whichever you feel more comfortable with. -hilmar Sent with a tap. On Jan 13, 2012, at 4:27 AM, lajus wrote: > I should have written > - a new adaptor called SeqFeatureRelationshipAdaptor in Bio/DB/BioSQL > of course > > On 13/01/2012 10:25, lajus wrote: >> Hi Hilmar, >> >> Thanks for your hint, but I'm quite lost in the BioPerl architecture (and quite new to Perl programming). I'd like to use the handling of term-to-term relationships as a template, but I can't find which files are related to this.
>> >> As far as I understand, I should create: >> - a new adaptor called SeqFeatureRelationshipAdaptor in Bio/DB >> - a new object SeqFeatureRelationship (and its interface) in Bio/SeqFeature >> - modify SeqFeatureAdaptor to store children (just with a call to subSeqFeature in the store_children sub, creating the new relationships through my SeqFeatureRelationshipAdaptor) >> - modify SeqFeatureAdaptor to retrieve children (again creating the relationships through my SeqFeatureRelationshipAdaptor) >> >> Is this the right way? >> >> Florian >> >> On 12/01/2012 18:49, Hilmar Lapp wrote: >>> Hi Florian, >>> >>> Thanks for digging this up - this is what I had in memory, but I ran out of time last night in ascertaining that it is indeed still true. >>> >>> It'd be awesome if you can add the code to SeqFeatureAdaptor to also persist and retrieve sub-features. I think the object-relational mappings are all there already (in BaseDriver.pm). You could use the handling of bioentry-to-bioentry relationships (or term-to-term relationships) as a template for how to implement this. >>> >>> -hilmar >>> >>> On Jan 12, 2012, at 4:23 AM, lajus wrote: >>> >>>> OK, I have looked in the BioPerl code and it appears that sub-SeqFeatures are not handled yet; see the comment in SeqFeatureAdaptor.pm for the store_children function (and attach_children too): >>>> "Bio::SeqFeatureI has a location, annotation, and possibly sub-seqfeatures as children. The latter is not implemented yet." >>>> >>>> So it's expected that it doesn't work. >>>> Have you started to implement this, or should I write another SeqFeatureAdaptor that handles this? >>>> >>>> Florian >>>> >>>> On 11/01/2012 16:44, Fields, Christopher J wrote: >>>>> Seems like a possible bug with bioperl-db; I believe hierarchical seqfeatures are stored, but it's worth looking into. Do you have some example data (the GenBank file you are using, for instance)?
>>>>> >>>>> chris >>>>> >>>>> On Jan 11, 2012, at 7:09 AM, lajus wrote: >>>>> >>>>>> Also, if I look in verbose mode, I can see that in the stack I have many of these: >>>>>> >>>>>> no adaptor found for class Bio::Annotation::TypeManager >>>>>> no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory >>>>>> >>>>>> Just warnings, no errors, but... >>>>>> Any clues? >>>>>> >>>>>> Thanks in advance, >>>>>> >>>>>> Florian >>>>>> >>>>>> On 11/01/2012 13:43, lajus wrote: >>>>>>> I have looked at the Unflattener and the magic works quite well. >>>>>>> The $seq that is modified (as a side effect) by >>>>>>> $unflattener->unflatten_seq(-seq=>$seq, -use_magic=>1); >>>>>>> has a good hierarchy for us. >>>>>>> So I'm wondering why I can't store this Bio::Seq in my database. Now there are explicit parent/child links between the gene and CDS. >>>>>>> But when I create a persistent object for $seq and then store it: >>>>>>> my $pseq = $adaptor->create_persistent($seq); >>>>>>> $pseq->create(); >>>>>>> the bioentry and sub-SeqFeatures are written to my database, but there is still no relation in the seqfeature_relationship table. >>>>>>> >>>>>>> Do you have an explanation? >>>>>>> >>>>>>> Florian >>>>>>> >>>>>>> On 10/01/2012 19:45, Fields, Christopher J wrote: >>>>>>>> On Jan 10, 2012, at 12:18 PM, Peter Cock wrote: >>>>>>>> >>>>>>>>> On Tue, Jan 10, 2012 at 5:06 PM, lajus wrote: >>>>>>>>>> Hello, >>>>>>>>>> I am currently working on a refactoring of the Genolevures project >>>>>>>>>> (http://www.genolevures.org/) >>>>>>>>>> We are trying to make better use of BioPerl and the BioSQL schema on a PostgreSQL >>>>>>>>>> database. >>>>>>>>>> >>>>>>>>>> I have loaded an EMBL file into my BioSQL database (postgres). If I look in >>>>>>>>>> my database, my bioentries have been added, and the associated seqFeatures too. >>>>>>>>>> But it seems that my seqfeature_relationship table is empty. >>>>>>>>>> I find this strange, since there is a relationship between a gene and its >>>>>>>>>> CDS, right?
>>>>>>>>> No, not explicitly. Unlike GFF3, where there can be (and should be) >>>>>>>>> explicit parent/child links between the gene and CDS, in GenBank >>>>>>>>> and EMBL feature tables this is implicit only. I don't know if BioPerl >>>>>>>>> attempts to infer this kind of relationship, and if it did, whether that would >>>>>>>>> get recorded in the BioSQL tables. >>>>>>>>> >>>>>>>>> Peter >>>>>>>> BioPerl does not attempt to infer these by default (too much magic, and too many potential issues), but one can use something like the Unflattener, which does have some magic built in: >>>>>>>> >>>>>>>> https://metacpan.org/module/Bio::SeqFeature::Tools::Unflattener >>>>>>>> >>>>>>>> chris >>>>>>>> >> > From hlapp at drycafe.net Sat Jan 14 11:25:17 2012 From: hlapp at drycafe.net (Hilmar Lapp) Date: Sat, 14 Jan 2012 11:25:17 -0500 Subject: [Bioperl-l] Bio-EUtilities released In-Reply-To: <71E7C798-F2FF-44F5-BB1B-C0539FF361EB@illinois.edu> References: <71E7C798-F2FF-44F5-BB1B-C0539FF361EB@illinois.edu> Message-ID: <6C334671-3C69-4288-8FB4-C58AB1F9D289@drycafe.net> Indeed, this is great; thanks for continuing to push this, Chris! -hilmar Sent with a tap. On Jan 13, 2012, at 11:18 AM, "Fields, Christopher J" wrote: > All, > > I have released the (now separate) Bio-EUtilities on CPAN. For this release I switched to using an independent version number (1.71) that is simpler and evaluates as newer than the current bioperl version (1.006902).
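A short illustration of why 1.71 evaluates as newer than 1.006902, using Perl's core version module: both are decimal versions, and their normalized dotted forms (three digits per component after the decimal point) make the ordering visible. This is only a sketch of the comparison itself, not of how any particular CPAN client decides what to install.

```perl
use strict;
use warnings;
use version;    # core module since Perl 5.10

my $eutils  = version->parse('1.71');       # Bio-EUtilities release
my $bioperl = version->parse('1.006902');   # BioPerl release

# Decimal versions are split into three-digit groups after the point.
print $eutils->normal,  "\n";   # v1.710.0
print $bioperl->normal, "\n";   # v1.6.902
print $eutils > $bioperl ? "1.71 is newer\n" : "1.71 is not newer\n";
```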
> > http://search.cpan.org/dist/Bio-EUtilities/ > > This release doesn't seem to interfere with the current BioPerl CPAN releases: one should be able to install this, which will then install all prerequisites (including BioPerl and older EUtilities code). The code can be retrieved here: > > chris > > PS: this bodes well for splitting off code from bioperl-live into their own repositories. We have a few distributions that are already on this path (Bio-FeatureIO, Bio-Root, etc.). From fs5 at sanger.ac.uk Mon Jan 16 04:32:03 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 16 Jan 2012 09:32:03 +0000 Subject: [Bioperl-l] how to get the information of Strand = Plus / Plus from blastn report by bioperl. In-Reply-To: References: <4F10A59E.5040807@sanger.ac.uk> Message-ID: <4F13EE93.2010502@sanger.ac.uk> Hi Pawan, Please always "reply to all", so that you keep the discussion on the bioperl mailing list and more people can help you. What you need is a very basic Perl command. I could give you the code, but I think you will get more out of it if you experiment with it on your own, because it is very fundamental. I'll point you in the right direction: you want an if-then-else conditional construct. Perl's documentation about this is here: http://perldoc.perl.org/perlintro.html#Conditional-and-looping-constructs If the strand is 1 you want to print "PLUS", else if it is -1 you want to print "MINUS", or else you might want to print "no strand" or something, or even treat it as an error and make the script abort. Give it a go and let us know if you need help. For basic (non-bio) Perl questions, please also check out the community at http://www.perlmonks.org/. Hope that helps, Frank On 14/01/12 05:59, kakchingtabam pawankumar sharma wrote: > Hi Frank, > > Thanks for your kind reply.
> I could get the vale for query as 1 value if it is plus. > and for hit = -1 if it is minus. > But i would like to print out as PLUS or MINUS not 1 or -1 my friend. > > you can see my code as below: > > while ( my $result = $searchio->next_result() ) { > my $QueryName = $result->query_name(), my $QueryDescript = > $result->query_description(); > my $QueryLength = $result->query_length; > my $NoHits = $result->num_hits; > > while( my $hit = $result->next_hit ) { > my $HitName = $hit->name(); > my $HitDescrip = $hit->description(); > my $HitLength = $hit->length; > my $Score = $hit->raw_score(); > my $Bits = $hit->bits; > > my $hsp = $hit->next_hsp; # Only check first (= best) hsp > my $Evalue = $hsp->evalue(); > my $AlnLen = $hsp->num_identical(); > my $TotalLen = $hsp->hsp_length; > my $QueryStrand = $hsp->strand('query'); > my $HitStrand = $hsp->strand('hit'); > > if($Evalue< $cutoff){ > print "$QueryName $QueryDescript\t"; > print "$QueryLength\t"; > print "$NoHits\t"; > print "$HitName $HitDescrip\t"; > print "$HitLength\t"; > print "$Score\t"; > print "$Bits\t"; > print "$Evalue\t"; > print "$AlnLen\t"; > print "$TotalLen\t"; > print "$QueryStrand\t"; > print "$HitStrand\n"; > } > } > print "\n"; > } > > > This is a part of my code. > > i have blastn report as below: > > BLASTN 2.2.18 [Mar-02-2008] > > > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped BLAST and PSI-BLAST: a new generation of protein database search > programs", Nucleic Acids Res. 25:3389-3402. 
> > Query= ORB_1210001_hsa-miR-548aa#5_1 > (24 letters) > > Database: hsa-mmu-rno_miRNA.fa > 3524 sequences; 76,424 total letters > > Searching..................................................done > > > > Score E > Sequences producing significant alignments: (bits) Value > > hsa-miR-548aa > 48 2e-009 > hsa-miR-548d-5p > 36 9e-006 > hsa-miR-548b-5p > 36 9e-006 > hsa-miR-548z > 34 3e-005 > hsa-miR-548q > 30 5e-004 > hsa-miR-548n > 30 5e-004 > hsa-miR-548ab > 28 0.002 > hsa-miR-548v > 28 0.002 > hsa-miR-548c-5p > 28 0.002 > hsa-miR-548ag > 26 0.008 > hsa-miR-548u > 26 0.008 > hsa-miR-548c-3p > 26 0.008 > hsa-miR-603 > 26 0.008 > hsa-miR-548a-3p > 26 0.008 > hsa-miR-548ac > 24 0.033 > hsa-miR-548an > 22 0.13 > hsa-miR-548aj > 22 0.13 > hsa-miR-548i > 22 0.13 > hsa-miR-548g > 22 0.13 > hsa-miR-548j > 22 0.13 > hsa-miR-548a-5p > 22 0.13 > >> hsa-miR-548aa > Length = 25 > > Score = 48.1 bits (24), Expect = 2e-009 > Identities = 24/24 (100%) > Strand = Plus / Minus > > > Query: 1 tggtgcaaaagtaattgtggtttt 24 > |||||||||||||||||||||||| > Sbjct: 25 tggtgcaaaagtaattgtggtttt 2 > > >> hsa-miR-548d-5p > Length = 22 > > Score = 36.2 bits (18), Expect = 9e-006 > Identities = 18/18 (100%) > Strand = Plus / Plus > > > Query: 7 aaaagtaattgtggtttt 24 > |||||||||||||||||| > Sbjct: 1 aaaagtaattgtggtttt 18 > > > > in this result i could not parse my code. i think my code does not > accept the Query header that is > "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast output. > > kindly help me out. > > with regards, > Pawan. > > > On Sat, Jan 14, 2012 at 3:13 AM, Frank Schwach wrote: >> Hi Pawan, >> >> Can you show your code? Is it basically following the structure shown in >> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO >> ? >> >> If that is the case >> >> $hsp->strand('query') >> >> >> is exactly what you need. 
>> To check if hit and query are on different strands you can do: >> >> if ( $hsp->strand('query') * $hsp->strand('hit') == -1 ) { >> >> # do whatever you need to do if they are on opposite strands >> >> } >> >> Hope that helps >> >> Frank >> >> >> >> >> >> On 13/01/12 16:46, kakchingtabam pawankumar sharma wrote: >>> Hi, >>> Using the Bio::SearchIO module, I am parsing the following BLAST >>> result. >>> I have used $hsp->strand('query'). >>> >>> But I cannot get the details of the alignment. >>> >>> I need to know if my hit is forward (Strand = Plus / Plus) >>> or reverse (Strand = Plus / Minus). >>> Can anyone help me report Plus or Minus for the query or hit? >>> >>> Thanks in advance. >>> >>> With regards, >>> Pawan >>> >>> >>> >>> BLASTN 2.2.18 [Dec-23-2011] >>> >>> >>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, >>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >>> "Gapped BLAST and PSI-BLAST: a new generation of protein database search >>> programs", Nucleic Acids Res. 25:3389-3402.
>>> >>> Query= 000013_c10079-9984 >>> (50 letters) >>> >>> Database: Cyano_Probe.fasta >>> 4760 sequences; 238,000 total letters >>> >>> Searching..................................................done >>> >>> >>> >>> Score E >>> Sequences producing significant alignments: (bits) Value >>> >>> 000013_c10079-9984 100 7e-024 >>> 002619_2689273-2690037 24 0.36 >>> 001126_c1123720-1123385 24 0.36 >>> 003211_c3326737-3326480 22 1.4 >>> 002415_2471082-2471420 22 1.4 >>> 002269_2321276-2322463 22 1.4 >>> 001328_c1326535-1326164 22 1.4 >>> >>>> 000013_c10079-9984 >>> Length = 50 >>> >>> Score = 99.6 bits (50), Expect = 7e-024 >>> Identities = 50/50 (100%) >>> Strand = Plus / Plus >>> >>> >>> Query: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 >>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>> Sbjct: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50
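Following Frank's suggestion, here is a minimal sketch of the if/elsif/else mapping from numeric strands to the words BLAST prints. In the real parsing loop the numbers come from $hsp->strand('query') and $hsp->strand('hit'), as in Pawan's code; here they are hard-coded stand-ins, and the 'none' label for 0 or undef is an arbitrary choice (the script could just as well die there). The opposite-strand test is the product check Frank gave earlier in the thread.

```perl
use strict;
use warnings;

# Turn a numeric strand (1, -1, 0 or undef) into the word BLAST prints.
sub strand_label {
    my ($strand) = @_;
    if (defined $strand && $strand == 1) {
        return 'Plus';
    }
    elsif (defined $strand && $strand == -1) {
        return 'Minus';
    }
    else {
        return 'none';    # no strand information; could also die here
    }
}

# Stand-ins for $hsp->strand('query') and $hsp->strand('hit').
my ($query_strand, $hit_strand) = (1, -1);

printf "Strand = %s / %s\n", strand_label($query_strand), strand_label($hit_strand);
print "query and hit are on opposite strands\n"
    if $query_strand * $hit_strand == -1;
```

With the stand-in values this prints "Strand = Plus / Minus", matching the first hit in Pawan's miRNA report.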
From charlotte.payne at gmail.com Mon Jan 16 06:36:51 2012 From: charlotte.payne at gmail.com (charlotte payne) Date: Mon, 16 Jan 2012 11:36:51 +0000 Subject: [Bioperl-l] retrieving 19 vertebrate PECAN alignment using Compara (modified PracticeCompara.pl) Message-ID: Hello, I'm quite new to BioPerl and new to this list, and would really appreciate some help, as I've been stuck on this for some time now! Thanks so much; here's the problem. I'm trying to download a sequence alignment in FASTA format for the 19 amniote vertebrates. The Compara website suggests that the term "amniotes" can be used for the species_set_name in this case, but I get the following error message:

amniotes is not a valid species name for this instance

-------------------- EXCEPTION --------------------
MSG: Registry configuration file has no data for connecting to \$help,
    "species=s" => \$species,
    "coord_system=s" => \$coord_system,
    "seq_region=s" => \$seq_region,
    "seq_region_start=i" => \$seq_region_start,
    "seq_region_end=i" => \$seq_region_end,
    "method_link_type=s" => \$method_link_type,
    "species_set_name=s" => \$species_set_name,
    "output_format=s" => \$output_format,
    "output_file=s" => \$output_file);

# Print Help and exit
if ($help) {
  print $usage;
  exit(0);
}

if ($output_file) {
  open(STDOUT, ">$output_file") or die("Cannot open $output_file");
}

Bio::EnsEMBL::Registry->load_registry_from_db(
  -host => 'ensembldb.ensembl.org',
  -user => 'anonymous');

# Getting all the Bio::EnsEMBL::Compara::GenomeDB objects
my $genome_dbs;
my $genome_db_adaptor = Bio::EnsEMBL::Registry->get_adaptor(
    'Multi', 'compara', 'GenomeDB');
throw("Cannot connect to Compara") if (!$genome_db_adaptor);
foreach my $this_species (split(":", $species_set_name)) {
  my $this_meta_container_adaptor = Bio::EnsEMBL::Registry->get_adaptor(
      $this_species, 'core', 'MetaContainer');
  throw("Registry configuration file has no data for connecting to <$this_species>")
      if (!$this_meta_container_adaptor);
  my $this_production_name = $this_meta_container_adaptor->get_production_name;

  # Fetch Bio::EnsEMBL::Compara::GenomeDB object
  my $genome_db = $genome_db_adaptor->fetch_by_name_assembly($this_production_name);

  # Add Bio::EnsEMBL::Compara::GenomeDB object to the list
  push(@$genome_dbs, $genome_db);
}

# Getting Bio::EnsEMBL::Compara::MethodLinkSpeciesSet object
my $method_link_species_set_adaptor = Bio::EnsEMBL::Registry->get_adaptor(
    'Multi', 'compara', 'MethodLinkSpeciesSet');
my $method_link_species_set =
    $method_link_species_set_adaptor->fetch_by_method_link_type_GenomeDBs(
        $method_link_type, $genome_dbs);
throw("The database do not contain any $method_link_type data for $species_set_name!")
    if (!$method_link_species_set);

# Fetching the query Slice:
my $slice_adaptor = Bio::EnsEMBL::Registry->get_adaptor($species, 'core', 'Slice');
throw("Registry configuration file has no data for connecting to <$species>")
    if (!$slice_adaptor);
my $query_slice = $slice_adaptor->fetch_by_region('toplevel', $seq_region,
    $seq_region_start, $seq_region_end);
throw("No Slice can be created with coordinates $seq_region:$seq_region_start-".
    "$seq_region_end") if (!$query_slice);

# Fetching all the GenomicAlignBlock corresponding to this Slice:
my $genomic_align_block_adaptor = Bio::EnsEMBL::Registry->get_adaptor(
    'Multi', 'compara', 'GenomicAlignBlock');
my $genomic_align_blocks =
    $genomic_align_block_adaptor->fetch_all_by_MethodLinkSpeciesSet_Slice(
        $method_link_species_set, $query_slice);

my $all_aligns;

# Get a Bio::SimpleAlign object from every GenomicAlignBlock
foreach my $this_genomic_align_block (@$genomic_align_blocks) {
  my $simple_align = $this_genomic_align_block->get_SimpleAlign;
  push(@$all_aligns, $simple_align);
}

# print all the genomic alignments using a Bio::AlignIO object
my $alignIO = Bio::AlignIO->newFh(
    -interleaved => 0,
    -fh => \*STDOUT,
    -format => $output_format,
    -idlength => 10
);

foreach my $this_align (@$all_aligns) {
  print $alignIO $this_align;
}

exit;

my $outfile = "lama2.fasta";
open (OUTFILE, $outfile) or die "can't read file : $!";
my $firstline = <OUTFILE>;
print $firstline;
close (OUTFILE);

From pawan.mani2 at gmail.com Mon Jan 16 10:14:21 2012 From: pawan.mani2 at gmail.com (kakchingtabam pawankumar sharma) Date: Mon, 16 Jan 2012 20:44:21 +0530 Subject: [Bioperl-l] how to get the information of Strand = Plus / Plus from blastn report by bioperl. In-Reply-To: <4F13EE93.2010502@sanger.ac.uk> References: <4F10A59E.5040807@sanger.ac.uk> <4F13EE93.2010502@sanger.ac.uk> Message-ID: So, by using the if-else condition, I have solved it, Frank. What I meant was: is there any way in BioPerl to get this directly, using another module? I hope you see what I mean. My second question has not been answered yet. I have a blastn report as below: BLASTN 2.2.18 [Mar-02-2008] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
Query= ORB_1210001_hsa-miR-548aa#5_1 (24 letters) Database: hsa-mmu-rno_miRNA.fa 3524 sequences; 76,424 total letters Searching..................................................done Score E Sequences producing significant alignments: (bits) Value hsa-miR-548aa 48 2e-009 hsa-miR-548d-5p 36 9e-006 hsa-miR-548b-5p 36 9e-006 hsa-miR-548z 34 3e-005 hsa-miR-548q 30 5e-004 hsa-miR-548n 30 5e-004 hsa-miR-548ab 28 0.002 hsa-miR-548v 28 0.002 hsa-miR-548c-5p 28 0.002 hsa-miR-548ag 26 0.008 hsa-miR-548u 26 0.008 hsa-miR-548c-3p 26 0.008 hsa-miR-603 26 0.008 hsa-miR-548a-3p 26 0.008 hsa-miR-548ac 24 0.033 hsa-miR-548an 22 0.13 hsa-miR-548aj 22 0.13 hsa-miR-548i 22 0.13 hsa-miR-548g 22 0.13 hsa-miR-548j 22 0.13 hsa-miR-548a-5p 22 0.13 >hsa-miR-548aa Length = 25 Score = 48.1 bits (24), Expect = 2e-009 Identities = 24/24 (100%) Strand = Plus / Minus Query: 1 tggtgcaaaagtaattgtggtttt 24 |||||||||||||||||||||||| Sbjct: 25 tggtgcaaaagtaattgtggtttt 2 >hsa-miR-548d-5p Length = 22 Score = 36.2 bits (18), Expect = 9e-006 Identities = 18/18 (100%) Strand = Plus / Plus Query: 7 aaaagtaattgtggtttt 24 |||||||||||||||||| Sbjct: 1 aaaagtaattgtggtttt 18 in this result i could not parse my code. i think my code does not accept the Query header that is "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast output. kindly help me out. with regards, Pawan. On 1/16/12, Frank Schwach wrote: > Hi Pawan , > > Please always "reply to all", so that you keep the discussion on the > bioperl mailing list and more people can help you. > What you need is a very basic Perl command. I could give you the code > but I think you get more out of it if you experiment with it on your own > because it is very fundamental. I'll point you in the right direction: > you want an if-then-else conditional construct. 
> > Perl's documentation about this is here: > http://perldoc.perl.org/perlintro.html#Conditional-and-looping-constructs > > if strand is 1 you want to print "PLUS" else if it is -1 you want to > print "MINUS", or else you might want to print "no strand" or something, > or even treat it as an error and make the script abort. > > Give it a go and let us know if you need help. For basic (non-bio) Perl > question, please also check out the community at http://www.perlmonks.org/. > > Hope that helps, > > Frank > > > On 14/01/12 05:59, kakchingtabam pawankumar sharma wrote: >> Hi frank, >> >> Thanks for your kind reply. >> I could get the vale for query as 1 value if it is plus. >> and for hit = -1 if it is minus. >> But i would like to print out as PLUS or MINUS not 1 or -1 my friend. >> >> you can see my code as below: >> >> while ( my $result = $searchio->next_result() ) { >> my $QueryName = $result->query_name(), my $QueryDescript = >> $result->query_description(); >> my $QueryLength = $result->query_length; >> my $NoHits = $result->num_hits; >> >> while( my $hit = $result->next_hit ) { >> my $HitName = $hit->name(); >> my $HitDescrip = $hit->description(); >> my $HitLength = $hit->length; >> my $Score = $hit->raw_score(); >> my $Bits = $hit->bits; >> >> my $hsp = $hit->next_hsp; # Only check first (= best) hsp >> my $Evalue = $hsp->evalue(); >> my $AlnLen = $hsp->num_identical(); >> my $TotalLen = $hsp->hsp_length; >> my $QueryStrand = $hsp->strand('query'); >> my $HitStrand = $hsp->strand('hit'); >> >> if($Evalue< $cutoff){ >> print "$QueryName $QueryDescript\t"; >> print "$QueryLength\t"; >> print "$NoHits\t"; >> print "$HitName $HitDescrip\t"; >> print "$HitLength\t"; >> print "$Score\t"; >> print "$Bits\t"; >> print "$Evalue\t"; >> print "$AlnLen\t"; >> print "$TotalLen\t"; >> print "$QueryStrand\t"; >> print "$HitStrand\n"; >> } >> } >> print "\n"; >> } >> >> >> This is a part of my code. 
>> >> i have blastn report as below: >> >> BLASTN 2.2.18 [Mar-02-2008] >> >> >> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, >> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >> "Gapped BLAST and PSI-BLAST: a new generation of protein database search >> programs", Nucleic Acids Res. 25:3389-3402. >> >> Query= ORB_1210001_hsa-miR-548aa#5_1 >> (24 letters) >> >> Database: hsa-mmu-rno_miRNA.fa >> 3524 sequences; 76,424 total letters >> >> Searching..................................................done >> >> >> >> Score >> E >> Sequences producing significant alignments: (bits) >> Value >> >> hsa-miR-548aa >> 48 2e-009 >> hsa-miR-548d-5p >> 36 9e-006 >> hsa-miR-548b-5p >> 36 9e-006 >> hsa-miR-548z >> 34 3e-005 >> hsa-miR-548q >> 30 5e-004 >> hsa-miR-548n >> 30 5e-004 >> hsa-miR-548ab >> 28 0.002 >> hsa-miR-548v >> 28 0.002 >> hsa-miR-548c-5p >> 28 0.002 >> hsa-miR-548ag >> 26 0.008 >> hsa-miR-548u >> 26 0.008 >> hsa-miR-548c-3p >> 26 0.008 >> hsa-miR-603 >> 26 0.008 >> hsa-miR-548a-3p >> 26 0.008 >> hsa-miR-548ac >> 24 0.033 >> hsa-miR-548an >> 22 0.13 >> hsa-miR-548aj >> 22 0.13 >> hsa-miR-548i >> 22 0.13 >> hsa-miR-548g >> 22 0.13 >> hsa-miR-548j >> 22 0.13 >> hsa-miR-548a-5p >> 22 0.13 >> >>> hsa-miR-548aa >> Length = 25 >> >> Score = 48.1 bits (24), Expect = 2e-009 >> Identities = 24/24 (100%) >> Strand = Plus / Minus >> >> >> Query: 1 tggtgcaaaagtaattgtggtttt 24 >> |||||||||||||||||||||||| >> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2 >> >> >>> hsa-miR-548d-5p >> Length = 22 >> >> Score = 36.2 bits (18), Expect = 9e-006 >> Identities = 18/18 (100%) >> Strand = Plus / Plus >> >> >> Query: 7 aaaagtaattgtggtttt 24 >> |||||||||||||||||| >> Sbjct: 1 aaaagtaattgtggtttt 18 >> >> >> >> in this result i could not parse my code. i think my code does not >> accept the Query header that is >> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast >> output. >> >> kindly help me out. >> >> with regards, >> Pawan. 
>> >> >> On Sat, Jan 14, 2012 at 3:13 AM, Frank Schwach wrote: >>> Hi Pawan, >>> >>> Can you show your code? Is it basically following the structure shown in >>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO >>> ? >>> >>> If that is the case >>> >>> $hsp->strand('query') >>> >>> >>> is exactly what you need. >>> To check if hit and query are on different strands you can do: >>> >>> if ( $hsp->strand('query') >>> * $hsp->strand('hit') == -1){ >>> >>> # do whatever you need to do if they are on opposite strands >>> >>> } >>> >>> Hope that helps >>> >>> Frank >>> >>> >>> >>> >>> >>> On 13/01/12 16:46, kakchingtabam pawankumar sharma wrote: >>>> Hi, >>>> Using Bio::SearchIO module I am parsing the following >>>> Blast >>>> result. >>>> I have used the option- $hsp->strand('query'). >>>> >>>> But I cannot get detail of alignment. >>>> >>>> I need to know if my hit is forward (Strand = Plus / Plus) >>>> or reverse ( Strand = Plus / Minus)... >>>> Can anyone help me to get report as Plus or Minus for query or hit. >>>> >>>> thanks in advanced. >>>> >>>> With regards, >>>> Pawan >>>> >>>> >>>> >>>> BLASTN 2.2.18 [Dec-23-2011] >>>> >>>> >>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>> Schaffer, >>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database search >>>> programs", Nucleic Acids Res. 25:3389-3402. 
>>>> >>>> Query= 000013_c10079-9984 >>>> (50 letters) >>>> >>>> Database: Cyano_Probe.fasta >>>> 4760 sequences; 238,000 total letters >>>> >>>> Searching..................................................done >>>> >>>> >>>> >>>> Score >>>> E >>>> Sequences producing significant alignments: (bits) >>>> Value >>>> >>>> 000013_c10079-9984 >>>> 100 7e-024 >>>> 002619_2689273-2690037 >>>> 24 0.36 >>>> 001126_c1123720-1123385 >>>> 24 0.36 >>>> 003211_c3326737-3326480 >>>> 22 1.4 >>>> 002415_2471082-2471420 >>>> 22 1.4 >>>> 002269_2321276-2322463 >>>> 22 1.4 >>>> 001328_c1326535-1326164 >>>> 22 1.4 >>>> >>>>> 000013_c10079-9984 >>>> Length = 50 >>>> >>>> Score = 99.6 bits (50), Expect = 7e-024 >>>> Identities = 50/50 (100%) >>>> Strand = Plus / Plus >>>> >>>> >>>> Query: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 >>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>> Sbjct: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> -- >>> The Wellcome Trust Sanger Institute is operated by Genome Research >>> Limited, >>> a charity registered in England with number 1021457 and a company >>> registered >>> in England with number 2742969, whose registered office is 215 Euston >>> Road, >>> London, NW1 2BE. > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > From fs5 at sanger.ac.uk Mon Jan 16 10:35:04 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Mon, 16 Jan 2012 15:35:04 +0000 Subject: [Bioperl-l] how to get the information of Strand = Plus / Plus from blastn report by bioperl. 
In-Reply-To: References: <4F10A59E.5040807@sanger.ac.uk> <4F13EE93.2010502@sanger.ac.uk> Message-ID: <4F1443A8.6000007@sanger.ac.uk> Excellent, well done! No, this is the way to do it. In BioPerl modules that use strand information you will find the values +1/-1 or undef. If you want to display those as PLUS/MINUS,+/-,Watson/Crick,Laurel/Hardy whatever, you have to convert it, but you know now how to do it. You have a syntax error in your code where you retrieve the query name: my $QueryName = $result->query_name(), my $QueryDescript = $result->query_description(); should be two lines and the comma should be a semicolon. Good luck! Frank On 16/01/12 15:14, kakchingtabam pawankumar sharma wrote: > So By using the if else conditon function, I have solve Frank. > I mean is there anyway in bioperl we can get directly using other > module! I hope u got it! > > > So my second Question have not replied that is > > i have blastn report as below: > > BLASTN 2.2.18 [Mar-02-2008] > > > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped BLAST and PSI-BLAST: a new generation of protein database search > programs", Nucleic Acids Res. 25:3389-3402. 
> > Query= ORB_1210001_hsa-miR-548aa#5_1 > (24 letters) > > Database: hsa-mmu-rno_miRNA.fa > 3524 sequences; 76,424 total letters > > Searching..................................................done > > > > Score E > Sequences producing significant alignments: (bits) Value > > hsa-miR-548aa > 48 2e-009 > hsa-miR-548d-5p > 36 9e-006 > hsa-miR-548b-5p > 36 9e-006 > hsa-miR-548z > 34 3e-005 > hsa-miR-548q > 30 5e-004 > hsa-miR-548n > 30 5e-004 > hsa-miR-548ab > 28 0.002 > hsa-miR-548v > 28 0.002 > hsa-miR-548c-5p > 28 0.002 > hsa-miR-548ag > 26 0.008 > hsa-miR-548u > 26 0.008 > hsa-miR-548c-3p > 26 0.008 > hsa-miR-603 > 26 0.008 > hsa-miR-548a-3p > 26 0.008 > hsa-miR-548ac > 24 0.033 > hsa-miR-548an > 22 0.13 > hsa-miR-548aj > 22 0.13 > hsa-miR-548i > 22 0.13 > hsa-miR-548g > 22 0.13 > hsa-miR-548j > 22 0.13 > hsa-miR-548a-5p > 22 0.13 > >> hsa-miR-548aa > Length = 25 > > Score = 48.1 bits (24), Expect = 2e-009 > Identities = 24/24 (100%) > Strand = Plus / Minus > > > Query: 1 tggtgcaaaagtaattgtggtttt 24 > |||||||||||||||||||||||| > Sbjct: 25 tggtgcaaaagtaattgtggtttt 2 > > >> hsa-miR-548d-5p > Length = 22 > > Score = 36.2 bits (18), Expect = 9e-006 > Identities = 18/18 (100%) > Strand = Plus / Plus > > > Query: 7 aaaagtaattgtggtttt 24 > |||||||||||||||||| > Sbjct: 1 aaaagtaattgtggtttt 18 > > > > in this result i could not parse my code. i think my code does not > accept the Query header that is > "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast output. > > kindly help me out. > > with regards, > Pawan. > > On 1/16/12, Frank Schwach wrote: >> Hi Pawan , >> >> Please always "reply to all", so that you keep the discussion on the >> bioperl mailing list and more people can help you. >> What you need is a very basic Perl command. I could give you the code >> but I think you get more out of it if you experiment with it on your own >> because it is very fundamental. I'll point you in the right direction: >> you want an if-then-else conditional construct. 
>> >> Perl's documentation about this is here: >> http://perldoc.perl.org/perlintro.html#Conditional-and-looping-constructs >> >> if strand is 1 you want to print "PLUS" else if it is -1 you want to >> print "MINUS", or else you might want to print "no strand" or something, >> or even treat it as an error and make the script abort. >> >> Give it a go and let us know if you need help. For basic (non-bio) Perl >> question, please also check out the community at http://www.perlmonks.org/. >> >> Hope that helps, >> >> Frank >> >> >> On 14/01/12 05:59, kakchingtabam pawankumar sharma wrote: >>> Hi frank, >>> >>> Thanks for your kind reply. >>> I could get the vale for query as 1 value if it is plus. >>> and for hit = -1 if it is minus. >>> But i would like to print out as PLUS or MINUS not 1 or -1 my friend. >>> >>> you can see my code as below: >>> >>> while ( my $result = $searchio->next_result() ) { >>> my $QueryName = $result->query_name(), my $QueryDescript = >>> $result->query_description(); >>> my $QueryLength = $result->query_length; >>> my $NoHits = $result->num_hits; >>> >>> while( my $hit = $result->next_hit ) { >>> my $HitName = $hit->name(); >>> my $HitDescrip = $hit->description(); >>> my $HitLength = $hit->length; >>> my $Score = $hit->raw_score(); >>> my $Bits = $hit->bits; >>> >>> my $hsp = $hit->next_hsp; # Only check first (= best) hsp >>> my $Evalue = $hsp->evalue(); >>> my $AlnLen = $hsp->num_identical(); >>> my $TotalLen = $hsp->hsp_length; >>> my $QueryStrand = $hsp->strand('query'); >>> my $HitStrand = $hsp->strand('hit'); >>> >>> if($Evalue< $cutoff){ >>> print "$QueryName $QueryDescript\t"; >>> print "$QueryLength\t"; >>> print "$NoHits\t"; >>> print "$HitName $HitDescrip\t"; >>> print "$HitLength\t"; >>> print "$Score\t"; >>> print "$Bits\t"; >>> print "$Evalue\t"; >>> print "$AlnLen\t"; >>> print "$TotalLen\t"; >>> print "$QueryStrand\t"; >>> print "$HitStrand\n"; >>> } >>> } >>> print "\n"; >>> } >>> >>> >>> This is a part of my code. 
>>> >>> i have blastn report as below: >>> >>> BLASTN 2.2.18 [Mar-02-2008] >>> >>> >>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, >>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >>> "Gapped BLAST and PSI-BLAST: a new generation of protein database search >>> programs", Nucleic Acids Res. 25:3389-3402. >>> >>> Query= ORB_1210001_hsa-miR-548aa#5_1 >>> (24 letters) >>> >>> Database: hsa-mmu-rno_miRNA.fa >>> 3524 sequences; 76,424 total letters >>> >>> Searching..................................................done >>> >>> >>> >>> Score >>> E >>> Sequences producing significant alignments: (bits) >>> Value >>> >>> hsa-miR-548aa >>> 48 2e-009 >>> hsa-miR-548d-5p >>> 36 9e-006 >>> hsa-miR-548b-5p >>> 36 9e-006 >>> hsa-miR-548z >>> 34 3e-005 >>> hsa-miR-548q >>> 30 5e-004 >>> hsa-miR-548n >>> 30 5e-004 >>> hsa-miR-548ab >>> 28 0.002 >>> hsa-miR-548v >>> 28 0.002 >>> hsa-miR-548c-5p >>> 28 0.002 >>> hsa-miR-548ag >>> 26 0.008 >>> hsa-miR-548u >>> 26 0.008 >>> hsa-miR-548c-3p >>> 26 0.008 >>> hsa-miR-603 >>> 26 0.008 >>> hsa-miR-548a-3p >>> 26 0.008 >>> hsa-miR-548ac >>> 24 0.033 >>> hsa-miR-548an >>> 22 0.13 >>> hsa-miR-548aj >>> 22 0.13 >>> hsa-miR-548i >>> 22 0.13 >>> hsa-miR-548g >>> 22 0.13 >>> hsa-miR-548j >>> 22 0.13 >>> hsa-miR-548a-5p >>> 22 0.13 >>> >>>> hsa-miR-548aa >>> Length = 25 >>> >>> Score = 48.1 bits (24), Expect = 2e-009 >>> Identities = 24/24 (100%) >>> Strand = Plus / Minus >>> >>> >>> Query: 1 tggtgcaaaagtaattgtggtttt 24 >>> |||||||||||||||||||||||| >>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2 >>> >>> >>>> hsa-miR-548d-5p >>> Length = 22 >>> >>> Score = 36.2 bits (18), Expect = 9e-006 >>> Identities = 18/18 (100%) >>> Strand = Plus / Plus >>> >>> >>> Query: 7 aaaagtaattgtggtttt 24 >>> |||||||||||||||||| >>> Sbjct: 1 aaaagtaattgtggtttt 18 >>> >>> >>> >>> in this result i could not parse my code. 
i think my code does not >>> accept the Query header that is >>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast >>> output. >>> >>> kindly help me out. >>> >>> with regards, >>> Pawan. >>> >>> >>> On Sat, Jan 14, 2012 at 3:13 AM, Frank Schwach wrote: >>>> Hi Pawan, >>>> >>>> Can you show your code? Is it basically following the structure shown in >>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO >>>> ? >>>> >>>> If that is the case >>>> >>>> $hsp->strand('query') >>>> >>>> >>>> is exactly what you need. >>>> To check if hit and query are on different strands you can do: >>>> >>>> if ( $hsp->strand('query') >>>> * $hsp->strand('hit') == -1){ >>>> >>>> # do whatever you need to do if they are on opposite strands >>>> >>>> } >>>> >>>> Hope that helps >>>> >>>> Frank >>>> >>>> >>>> >>>> >>>> >>>> On 13/01/12 16:46, kakchingtabam pawankumar sharma wrote: >>>>> Hi, >>>>> Using Bio::SearchIO module I am parsing the following >>>>> Blast >>>>> result. >>>>> I have used the option- $hsp->strand('query'). >>>>> >>>>> But I cannot get detail of alignment. >>>>> >>>>> I need to know if my hit is forward (Strand = Plus / Plus) >>>>> or reverse ( Strand = Plus / Minus)... >>>>> Can anyone help me to get report as Plus or Minus for query or hit. >>>>> >>>>> thanks in advanced. >>>>> >>>>> With regards, >>>>> Pawan >>>>> >>>>> >>>>> >>>>> BLASTN 2.2.18 [Dec-23-2011] >>>>> >>>>> >>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>> Schaffer, >>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database search >>>>> programs", Nucleic Acids Res. 25:3389-3402. 
>>>>> >>>>> Query= 000013_c10079-9984 >>>>> (50 letters) >>>>> >>>>> Database: Cyano_Probe.fasta >>>>> 4760 sequences; 238,000 total letters >>>>> >>>>> Searching..................................................done >>>>> >>>>> >>>>> >>>>> Score >>>>> E >>>>> Sequences producing significant alignments: (bits) >>>>> Value >>>>> >>>>> 000013_c10079-9984 >>>>> 100 7e-024 >>>>> 002619_2689273-2690037 >>>>> 24 0.36 >>>>> 001126_c1123720-1123385 >>>>> 24 0.36 >>>>> 003211_c3326737-3326480 >>>>> 22 1.4 >>>>> 002415_2471082-2471420 >>>>> 22 1.4 >>>>> 002269_2321276-2322463 >>>>> 22 1.4 >>>>> 001328_c1326535-1326164 >>>>> 22 1.4 >>>>> >>>>>> 000013_c10079-9984 >>>>> Length = 50 >>>>> >>>>> Score = 99.6 bits (50), Expect = 7e-024 >>>>> Identities = 50/50 (100%) >>>>> Strand = Plus / Plus >>>>> >>>>> >>>>> Query: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 >>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>> Sbjct: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> -- >>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>> Limited, >>>> a charity registered in England with number 1021457 and a company >>>> registered >>>> in England with number 2742969, whose registered office is 215 Euston >>>> Road, >>>> London, NW1 2BE. >> >> -- >> The Wellcome Trust Sanger Institute is operated by Genome Research >> Limited, a charity registered in England with number 1021457 and a >> company registered in England with number 2742969, whose registered >> office is 215 Euston Road, London, NW1 2BE. >> -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
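Frank's advice in this thread (turn the +1/-1 that `$hsp->strand('query')` and `$hsp->strand('hit')` return into labels with a conditional, and multiply the two values to detect opposite strands) can be sketched as below. This is a minimal, self-contained sketch: the helper names `strand_label` and `on_opposite_strands` and the fallback text 'no strand' are hypothetical, and the strand values are hard-coded stand-ins rather than values read from a real HSP object.

```perl
use strict;
use warnings;

# Hypothetical lookup table: BioPerl-style strand value => label BLAST prints.
my %STRAND_LABEL = ( 1 => 'Plus', -1 => 'Minus' );

# Return 'Plus' or 'Minus'; anything else (undef, 0) gets a fallback label.
sub strand_label {
    my ($strand) = @_;
    return 'no strand'
        unless defined $strand && exists $STRAND_LABEL{$strand};
    return $STRAND_LABEL{$strand};
}

# Frank's test: query and hit are on opposite strands when the product is -1.
sub on_opposite_strands {
    my ( $query_strand, $hit_strand ) = @_;
    return 0 unless defined $query_strand && defined $hit_strand;
    return $query_strand * $hit_strand == -1 ? 1 : 0;
}

# Stand-ins for $hsp->strand('query') and $hsp->strand('hit').
my ( $query_strand, $hit_strand ) = ( 1, -1 );
printf "Strand = %s / %s\n", strand_label($query_strand), strand_label($hit_strand);
# prints: Strand = Plus / Minus
```

In Pawan's loop this would replace `print "$QueryStrand\t";` with `print strand_label($QueryStrand), "\t";`. A hash lookup is used here instead of an if/elsif chain only to keep the mapping in one place; either approach works.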
From jordi.durban at gmail.com Wed Jan 18 05:57:03 2012 From: jordi.durban at gmail.com (Jordi Durban) Date: Wed, 18 Jan 2012 11:57:03 +0100 Subject: [Bioperl-l] parse blast-xml output Message-ID:

Hi all!

I'm trying to parse an XML BLAST output (-m 7 option) in order to get the best hit (I mean the first one) from each result I got. I've done:

use Bio::SearchIO;

my @files = <*>;
my @files2 = grep (/^454AllContigs.fna.masked*/, @files);
foreach my $blast_report (@files2) {
    # Get the report
    my $searchio = new Bio::SearchIO (-format => 'blastxml',
                                      -file   => $blast_report,
                                      -best   => 'true');
    while ( my $result = $searchio->next_result ) {
        my $query = $result->query_name();
        #~ print @query,"\n";   ##### results query names
        while (my $hits = $result->next_hit) {
            #~ print $hits,"\n";   ###### the whole of hits
            my $name = $hits->name();
            my $desc = $hits->description();
            print $query."\t".$name."\t".$desc,"\n";
        }
    }
}

But it does not work: for a single query I get every hit instead of only the first one. What I mean:

contig01181 gi|63794|emb|X03832.1| Chicken mRNA 3' end for fast skeletal troponin I (sTnI)
contig01181 gi|110293358|gb|DQ646396.1| Lama pacos troponin 1 type 2 (Tnni2) mRNA, partial cds
contig01181 gi|298897248|emb|FQ224489.1| Rattus norvegicus TL0ACA64YG07 mRNA sequence
contig01181 gi|298892466|emb|FQ217985.1| Rattus norvegicus TL0ACA12YG21 mRNA sequence
contig01181 gi|298889559|emb|FQ217454.1| Rattus norvegicus TL0ACA25YO07 mRNA sequence
contig01181 gi|298888987|emb|FQ223772.1| Rattus norvegicus TL0ACA87YD21 mRNA sequence

I know some Perl and I think it is a real newbie question, but any help would be appreciated. Thanks a lot.
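For the question above, one way to keep only the first hit of each result is to call `next_hit` once per result instead of looping over every hit. A minimal sketch, assuming the hits come back in the order the report lists them (best first); the subroutine name `first_hits` is hypothetical. It relies only on the `next_result`, `query_name`, `next_hit`, `name` and `description` methods already used in the code above, so it can also be exercised with stand-in objects.

```perl
use strict;
use warnings;

# Hypothetical helper: for every result in a Bio::SearchIO-like stream,
# return [query name, hit name, hit description] for the FIRST hit only.
sub first_hits {
    my ($searchio) = @_;
    my @rows;
    while ( my $result = $searchio->next_result ) {
        # Call next_hit once instead of looping; skip queries with no hits.
        my $hit = $result->next_hit or next;
        push @rows, [ $result->query_name, $hit->name, $hit->description ];
    }
    return @rows;
}

# Example use with BioPerl installed (file name is a placeholder):
#   use Bio::SearchIO;
#   my $in = Bio::SearchIO->new(-format => 'blastxml', -file => 'report.xml');
#   print join("\t", @$_), "\n" for first_hits($in);
```

Alternatively, the existing inner `while` loop can be kept and ended with `last;` after the `print`, which has the same effect.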
-- Jordi From fs5 at sanger.ac.uk Wed Jan 18 11:21:17 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Wed, 18 Jan 2012 16:21:17 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <1326903677.3563.15.camel@deskpro15336.internal.sanger.ac.uk> Hi Roy, I have a use-case for Bio::SeqUtils truncating a feature with negative start location. In my case, this is happening because I transform genomic coordinates of a feature to the coordinate frame of another feature, so feature A that starts 10nt before the reference feature now has a start of -10. I want to trim that to a fuzzy start = "<1" Bio::SeqUtils::_coord_adjust can be used to trim feature A accordingly but we need to change the regex that manipulates the coordinates slightly by adding an optional "-": map s/(\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add +$1>$length) {">$length"} else {$add+$1}/ge, @coords; becomes map s/(-?\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add +$1>$length) {">$length"} else {$add+$1}/ge, @coords; It doesn't change anything else as far as I can tell and adds some more flexibility to the method. If you are ok with it I can push this to my queued pull request for Bio::SeqUtils. 
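The effect of adding the optional minus sign to the pattern can be checked with a stand-alone version of the substitution. This is a sketch, not the actual `Bio::SeqUtils::_coord_adjust` code: the subroutine name `adjust_coords`, the `$allow_negative` switch (to compare the old and new patterns) and working on one location string instead of `@coords` are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the substitution discussed above:
# shift every number in a location string by $add, clamp results below 1
# to the fuzzy "<1" and results beyond $length to ">$length".
# $allow_negative selects the new (-?\d+) or the old (\d+) pattern.
sub adjust_coords {
    my ( $loc, $add, $length, $allow_negative ) = @_;
    my $num = $allow_negative ? qr/(-?\d+)/ : qr/(\d+)/;
    ( my $out = $loc ) =~ s/$num/
          $add + $1 < 1                              ? '<1'
        : ( defined $length && $add + $1 > $length ) ? ">$length"
        :                                              $add + $1
    /ge;
    return $out;
}

print adjust_coords( '-10..50', 0, 100, 0 ), "\n";   # -10..50 (old pattern leaves the minus sign)
print adjust_coords( '-10..50', 0, 100, 1 ), "\n";   # <1..50
print adjust_coords( '5..120',  0, 100, 1 ), "\n";   # 5..>100
```

With the old pattern only the digits of "-10" are matched, so the leading minus survives and the start is never clamped; with `-?\d+` the whole negative number is captured and becomes "<1".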
Frank On Thu, 2012-01-12 at 14:13 +0000, Frank Schwach wrote: > I have now created a version that gives the option to create the > products of 'delete' and 'insert' via Bio::Root::Root::clone instead of > calling 'new' on the input seq object class. Seems to be working fine > for me so far. > > 'delete' and 'insert' can now take a hashref of options. > The only option so far is to set 'clone_obj' to true, to use cloning > instead of creating objects via 'new'. > Setting this parameter to false or not supplying the options hashref at > all will give you the old behaviour (call 'new'). > Example: > > my $product = Bio::SeqUtils->delete( > $seq_obj, > 11, > 20, > { clone_obj => 1} > ); > > The ligate method takes clone_obj as a named parameter: > > my $new_molecule = Bio::Sequtils::Pbrtools->ligate( > -recipient => $vector, > -fragment => $fragment, > -left => 1000, > -right => 1100, > -flip => 1, > -clone_obj => 1 > ); > > This is in a branch of my GitHub repo if you would like to have a look: > > https://github.com/fschwach/bioperl-live/tree/sequtils_clone > > Unfortunately, I can't add this option to trunc_with_features because > the creation of the new object is delegated to 'trunc'. I guess I could > implement 'trunc' in Bio::SeqUtils itself(?) > > What do you think, could this be merged into bioperl-live? > > > Frank > > > > On Wed, 2012-01-11 at 21:03 +0000, Frank Schwach wrote: > > Great, I'll work on a branch that gives the user the option to use clone > > instead of new and then we can see if we want to use that in the end. In > > the meantime, what do you think about pulling this into bioperl-live?
> > When I have some time again I can work on the HOWTO for these new > > features for the BioPerl wiki > > > > Frank > > > > > > On 11/01/12 18:42, Fields, Christopher J wrote: > > > Note that Bio::Root::Root now has a clone() method that one can take advantage of for this purpose; if Storable or Clone is available, it will pick one of the two, preferably Clone over Storable. It's fairly untested, but we haven't run into problems with it yet (I think it was in the last CPAN release). > > > > > > chris > > > > > > On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote: > > > > > >> Hi Frank, > > >> > > >> Looks great, I like the use of between locations, didn't think of that. > > >> > > >> It was suggested that I avoid using Clone for cat, trunc_with_features etc. to avoid adding a dependency (which may no longer be an issue) and because it would cause problems for Bio::Seq implementations that use a database as the back-end. Maybe you could add it as an option, but keep the default as is? > > >> > > >> Cheers, > > >> Roy. > > >> > > >> On 11/01/2012 18:16, Frank Schwach wrote: > > >>> Hi Roy and Chris, > > >>> > > >>> I have made the changes to the code now. As you suggested, feature ends > > >>> no longer change type and I insert a note instead to inform about the > > >>> deletion (or insertion), showing the length and position. > > >>> I have also added a feature to annotate deletion sites themselves (with > > >>> IN-BETWEEN locations). 
> > >>> > > >>> Roy's test script now prints: > > >>> > > >>> LOCUS seq-accession_number 7 bp dna linear UNK > > >>> ACCESSION unknown > > >>> FEATURES Location/Qualifiers > > >>> CDS join(2..3,4..6) > > >>> /note="3bp internal deletion between pos 3 and 4" > > >>> CDS 2..3 > > >>> /note="2bp deleted from feature end" > > >>> misc_feature 3^4 > > >>> /note="deletion of 3bp" > > >>> ORIGIN > > >>> 1 aaaaaaa > > >>> // > > >>> > > >>> > > >>> or, if you add strand information (-1 in this case) to the second feature: > > >>> > > >>> LOCUS seq-accession_number 7 bp dna linear UNK > > >>> ACCESSION unknown > > >>> FEATURES Location/Qualifiers > > >>> CDS join(2..3,4..6) > > >>> /note="3bp internal deletion between pos 3 and 4" > > >>> CDS complement(2..3) > > >>> /note="2bp deleted from feature 5' end" > > >>> misc_feature 3^4 > > >>> /note="deletion of 3bp" > > >>> ORIGIN > > >>> 1 aaaaaaa > > >>> // > > >>> > > >>> I have committed this along with some bugfixes to my master branch on GitHub > > >>> https://github.com/fschwach/bioperl-live > > >>> so it's now also in my existing pull request. > > >>> > > >>> I'm still wondering if cloning the sequence objects rather than calling > > >>> 'new' on their respective classes would be an option inside 'delete' and > > >>> 'insert'? > > >>> I'm experimenting with this for my own purposes because I have to work > > >>> with custom sub-classes of Bio::Seq which have additional attributes and > > >>> therefore set 'can_call_new' to false. > > >>> Without cloning the objects, I first have to convert the custom > > >>> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid. > > >>> Is there any reason why something like Clone::Fast should not be used in > > >>> this case? It seems to work for me but there may be situations where > > >>> this is going to blow up which I am not aware of. > > >>> Cloning rather than calling new could be made an option in > > >>> Bio::SeqUtils. I have most of the code for that already.
> > >>> > > >>> Frank > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> On 10/01/12 17:31, Roy Chaudhuri wrote: > > >>>> Or without the typo: > > >>>> > > >>>> CDS join(2..3,4..6) > > >>>> /note="3 bp internal deletion" > > >>>> CDS 2..3 > > >>>> /note="2 bp deleted from 3' end" > > >>>> > > >>>> On 10/01/2012 17:27, Roy Chaudhuri wrote: > > >>>>> I think it's me that didn't explain very well - I was talking about > > >>>>> overlapping (rather than spanning) a deletion, although I think the same > > >>>>> principle applies to the spanning example you gave. Here's some test > > >>>>> code: > > >>>>> > > >>>>> #!/usr/bin/perl > > >>>>> use warnings FATAL=>qw(all); > > >>>>> use strict; > > >>>>> use Bio::Seq; > > >>>>> use Bio::SeqIO; > > >>>>> use Bio::SeqUtils; > > >>>>> use Bio::SeqFeature::Generic; > > >>>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA'); > > >>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', > > >>>>> -start=>2, > > >>>>> -end=>9)); > > >>>>> > > >>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', > > >>>>> -start=>2, > > >>>>> -end=>5)); > > >>>>> my $out=Bio::SeqIO->newFh(-format=>'genbank'); > > >>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6); > > >>>>> print $out $trunc; > > >>>>> > > >>>>> > > >>>>> This currently outputs: > > >>>>> LOCUS seq-accession_number 7 bp dna linear UNK > > >>>>> ACCESSION unknown > > >>>>> FEATURES Location/Qualifiers > > >>>>> CDS join(2..>3,<4..6) > > >>>>> CDS 2..>3 > > >>>>> ORIGIN > > >>>>> 1 aaaaaaa > > >>>>> // > > >>>>> > > >>>>> However, I was suggesting that the feature table should be something > > >>>>> like: > > >>>>> CDS join(2..3,4..6) > > >>>>> /note="3 bp internal deletion" > > >>>>> CDS join(2..3) > > >>>>> /note="2 bp deleted from 3' end" > > >>>>> > > >>>>> Fuzzy locations are intended to represent features which have boundaries > > >>>>> spanning outside of the sequence. 
For a defined deletion that's not the > > >>>>> case, the boundaries of the feature aren't unknown, they have been > > >>>>> specifically altered. > > >>>>> > > >>>>> Hope this is clearer. > > >>>>> Cheers, > > >>>>> Roy. > > >>>>> > > >>>>> On 10/01/2012 16:47, Frank Schwach wrote: > > >>>>>> Hi Roy, > > >>>>>> > > >>>>>> Sorry, I hadn't explained that very well: it's not the outer boundaries > > >>>>>> of the feature that become fuzzy but the "inner" ones of the split > > >>>>>> locations: > > >>>>>> > > >>>>>> -------------------- a feature's location > > >>>>>> ==========xxxx================= sequence > > >>>>>> > > >>>>>> > > >>>>>> --------- sublocation 1 > > >>>>>> -------- sublocation 2 > > >>>>>> =============================== > > >>>>>> > > >>>>>> x= sequence to delete > > >>>>>> The feature's location has changed from Simple to Split. > > >>>>>> > > >>>>>> Sublocation 1: > > >>>>>> start is still EXACT and has not changed > > >>>>>> end is now AFTER because this is not a true end of the feature > > >>>>>> > > >>>>>> Sublocation 2: > > >>>>>> start is BEFORE > > >>>>>> end is EXACT (but shifted) > > >>>>>> > > >>>>>> I hope this makes more sense(?) > > >>>>>> > > >>>>>> Cheers, > > >>>>>> > > >>>>>> Frank > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote: > > >>>>>>> Hi Frank, > > >>>>>>> > > >>>>>>> Looks good to me. One thing I'm not sure about - why do features > > >>>>>>> overlapping a deletion become fuzzy? That behaviour is in > > >>>>>>> trunc_with_features because it's intended to represent a taking a > > >>>>>>> subregion of a larger sequence, but if you're representing an internal > > >>>>>>> deletion then the boundaries of the overlapping feature aren't > > >>>>>>> unknown, > > >>>>>>> they have been specifically altered. Maybe you could give absolute > > >>>>>>> coordinates, but add a note indicating that the 5' or 3' end has been > > >>>>>>> truncated by however many bases. 
> > >>>>>>> > > >>>>>>> Cheers, > > >>>>>>> Roy. > > >>>>>>> > > >>>>>>> On 10/01/2012 13:10, Frank Schwach wrote: > > >>>>>>>> Hi Chris, > > >>>>>>>> > > >>>>>>>> I have made the changes in a Git fork and made the pull request now. > > >>>>>>>> If this is accepted into BioPerl I can also write a little SeqUtils > > >>>>>>>> HOWTO for the BioPerl wiki. > > >>>>>>>> > > >>>>>>>> Frank > > >>>>>>>> > > >>>>>>>> > > >>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote: > > >>>>>>>>> Sounds very promising! The easiest way to contribute is via a > > >>>>>>>>> fork of the code on Github with a pull request (as you already > > >>>>>>>>> know, being a contributor to the Primer3 modules). > > >>>>>>>>> > > >>>>>>>>> chris > > >>>>>>>>> > > >>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote: > > >>>>>>>>> > > >>>>>>>>>> Hi all, > > >>>>>>>>>> > > >>>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and > > >>>>>>>>>> sequence > > >>>>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a > > >>>>>>>>>> vector > > >>>>>>>>>> and insert a fragment into it while preserving all the > > >>>>>>>>>> annotations and > > >>>>>>>>>> moving the features accordingly. > > >>>>>>>>>> My main aim was to split features that span deletion/insertion > > >>>>>>>>>> sites in > > >>>>>>>>>> a meaningful way, which cannot be done with the currently available > > >>>>>>>>>> methods. > > >>>>>>>>>> I have modified Bio::SeqUtils so that I have the following new > > >>>>>>>>>> methods: > > >>>>>>>>>> > > >>>>>>>>>> delete > > >>>>>>>>>> ====== > > >>>>>>>>>> removes a segment from a sequence object and adjusts positions > > >>>>>>>>>> and types > > >>>>>>>>>> of locations of sequence features: > > >>>>>>>>>> - locations of features that span the deletion sites are turned > > >>>>>>>>>> into > > >>>>>>>>>> Splits.
> > >>>>>>>>>> - locations that extend into the deleted region are turned to > > >>>>>>>>>> Fuzzy to > > >>>>>>>>>> indicate that their true start/end was lost. > > >>>>>>>>>> - locations contained inside the deleted regions are lost. > > >>>>>>>>>> - other features are shifted according to the length of the > > >>>>>>>>>> deletion. > > >>>>>>>>>> > > >>>>>>>>>> insert > > >>>>>>>>>> ====== > > >>>>>>>>>> adds a Bio::Seq object into another one between specified insertion > > >>>>>>>>>> sites. This also affects the features on the recipient sequence: > > >>>>>>>>>> - locations of features that span the insertion site are split but > > >>>>>>>>>> position types are not turned to Fuzzy because no part of the > > >>>>>>>>>> original > > >>>>>>>>>> feature is lost. > > >>>>>>>>>> - other features are shifted according to the length of the > > >>>>>>>>>> insertion. > > >>>>>>>>>> > > >>>>>>>>>> ligate > > >>>>>>>>>> ====== > > >>>>>>>>>> just for convenience. Supply a recipient, a fragment and one or two > > >>>>>>>>>> sites to cut the recipient. Can also flip the fragment if required. > > >>>>>>>>>> Simply calls delete [, reverse_complement_with_features] and > > >>>>>>>>>> insert in > > >>>>>>>>>> turn. > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> One situation I haven't handled yet is a deletion that spans the > > >>>>>>>>>> origin > > >>>>>>>>>> of a circular molecule but that should be a rare thing to do > > >>>>>>>>>> anyway. The > > >>>>>>>>>> code currently throws an error if this is attempted. > > >>>>>>>>>> > > >>>>>>>>>> I'm happy to contribute the code on Github if there is interest? > > >>>>>>>>>> Comments on the handling of feature locations highly welcome! 
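The insertion rules in the proposal above (split features that span the insertion site but keep their ends exact, shift everything downstream) reduce to simple coordinate arithmetic. The following is a minimal sketch under stated assumptions: `adjust_for_insertion` is a hypothetical helper, not the Bio::SeqUtils code. It works on plain 1-based, inclusive start/end pairs and ignores strand and Bio::Location objects.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not the Bio::SeqUtils implementation: adjust a simple
# 1-based, inclusive feature [$start, $end] after $ins_len bases are inserted
# between positions $site and $site + 1 of the recipient. Nothing is lost,
# so every returned end stays exact (no fuzzy < or > markers).
sub adjust_for_insertion {
    my ($start, $end, $site, $ins_len) = @_;
    return ([$start, $end]) if $end <= $site;                          # upstream: unchanged
    return ([$start + $ins_len, $end + $ins_len]) if $start > $site;   # downstream: shifted
    # spans the site: split into two pieces around the inserted fragment
    return ([$start, $site], [$site + $ins_len + 1, $end + $ins_len]);
}

# Insert 5 bp between positions 4 and 5 of the recipient
for my $f ([2, 9], [1, 3], [6, 8]) {
    my @pieces = adjust_for_insertion(@$f, 4, 5);
    print join(',', map { "$_->[0]..$_->[1]" } @pieces), "\n";
}
```

This prints `2..4,10..14`, `1..3` and `11..13`. The spanning feature becomes the equivalent of a `join(...)` location with both pieces still exact, which matches the "not turned to Fuzzy" rule described for `insert`.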
> > >>>>>>>>>> > > >>>>>>>>>> Frank > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>> > > >>>>>>>> > > >>>>>> > > >>>>>> > > >>> > > > > > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From roy.chaudhuri at gmail.com Wed Jan 18 11:49:48 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Wed, 18 Jan 2012 16:49:48 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <1326903677.3563.15.camel@deskpro15336.internal.sanger.ac.uk> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> <1326903677.3563.15.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <4F16F82C.2090503@gmail.com> Ok, if the tests pass and it's useful then it's fine by me (not that you need my permission, of course). Just out of interest, do negative feature coordinates work with the rest of BioPerl? I don't think they are covered in the DDBJ/EMBL/GenBank feature table definition. Roy. On 18/01/2012 16:21, Frank Schwach wrote: > Hi Roy, > > I have a use-case for Bio::SeqUtils truncating a feature with negative > start location. In my case, this is happening because I transform > genomic coordinates of a feature to the coordinate frame of another > feature, so feature A that starts 10nt before the reference feature now > has a start of -10. 
I want to trim that to a fuzzy start = "<1" > > Bio::SeqUtils::_coord_adjust can be used to trim feature A accordingly > but we need to change the regex that manipulates the coordinates > slightly by adding an optional "-": > > map s/(\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add+$1>$length) {">$length"} else {$add+$1}/ge, @coords; > > becomes > > map s/(-?\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add+$1>$length) {">$length"} else {$add+$1}/ge, @coords; > > It doesn't change anything else as far as I can tell and adds some more > flexibility to the method. If you are ok with it I can push this to my > queued pull request for Bio::SeqUtils. > > Frank > > > > > > On Thu, 2012-01-12 at 14:13 +0000, Frank Schwach wrote: >> I have now created a version that gives the option to create the >> products of 'delete' and 'insert' via Bio::Root::Root::clone instead of >> calling 'new' on the input seq object class. Seems to be working fine >> for me so far. >> >> 'delete' and 'insert' can now take a hashref of options. >> The only option so far is to set 'clone_obj' to true, to use cloning >> instead of creating objects via 'new'. >> Setting this parameter to false or not supplying the options hashref at >> all will give you the old behaviour (call 'new'). >> Example: >> >> my $product = Bio::SeqUtils->delete( >> $seq_obj, >> 11, >> 20, >> { clone_obj => 1} >> ); >> >> The ligate method takes clone_obj as a named parameter: >> >> my $new_molecule = Bio::Sequtils::Pbrtools->ligate( >> -recipient => $vector, >> -fragment => $fragment, >> -left => 1000, >> -right => 1100, >> -flip => 1, >> -clone_obj => 1 >> ); >> >> This is in a branch of my GitHub repo if you would like to have a look: >> >> https://github.com/fschwach/bioperl-live/tree/sequtils_clone >> >> Unfortunately, I can't add this option to trunc_with_features because >> the creation of the new object is delegated to 'trunc'. I guess I could >> implement 'trunc' in Bio::SeqUtils itself(?)
>> >> What do you think, could this be merged into bioperl-live? >> >> >> Frank >> >> >> >> On Wed, 2012-01-11 at 21:03 +0000, Frank Schwach wrote: >>> Great, I'll work on a branch that gives the user the option to use clone >>> instead of new and then we can see if we want to use that in the end. In >>> the meantime, what do you think about pulling this into bioperl-live? >>> When I have some time again I can work on the HOWTO for these new >>> features for the BioPerl wiki >>> >>> Frank >>> >>> >>> On 11/01/12 18:42, Fields, Christopher J wrote: >>>> Note that Bio::Root::Root now has a clone() method that one can take advantage of for this purpose; if Storable or Clone is available, it will pick one of the two, preferably Clone over Storable. It's fairly untested, but we haven't run into problems with it yet (I think it was in the last CPAN release). >>>> >>>> chris >>>> >>>> On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote: >>>> >>>>> Hi Frank, >>>>> >>>>> Looks great, I like the use of between locations, didn't think of that. >>>>> >>>>> It was suggested that I avoid using Clone for cat, trunc_with_features etc. to avoid adding a dependency (which may no longer be an issue) and because it would cause problems for Bio::Seq implementations that use a database as the back-end. Maybe you could add it as an option, but keep the default as is? >>>>> >>>>> Cheers, >>>>> Roy. >>>>> >>>>> On 11/01/2012 18:16, Frank Schwach wrote: >>>>>> Hi Roy and Chris, >>>>>> >>>>>> I have made the changes to the code now. As you suggested, feature ends >>>>>> no longer change type and I insert a note instead to inform about the >>>>>> deletion (or insertion), showing the length and position. >>>>>> I have also added a feature to annotate deletion sites themselves (with >>>>>> IN-BETWEEN locations). 
>>>>>> >>>>>> Roy's test script now prints: >>>>>> >>>>>> LOCUS seq-accession_number 7 bp dna linear UNK >>>>>> ACCESSION unknown >>>>>> FEATURES Location/Qualifiers >>>>>> CDS join(2..3,4..6) >>>>>> /note="3bp internal deletion between pos 3 and 4" >>>>>> CDS 2..3 >>>>>> /note="2bp deleted from feature end" >>>>>> misc_feature 3^4 >>>>>> /note="deletion of 3bp" >>>>>> ORIGIN >>>>>> 1 aaaaaaa >>>>>> // >>>>>> >>>>>> >>>>>> or, if you add strand information (-1 in this case) to the second feature: >>>>>> >>>>>> LOCUS seq-accession_number 7 bp dna linear UNK >>>>>> ACCESSION unknown >>>>>> FEATURES Location/Qualifiers >>>>>> CDS join(2..3,4..6) >>>>>> /note="3bp internal deletion between pos 3 and 4" >>>>>> CDS complement(2..3) >>>>>> /note="2bp deleted from feature 5' end" >>>>>> misc_feature 3^4 >>>>>> /note="deletion of 3bp" >>>>>> ORIGIN >>>>>> 1 aaaaaaa >>>>>> // >>>>>> >>>>>> I have committed this along with some bugfixes to my master branch on GitHub >>>>>> https://github.com/fschwach/bioperl-live >>>>>> so it's now also in my existing pull request. >>>>>> >>>>>> I'm still wondering if cloning the sequence objects rather than calling >>>>>> 'new' on their respective classes would be an option inside 'delete' and >>>>>> 'insert'? >>>>>> I'm experimenting with this for my own purposes because I have to work >>>>>> with custom sub-classes of Bio::Seq which have additional attributes and >>>>>> therefore set 'can_call_new' to false. >>>>>> Without cloning the objects, I first have to convert the custom >>>>>> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid. >>>>>> Is there any reason why something like Clone::Fast should not be used in >>>>>> this case? It seems to work for me but there may be situations where >>>>>> this is going to blow up which I am not aware of. >>>>>> Cloning rather than calling new could be made an option in >>>>>> Bio::SeqUtils. I have most of the code for that already.
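The question of cloning versus calling `new` for subclasses with extra attributes can be illustrated with a small sketch. It assumes core Perl's `Storable::dclone` as a stand-in for whatever `Bio::Root::Root::clone` delegates to (the thread says it picks Clone or Storable). `My::Seq` and `My::Seq::Foo` are hypothetical classes, not BioPerl ones, and the `substr` calls merely stand in for a `delete`/`insert` product.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical stand-in for a sequence class (not Bio::Seq)
package My::Seq;
sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{-seq} }, $class;
}

# Hypothetical subclass carrying an extra attribute the base constructor ignores
package My::Seq::Foo;
our @ISA = ('My::Seq');

package main;

my $orig = My::Seq::Foo->new(-seq => 'ACGTACGT');
$orig->{experiment} = 'exp42';

# Building the product with the base-class constructor loses class and extra data
my $rebuilt = My::Seq->new(-seq => substr($orig->{seq}, 0, 4));

# A deep copy keeps the class and every attribute; edit the copy, not the original
my $copy = dclone($orig);
$copy->{seq} = substr($copy->{seq}, 0, 4);

printf "%s, experiment %s\n", ref $rebuilt, defined $rebuilt->{experiment} ? 'kept' : 'lost';
printf "%s, experiment %s\n", ref $copy, $copy->{experiment};
print "original still $orig->{seq}\n";
```

The trade-off raised later in the thread still applies. A deep copy is only safe when the object is self-contained in memory, which is why database-backed Bio::Seq implementations are a concern and why cloning was kept as an opt-in.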
>>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 10/01/12 17:31, Roy Chaudhuri wrote: >>>>>>> Or without the typo: >>>>>>> >>>>>>> CDS join(2..3,4..6) >>>>>>> /note="3 bp internal deletion" >>>>>>> CDS 2..3 >>>>>>> /note="2 bp deleted from 3' end" >>>>>>> >>>>>>> On 10/01/2012 17:27, Roy Chaudhuri wrote: >>>>>>>> I think it's me that didn't explain very well - I was talking about >>>>>>>> overlapping (rather than spanning) a deletion, although I think the same >>>>>>>> principle applies to the spanning example you gave. Here's some test >>>>>>>> code: >>>>>>>> >>>>>>>> #!/usr/bin/perl >>>>>>>> use warnings FATAL=>qw(all); >>>>>>>> use strict; >>>>>>>> use Bio::Seq; >>>>>>>> use Bio::SeqIO; >>>>>>>> use Bio::SeqUtils; >>>>>>>> use Bio::SeqFeature::Generic; >>>>>>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA'); >>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >>>>>>>> -start=>2, >>>>>>>> -end=>9)); >>>>>>>> >>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >>>>>>>> -start=>2, >>>>>>>> -end=>5)); >>>>>>>> my $out=Bio::SeqIO->newFh(-format=>'genbank'); >>>>>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6); >>>>>>>> print $out $trunc; >>>>>>>> >>>>>>>> >>>>>>>> This currently outputs: >>>>>>>> LOCUS seq-accession_number 7 bp dna linear UNK >>>>>>>> ACCESSION unknown >>>>>>>> FEATURES Location/Qualifiers >>>>>>>> CDS join(2..>3,<4..6) >>>>>>>> CDS 2..>3 >>>>>>>> ORIGIN >>>>>>>> 1 aaaaaaa >>>>>>>> // >>>>>>>> >>>>>>>> However, I was suggesting that the feature table should be something >>>>>>>> like: >>>>>>>> CDS join(2..3,4..6) >>>>>>>> /note="3 bp internal deletion" >>>>>>>> CDS join(2..3) >>>>>>>> /note="2 bp deleted from 3' end" >>>>>>>> >>>>>>>> Fuzzy locations are intended to represent features which have boundaries >>>>>>>> spanning outside of the sequence. 
For a defined deletion that's not the >>>>>>>> case, the boundaries of the feature aren't unknown, they have been >>>>>>>> specifically altered. >>>>>>>> >>>>>>>> Hope this is clearer. >>>>>>>> Cheers, >>>>>>>> Roy. >>>>>>>> >>>>>>>> On 10/01/2012 16:47, Frank Schwach wrote: >>>>>>>>> Hi Roy, >>>>>>>>> >>>>>>>>> Sorry, I hadn't explained that very well: it's not the outer boundaries >>>>>>>>> of the feature that become fuzzy but the "inner" ones of the split >>>>>>>>> locations: >>>>>>>>> >>>>>>>>> -------------------- a feature's location >>>>>>>>> ==========xxxx================= sequence >>>>>>>>> >>>>>>>>> >>>>>>>>> --------- sublocation 1 >>>>>>>>> -------- sublocation 2 >>>>>>>>> =============================== >>>>>>>>> >>>>>>>>> x= sequence to delete >>>>>>>>> The feature's location has changed from Simple to Split. >>>>>>>>> >>>>>>>>> Sublocation 1: >>>>>>>>> start is still EXACT and has not changed >>>>>>>>> end is now AFTER because this is not a true end of the feature >>>>>>>>> >>>>>>>>> Sublocation 2: >>>>>>>>> start is BEFORE >>>>>>>>> end is EXACT (but shifted) >>>>>>>>> >>>>>>>>> I hope this makes more sense(?) >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> Frank >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote: >>>>>>>>>> Hi Frank, >>>>>>>>>> >>>>>>>>>> Looks good to me. One thing I'm not sure about - why do features >>>>>>>>>> overlapping a deletion become fuzzy? That behaviour is in >>>>>>>>>> trunc_with_features because it's intended to represent taking a >>>>>>>>>> subregion of a larger sequence, but if you're representing an internal >>>>>>>>>> deletion then the boundaries of the overlapping feature aren't >>>>>>>>>> unknown, >>>>>>>>>> they have been specifically altered. Maybe you could give absolute >>>>>>>>>> coordinates, but add a note indicating that the 5' or 3' end has been >>>>>>>>>> truncated by however many bases. >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> Roy.
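Roy's suggestion to use absolute coordinates, with no fuzzy ends, for features that overlap or span a deletion can be written as a few coordinate rules. This is a minimal sketch, not the Bio::SeqUtils code. `adjust_for_deletion` is a hypothetical helper that works on plain 1-based, inclusive start/end pairs and ignores strand and the /note qualifiers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not the Bio::SeqUtils implementation: map a simple
# 1-based, inclusive feature [$start, $end] onto the sequence left after
# deleting positions $del_start..$del_end. Returns the surviving pieces as
# [start, end] pairs with exact ends: two pieces = split location,
# one piece = shifted or truncated feature, none = feature lost.
sub adjust_for_deletion {
    my ($start, $end, $del_start, $del_end) = @_;
    my $len = $del_end - $del_start + 1;

    return ([$start, $end])               if $end < $del_start;   # upstream: unchanged
    return ([$start - $len, $end - $len]) if $start > $del_end;   # downstream: shifted
    return ()                             if $start >= $del_start && $end <= $del_end;  # inside: lost

    my @pieces;
    push @pieces, [$start, $del_start - 1] if $start < $del_start;  # part left of the cut
    push @pieces, [$del_start, $end - $len] if $end > $del_end;     # part right of the cut, shifted
    return @pieces;
}

# Roy's test case: 10 bp sequence, CDS features 2..9 and 2..5, delete 4..6
for my $f ([2, 9], [2, 5]) {
    my @pieces = adjust_for_deletion(@$f, 4, 6);
    print join(',', map { "$_->[0]..$_->[1]" } @pieces), "\n";
}
```

It prints `2..3,4..6` and `2..3`, the exact locations Roy proposes in place of `join(2..>3,<4..6)` and `2..>3`. A real implementation would also attach the "N bp deleted" note and handle strand.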
>>>>>>>>>> >>>>>>>>>> On 10/01/2012 13:10, Frank Schwach wrote: >>>>>>>>>>> Hi Chris, >>>>>>>>>>> >>>>>>>>>>> I have made the changes in a Git fork and made the pull request now. >>>>>>>>>>> If this is accepted into BioPerl I can also write a little SeqUtils >>>>>>>>>>> HOWTO for the BioPerl wiki. >>>>>>>>>>> >>>>>>>>>>> Frank >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote: >>>>>>>>>>>> Sounds very promising! The easiest way to contribute is via a >>>>>>>>>>>> fork of the code on Github with a pull request (as you already >>>>>>>>>>>> know, being a contributor to the Primer3 modules). >>>>>>>>>>>> >>>>>>>>>>>> chris >>>>>>>>>>>> >>>>>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Hi all, >>>>>>>>>>>>> >>>>>>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and >>>>>>>>>>>>> sequence >>>>>>>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a >>>>>>>>>>>>> vector >>>>>>>>>>>>> and insert a fragment into it while preserving all the >>>>>>>>>>>>> annotations and >>>>>>>>>>>>> moving the features accordingly. >>>>>>>>>>>>> My main aim was to split features that span deletion/insertion >>>>>>>>>>>>> sites in >>>>>>>>>>>>> a meaningful way, which cannot be done with the currently available >>>>>>>>>>>>> methods. >>>>>>>>>>>>> I have modified Bio::SeqUtils so that I have the following new >>>>>>>>>>>>> methods: >>>>>>>>>>>>> >>>>>>>>>>>>> delete >>>>>>>>>>>>> ====== >>>>>>>>>>>>> removes a segment from a sequence object and adjusts positions >>>>>>>>>>>>> and types >>>>>>>>>>>>> of locations of sequence features: >>>>>>>>>>>>> - locations of features that span the deletion sites are turned >>>>>>>>>>>>> into >>>>>>>>>>>>> Splits. >>>>>>>>>>>>> - locations that extend into the deleted region are turned to >>>>>>>>>>>>> Fuzzy to >>>>>>>>>>>>> indicate that their true start/end was lost.
>>>>>>>>>>>>> - locations contained inside the deleted regions are lost. >>>>>>>>>>>>> - other features are shifted according to the length of the >>>>>>>>>>>>> deletion. >>>>>>>>>>>>> >>>>>>>>>>>>> insert >>>>>>>>>>>>> ====== >>>>>>>>>>>>> adds a Bio::Seq object into another one between specified insertion >>>>>>>>>>>>> sites. This also affects the features on the recipient sequence: >>>>>>>>>>>>> - locations of features that span the insertion site are split but >>>>>>>>>>>>> position types are not turned to Fuzzy because no part of the >>>>>>>>>>>>> original >>>>>>>>>>>>> feature is lost. >>>>>>>>>>>>> - other features are shifted according to the length of the >>>>>>>>>>>>> insertion. >>>>>>>>>>>>> >>>>>>>>>>>>> ligate >>>>>>>>>>>>> ====== >>>>>>>>>>>>> just for convenience. Supply a recipient, a fragment and one or two >>>>>>>>>>>>> sites to cut the recipient. Can also flip the fragment if required. >>>>>>>>>>>>> Simply calls delete [, reverse_complement_with_features] and >>>>>>>>>>>>> insert in >>>>>>>>>>>>> turn. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> One situation I haven't handled yet is a deletion that spans the >>>>>>>>>>>>> origin >>>>>>>>>>>>> of a circular molecule but that should be a rare thing to do >>>>>>>>>>>>> anyway. The >>>>>>>>>>>>> code currently throws an error if this is attempted. >>>>>>>>>>>>> >>>>>>>>>>>>> I'm happy to contribute the code on Github if there is interest? >>>>>>>>>>>>> Comments on the handling of feature locations highly welcome! 
>>>>>>>>>>>>> >>>>>>>>>>>>> Frank >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>> >>> >>> >> >> >> > > > From fs5 at sanger.ac.uk Wed Jan 18 12:08:16 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Wed, 18 Jan 2012 17:08:16 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <4F16F82C.2090503@gmail.com> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> <1326903677.3563.15.camel@deskpro15336.internal.sanger.ac.uk> <4F16F82C.2090503@gmail.com> Message-ID: <1326906496.3563.31.camel@deskpro15336.internal.sanger.ac.uk> Thanks Roy! I have pushed that to master now and it's part of my pending pull request. I just wanted to ask you because this is code you had written and there might be a good reason not to do this. Yes, you can create a SeqFeature object with negative coordinates. Cheers, Frank On Wed, 2012-01-18 at 16:49 +0000, Roy Chaudhuri wrote: > Ok, if the tests pass and it's useful then it's fine by me (not that you > need my permission, of course). Just out of interest, do negative > feature coordinates work with the rest of BioPerl? I don't think they > are covered in the DDBJ/EMBL/GenBank feature table definition. > > Roy. > > On 18/01/2012 16:21, Frank Schwach wrote: > > Hi Roy, > > > > I have a use-case for Bio::SeqUtils truncating a feature with negative > > start location. 
In my case, this is happening because I transform > > genomic coordinates of a feature to the coordinate frame of another > > feature, so feature A that starts 10nt before the reference feature now > > has a start of -10. I want to trim that to a fuzzy start = "<1" > > > > Bio::SeqUtils::_coord_adjust can be used to trim feature A accordingly > > but we need to change the regex that manipulates the coordinates > > slightly by adding an optional "-": > > > > map s/(\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add+$1>$length) {">$length"} else {$add+$1}/ge, @coords; > > > > becomes > > > > map s/(-?\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add+$1>$length) {">$length"} else {$add+$1}/ge, @coords; > > > > It doesn't change anything else as far as I can tell and adds some more > > flexibility to the method. If you are ok with it I can push this to my > > queued pull request for Bio::SeqUtils. > > > > Frank > > > > > > > > > > > > On Thu, 2012-01-12 at 14:13 +0000, Frank Schwach wrote: > >> I have now created a version that gives the option to create the > >> products of 'delete' and 'insert' via Bio::Root::Root::clone instead of > >> calling 'new' on the input seq object class. Seems to be working fine > >> for me so far. > >> > >> 'delete' and 'insert' can now take a hashref of options. > >> The only option so far is to set 'clone_obj' to true, to use cloning > >> instead of creating objects via 'new'. > >> Setting this parameter to false or not supplying the options hashref at > >> all will give you the old behaviour (call 'new').
> >> Example: > >> > >> my $product = Bio::SeqUtils->delete( > >> $seq_obj, > >> 11, > >> 20, > >> { clone_obj => 1} > >> ); > >> > >> The ligate method takes clone_obj as a named parameter: > >> > >> my $new_molecule = Bio::Sequtils::Pbrtools->ligate( > >> -recipient => $vector, > >> -fragment => $fragment, > >> -left => 1000, > >> -right => 1100, > >> -flip => 1, > >> -clone_obj => 1 > >> ); > >> > >> This is in a branch of my GitHub repo if you would like to have a look: > >> > >> https://github.com/fschwach/bioperl-live/tree/sequtils_clone > >> > >> Unfortunately, I can't add this option to trunc_with_features because > >> the creation of the new object is delegated to 'trunc'. I guess I could > >> implement 'trunc' in Bio::SeqUtils itself(?) > >> > >> What do you think, could this be merged into bioperl-live? > >> > >> > >> Frank > >> > >> > >> > >> On Wed, 2012-01-11 at 21:03 +0000, Frank Schwach wrote: > >>> Great, I'll work on a branch that gives the user the option to use clone > >>> instead of new and then we can see if we want to use that in the end. In > >>> the meantime, what do you think about pulling this into bioperl-live? > >>> When I have some time again I can work on the HOWTO for these new > >>> features for the BioPerl wiki > >>> > >>> Frank > >>> > >>> > >>> On 11/01/12 18:42, Fields, Christopher J wrote: > >>>> Note that Bio::Root::Root now has a clone() method that one can take advantage of for this purpose; if Storable or Clone is available, it will pick one of the two, preferably Clone over Storable. It's fairly untested, but we haven't run into problems with it yet (I think it was in the last CPAN release). > >>>> > >>>> chris > >>>> > >>>> On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote: > >>>> > >>>>> Hi Frank, > >>>>> > >>>>> Looks great, I like the use of between locations, didn't think of that. > >>>>> > >>>>> It was suggested that I avoid using Clone for cat, trunc_with_features etc. 
to avoid adding a dependency (which may no longer be an issue) and because it would cause problems for Bio::Seq implementations that use a database as the back-end. Maybe you could add it as an option, but keep the default as is? > >>>>> > >>>>> Cheers, > >>>>> Roy. > >>>>> > >>>>> On 11/01/2012 18:16, Frank Schwach wrote: > >>>>>> Hi Roy and Chris, > >>>>>> > >>>>>> I have made the changes to the code now. As you suggested, feature ends > >>>>>> no longer change type and I insert a note instead to inform about the > >>>>>> deletion (or insertion), showing the length and position. > >>>>>> I have also added a feature to annotate deletion sites themselves (with > >>>>>> IN-BETWEEN locations). > >>>>>> > >>>>>> Roy's test script now prints: > >>>>>> > >>>>>> LOCUS seq-accession_number 7 bp dna linear UNK > >>>>>> ACCESSION unknown > >>>>>> FEATURES Location/Qualifiers > >>>>>> CDS join(2..3,4..6) > >>>>>> /note="3bp internal deletion between pos 3 and 4" > >>>>>> CDS 2..3 > >>>>>> /note="2bp deleted from feature end" > >>>>>> misc_feature 3^4 > >>>>>> /note="deletion of 3bp" > >>>>>> ORIGIN > >>>>>> 1 aaaaaaa > >>>>>> // > >>>>>> > >>>>>> > >>>>>> or, if you add strand information (-1 in this case) to the second feature: > >>>>>> > >>>>>> LOCUS seq-accession_number 7 bp dna linear UNK > >>>>>> ACCESSION unknown > >>>>>> FEATURES Location/Qualifiers > >>>>>> CDS join(2..3,4..6) > >>>>>> /note="3bp internal deletion between pos 3 and 4" > >>>>>> CDS complement(2..3) > >>>>>> /note="2bp deleted from feature 5' end" > >>>>>> misc_feature 3^4 > >>>>>> /note="deletion of 3bp" > >>>>>> ORIGIN > >>>>>> 1 aaaaaaa > >>>>>> // > >>>>>> > >>>>>> I have committed this along with some bugfixes to my master branch on GitHub > >>>>>> https://github.com/fschwach/bioperl-live > >>>>>> so it's now also in my existing pull request.
> >>>>>> > >>>>>> I'm still wondering if cloning the sequence objects rather than calling > >>>>>> 'new' on their respective classes would be an option inside 'delete' and > >>>>>> 'insert'? > >>>>>> I'm experimenting with this for my own purposes because I have to work > >>>>>> with custom sub-classes of Bio::Seq which have additional attributes and > >>>>>> therefore set 'can_call_new' to false. > >>>>>> Without cloning the objects, I first have to convert the custom > >>>>>> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid. > >>>>>> Is there any reason why something like Clone::Fast should not be used in > >>>>>> this case? It seems to work for me but there may be situations where > >>>>>> this is going to blow up which I am not aware of. > >>>>>> Cloning rather than calling new could be made an option in > >>>>>> Bio::SeqUtils. I have most of the code for that already. > >>>>>> > >>>>>> Frank > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 10/01/12 17:31, Roy Chaudhuri wrote: > >>>>>>> Or without the typo: > >>>>>>> > >>>>>>> CDS join(2..3,4..6) > >>>>>>> /note="3 bp internal deletion" > >>>>>>> CDS 2..3 > >>>>>>> /note="2 bp deleted from 3' end" > >>>>>>> > >>>>>>> On 10/01/2012 17:27, Roy Chaudhuri wrote: > >>>>>>>> I think it's me that didn't explain very well - I was talking about > >>>>>>>> overlapping (rather than spanning) a deletion, although I think the same > >>>>>>>> principle applies to the spanning example you gave. 
Here's some test > >>>>>>>> code: > >>>>>>>> > >>>>>>>> #!/usr/bin/perl > >>>>>>>> use warnings FATAL=>qw(all); > >>>>>>>> use strict; > >>>>>>>> use Bio::Seq; > >>>>>>>> use Bio::SeqIO; > >>>>>>>> use Bio::SeqUtils; > >>>>>>>> use Bio::SeqFeature::Generic; > >>>>>>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA'); > >>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', > >>>>>>>> -start=>2, > >>>>>>>> -end=>9)); > >>>>>>>> > >>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', > >>>>>>>> -start=>2, > >>>>>>>> -end=>5)); > >>>>>>>> my $out=Bio::SeqIO->newFh(-format=>'genbank'); > >>>>>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6); > >>>>>>>> print $out $trunc; > >>>>>>>> > >>>>>>>> > >>>>>>>> This currently outputs: > >>>>>>>> LOCUS seq-accession_number 7 bp dna linear UNK > >>>>>>>> ACCESSION unknown > >>>>>>>> FEATURES Location/Qualifiers > >>>>>>>> CDS join(2..>3,<4..6) > >>>>>>>> CDS 2..>3 > >>>>>>>> ORIGIN > >>>>>>>> 1 aaaaaaa > >>>>>>>> // > >>>>>>>> > >>>>>>>> However, I was suggesting that the feature table should be something > >>>>>>>> like: > >>>>>>>> CDS join(2..3,4..6) > >>>>>>>> /note="3 bp internal deletion" > >>>>>>>> CDS join(2..3) > >>>>>>>> /note="2 bp deleted from 3' end" > >>>>>>>> > >>>>>>>> Fuzzy locations are intended to represent features which have boundaries > >>>>>>>> spanning outside of the sequence. For a defined deletion that's not the > >>>>>>>> case, the boundaries of the feature aren't unknown, they have been > >>>>>>>> specifically altered. > >>>>>>>> > >>>>>>>> Hope this is clearer. > >>>>>>>> Cheers, > >>>>>>>> Roy. 
> >>>>>>>> > >>>>>>>> On 10/01/2012 16:47, Frank Schwach wrote: > >>>>>>>>> Hi Roy, > >>>>>>>>> > >>>>>>>>> Sorry, I hadn't explained that very well: it's not the outer boundaries > >>>>>>>>> of the feature that become fuzzy but the "inner" ones of the split > >>>>>>>>> locations: > >>>>>>>>> > >>>>>>>>> -------------------- a feature's location > >>>>>>>>> ==========xxxx================= sequence > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> --------- sublocation 1 > >>>>>>>>> -------- sublocation 2 > >>>>>>>>> =============================== > >>>>>>>>> > >>>>>>>>> x= sequence to delete > >>>>>>>>> The feature's location has changed from Simple to Split. > >>>>>>>>> > >>>>>>>>> Sublocation 1: > >>>>>>>>> start is still EXACT and has not changed > >>>>>>>>> end is now AFTER because this is not a true end of the feature > >>>>>>>>> > >>>>>>>>> Sublocation 2: > >>>>>>>>> start is BEFORE > >>>>>>>>> end is EXACT (but shifted) > >>>>>>>>> > >>>>>>>>> I hope this makes more sense(?) > >>>>>>>>> > >>>>>>>>> Cheers, > >>>>>>>>> > >>>>>>>>> Frank > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote: > >>>>>>>>>> Hi Frank, > >>>>>>>>>> > >>>>>>>>>> Looks good to me. One thing I'm not sure about - why do features > >>>>>>>>>> overlapping a deletion become fuzzy? That behaviour is in > >>>>>>>>>> trunc_with_features because it's intended to represent taking a > >>>>>>>>>> subregion of a larger sequence, but if you're representing an internal > >>>>>>>>>> deletion then the boundaries of the overlapping feature aren't > >>>>>>>>>> unknown, > >>>>>>>>>> they have been specifically altered. Maybe you could give absolute > >>>>>>>>>> coordinates, but add a note indicating that the 5' or 3' end has been > >>>>>>>>>> truncated by however many bases. > >>>>>>>>>> > >>>>>>>>>> Cheers, > >>>>>>>>>> Roy.
> >>>>>>>>>> > >>>>>>>>>> On 10/01/2012 13:10, Frank Schwach wrote: > >>>>>>>>>>> Hi Chris, > >>>>>>>>>>> > >>>>>>>>>>> I have made the changes in a Git fork and made the pull request now. > >>>>>>>>>>> If this is accepted into BioPerl I can also write a little SeqUtils > >>>>>>>>>>> HOWTO for the BioPerl wiki. > >>>>>>>>>>> > >>>>>>>>>>> Frank > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote: > >>>>>>>>>>>> Sounds very promising! The easiest way to contribute is via a > >>>>>>>>>>>> fork of the code on Github with a pull request (as you already > >>>>>>>>>>>> know, being a contributor to the Primer3 modules). > >>>>>>>>>>>> > >>>>>>>>>>>> chris > >>>>>>>>>>>> > >>>>>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote: > >>>>>>>>>>>> > >>>>>>>>>>>>> Hi all, > >>>>>>>>>>>>> > >>>>>>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and > >>>>>>>>>>>>> sequence > >>>>>>>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a > >>>>>>>>>>>>> vector > >>>>>>>>>>>>> and insert a fragment into it while preserving all the > >>>>>>>>>>>>> annotations and > >>>>>>>>>>>>> moving the features accordingly. > >>>>>>>>>>>>> My main aim was to split features that span deletion/insertion > >>>>>>>>>>>>> sites in > >>>>>>>>>>>>> a meaningful way, which cannot be done with the currently available > >>>>>>>>>>>>> methods. > >>>>>>>>>>>>> I have modified Bio::SeqUtils so that I have the following new > >>>>>>>>>>>>> methods: > >>>>>>>>>>>>> > >>>>>>>>>>>>> delete > >>>>>>>>>>>>> ====== > >>>>>>>>>>>>> removes a segment from a sequence object and adjusts positions > >>>>>>>>>>>>> and types > >>>>>>>>>>>>> of locations of sequence features: > >>>>>>>>>>>>> - locations of features that span the deletion sites are turned > >>>>>>>>>>>>> into > >>>>>>>>>>>>> Splits.
> >>>>>>>>>>>>> - locations that extend into the deleted region are turned to > >>>>>>>>>>>>> Fuzzy to > >>>>>>>>>>>>> indicate that their true start/end was lost. > >>>>>>>>>>>>> - locations contained inside the deleted regions are lost. > >>>>>>>>>>>>> - other features are shifted according to the length of the > >>>>>>>>>>>>> deletion. > >>>>>>>>>>>>> > >>>>>>>>>>>>> insert > >>>>>>>>>>>>> ====== > >>>>>>>>>>>>> adds a Bio::Seq object into another one between specified insertion > >>>>>>>>>>>>> sites. This also affects the features on the recipient sequence: > >>>>>>>>>>>>> - locations of features that span the insertion site are split but > >>>>>>>>>>>>> position types are not turned to Fuzzy because no part of the > >>>>>>>>>>>>> original > >>>>>>>>>>>>> feature is lost. > >>>>>>>>>>>>> - other features are shifted according to the length of the > >>>>>>>>>>>>> insertion. > >>>>>>>>>>>>> > >>>>>>>>>>>>> ligate > >>>>>>>>>>>>> ====== > >>>>>>>>>>>>> just for convenience. Supply a recipient, a fragment and one or two > >>>>>>>>>>>>> sites to cut the recipient. Can also flip the fragment if required. > >>>>>>>>>>>>> Simply calls delete [, reverse_complement_with_features] and > >>>>>>>>>>>>> insert in > >>>>>>>>>>>>> turn. > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> One situation I haven't handled yet is a deletion that spans the > >>>>>>>>>>>>> origin > >>>>>>>>>>>>> of a circular molecule but that should be a rare thing to do > >>>>>>>>>>>>> anyway. The > >>>>>>>>>>>>> code currently throws an error if this is attempted. > >>>>>>>>>>>>> > >>>>>>>>>>>>> I'm happy to contribute the code on Github if there is interest? > >>>>>>>>>>>>> Comments on the handling of feature locations highly welcome! 
> >>>>>>>>>>>>> > >>>>>>>>>>>>> Frank > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>> > >>> > >>> > >> > >> > >> > > > > > > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From cjfields at illinois.edu Wed Jan 18 12:24:50 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 18 Jan 2012 17:24:50 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <1326906496.3563.31.camel@deskpro15336.internal.sanger.ac.uk> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> <1326903677.3563.15.camel@deskpro15336.internal.sanger.ac.uk> <4F16F82C.2090503@gmail.com> <1326906496.3563.31.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <159E3D6F-F08C-4D93-B31F-742AF93EF20E@illinois.edu> From Bio::RangeI: "The behaviour of a range is undefined if ranges with negative numbers or zero are used." This is left ambiguous b/c the implementation may define this more specifically, but SeqFeatures AFAIK do not clarify this any more than Bio::RangeI does, so I don't think you can rely on any particular consistent behavior. Re: why this is so, the ambiguities pertain to how length, contains, overlaps, etc. are calculated (these all assume positive 1-based coords).
For example, since we use 1-based coords, do we include a 0 position with negative coordinates, or do negative coordinates start at -1? The current interface and implementations all adhere to how locations are defined in the DDBJ/EMBL/GenBank feature table definition, hence Roy's question. chris On Jan 18, 2012, at 11:08 AM, Frank Schwach wrote: > Thanks Roy! > I have pushed that to master now and it's part of my pending pull > request. > I just wanted to ask you because this is code you had written and there > might be a good reason not to do this. > Yes, you can create a SeqFeature object with negative coordinates. > > Cheers, > > Frank > > > On Wed, 2012-01-18 at 16:49 +0000, Roy Chaudhuri wrote: >> Ok, if the tests pass and it's useful then it's fine by me (not that you >> need my permission, of course). Just out of interest, do negative >> feature coordinates work with the rest of BioPerl? I don't think they >> are covered in the DDBJ/EMBL/GenBank feature table definition. >> >> Roy. >> >> On 18/01/2012 16:21, Frank Schwach wrote: >>> Hi Roy, >>> >>> I have a use-case for Bio::SeqUtils truncating a feature with negative >>> start location. In my case, this is happening because I transform >>> genomic coordinates of a feature to the coordinate frame of another >>> feature, so feature A that starts 10nt before the reference feature now >>> has a start of -10. 
I want to trim that to a fuzzy start of "<1". >>> >>> Bio::SeqUtils::_coord_adjust can be used to trim feature A accordingly >>> but we need to change the regex that manipulates the coordinates >>> slightly by adding an optional "-": >>> >>> map s/(\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add+$1>$length) {">$length"} else {$add+$1}/ge, @coords; >>> >>> becomes >>> >>> map s/(-?\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add+$1>$length) {">$length"} else {$add+$1}/ge, @coords; >>> >>> It doesn't change anything else as far as I can tell and adds some more >>> flexibility to the method. If you are ok with it I can push this to my >>> queued pull request for Bio::SeqUtils. >>> >>> Frank >>> >>> >>> >>> >>> >>> On Thu, 2012-01-12 at 14:13 +0000, Frank Schwach wrote: >>>> I have now created a version that gives the option to create the >>>> products of 'delete' and 'insert' via Bio::Root::Root::clone instead of >>>> calling 'new' on the input seq object class. Seems to be working fine >>>> for me so far. >>>> >>>> 'delete' and 'insert' can now take a hashref of options. >>>> The only option so far is to set 'clone_obj' to true, to use cloning >>>> instead of creating objects via 'new'. >>>> Setting this parameter to false or not supplying the options hashref at >>>> all will give you the old behaviour (call 'new').
>>>> Example: >>>> >>>> my $product = Bio::SeqUtils->delete( >>>> $seq_obj, >>>> 11, >>>> 20, >>>> { clone_obj => 1} >>>> ); >>>> >>>> The ligate method takes clone_obj as a named parameter: >>>> >>>> my $new_molecule = Bio::Sequtils::Pbrtools->ligate( >>>> -recipient => $vector, >>>> -fragment => $fragment, >>>> -left => 1000, >>>> -right => 1100, >>>> -flip => 1, >>>> -clone_obj => 1 >>>> ); >>>> >>>> This is in a branch of my GitHub repo if you would like to have a look: >>>> >>>> https://github.com/fschwach/bioperl-live/tree/sequtils_clone >>>> >>>> Unfortunately, I can't add this option to trunc_with_features because >>>> the creation of the new object is delegated to 'trunc'. I guess I could >>>> implement 'trunc' in Bio::SeqUtils itself(?) >>>> >>>> What do you think, could this be merged into bioperl-live? >>>> >>>> >>>> Frank >>>> >>>> >>>> >>>> On Wed, 2012-01-11 at 21:03 +0000, Frank Schwach wrote: >>>>> Great, I'll work on a branch that gives the user the option to use clone >>>>> instead of new and then we can see if we want to use that in the end. In >>>>> the meantime, what do you think about pulling this into bioperl-live? >>>>> When I have some time again I can work on the HOWTO for these new >>>>> features for the BioPerl wiki >>>>> >>>>> Frank >>>>> >>>>> >>>>> On 11/01/12 18:42, Fields, Christopher J wrote: >>>>>> Note that Bio::Root::Root now has a clone() method that one can take advantage of for this purpose; if Storable or Clone is available, it will pick one of the two, preferably Clone over Storable. It's fairly untested, but we haven't run into problems with it yet (I think it was in the last CPAN release). >>>>>> >>>>>> chris >>>>>> >>>>>> On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote: >>>>>> >>>>>>> Hi Frank, >>>>>>> >>>>>>> Looks great, I like the use of between locations, didn't think of that. >>>>>>> >>>>>>> It was suggested that I avoid using Clone for cat, trunc_with_features etc. 
to avoid adding a dependency (which may no longer be an issue) and because it would cause problems for Bio::Seq implementations that use a database as the back-end. Maybe you could add it as an option, but keep the default as is? >>>>>>> >>>>>>> Cheers, >>>>>>> Roy. >>>>>>> >>>>>>> On 11/01/2012 18:16, Frank Schwach wrote: >>>>>>>> Hi Roy and Chris, >>>>>>>> >>>>>>>> I have made the changes to the code now. As you suggested, feature ends >>>>>>>> no longer change type and I insert a note instead to inform about the >>>>>>>> deletion (or insertion), showing the length and position. >>>>>>>> I have also added a feature to annotate deletion sites themselves (with >>>>>>>> IN-BETWEEN locations). >>>>>>>> >>>>>>>> Roy's test script now prints: >>>>>>>> >>>>>>>> LOCUS seq-accession_number 7 bp dna linear UNK >>>>>>>> ACCESSION unknown >>>>>>>> FEATURES Location/Qualifiers >>>>>>>> CDS join(2..3,4..6) >>>>>>>> /note="3bp internal deletion between pos 3 and 4" >>>>>>>> CDS 2..3 >>>>>>>> /note="2bp deleted from feature end" >>>>>>>> misc_feature 3^4 >>>>>>>> /note="deletion of 3bp" >>>>>>>> ORIGIN >>>>>>>> 1 aaaaaaa >>>>>>>> // >>>>>>>> >>>>>>>> >>>>>>>> or, if you add strand information (-1 in this case) to the second feature: >>>>>>>> >>>>>>>> LOCUS seq-accession_number 7 bp dna linear UNK >>>>>>>> ACCESSION unknown >>>>>>>> FEATURES Location/Qualifiers >>>>>>>> CDS join(2..3,4..6) >>>>>>>> /note="3bp internal deletion between pos 3 and 4" >>>>>>>> CDS complement(2..3) >>>>>>>> /note="2bp deleted from feature 5' end" >>>>>>>> misc_feature 3^4 >>>>>>>> /note="deletion of 3bp" >>>>>>>> ORIGIN >>>>>>>> 1 aaaaaaa >>>>>>>> // >>>>>>>> >>>>>>>> I have committed this along with some bugfixes to my master branch on GitHub >>>>>>>> https://github.com/fschwach/bioperl-live >>>>>>>> so it's now
>>>>>>>> >>>>>>>> I'm still wondering if cloning the sequence objects rather than calling >>>>>>>> 'new' on their respective classes would be an option inside 'delete' and >>>>>>>> 'insert'? >>>>>>>> I'm experimenting with this for my own purposes because I have to work >>>>>>>> with custom sub-classes of Bio::Seq which have additional attributes and >>>>>>>> therefore set 'can_call_new' to false. >>>>>>>> Without cloning the objects, I first have to convert the custom >>>>>>>> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid. >>>>>>>> Is there any reason why something like Clone::Fast should not be used in >>>>>>>> this case? It seems to work for me but there may be situations where >>>>>>>> this is going to blow up which I am not aware of. >>>>>>>> Cloning rather than calling new could be made an option in >>>>>>>> Bio::SeqUtils. I have most of the code for that already. >>>>>>>> >>>>>>>> Frank >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 10/01/12 17:31, Roy Chaudhuri wrote: >>>>>>>>> Or without the typo: >>>>>>>>> >>>>>>>>> CDS join(2..3,4..6) >>>>>>>>> /note="3 bp internal deletion" >>>>>>>>> CDS 2..3 >>>>>>>>> /note="2 bp deleted from 3' end" >>>>>>>>> >>>>>>>>> On 10/01/2012 17:27, Roy Chaudhuri wrote: >>>>>>>>>> I think it's me that didn't explain very well - I was talking about >>>>>>>>>> overlapping (rather than spanning) a deletion, although I think the same >>>>>>>>>> principle applies to the spanning example you gave. 
Here's some test >>>>>>>>>> code: >>>>>>>>>> >>>>>>>>>> #!/usr/bin/perl >>>>>>>>>> use warnings FATAL=>qw(all); >>>>>>>>>> use strict; >>>>>>>>>> use Bio::Seq; >>>>>>>>>> use Bio::SeqIO; >>>>>>>>>> use Bio::SeqUtils; >>>>>>>>>> use Bio::SeqFeature::Generic; >>>>>>>>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA'); >>>>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >>>>>>>>>> -start=>2, >>>>>>>>>> -end=>9)); >>>>>>>>>> >>>>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >>>>>>>>>> -start=>2, >>>>>>>>>> -end=>5)); >>>>>>>>>> my $out=Bio::SeqIO->newFh(-format=>'genbank'); >>>>>>>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6); >>>>>>>>>> print $out $trunc; >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> This currently outputs: >>>>>>>>>> LOCUS seq-accession_number 7 bp dna linear UNK >>>>>>>>>> ACCESSION unknown >>>>>>>>>> FEATURES Location/Qualifiers >>>>>>>>>> CDS join(2..>3,<4..6) >>>>>>>>>> CDS 2..>3 >>>>>>>>>> ORIGIN >>>>>>>>>> 1 aaaaaaa >>>>>>>>>> // >>>>>>>>>> >>>>>>>>>> However, I was suggesting that the feature table should be something >>>>>>>>>> like: >>>>>>>>>> CDS join(2..3,4..6) >>>>>>>>>> /note="3 bp internal deletion" >>>>>>>>>> CDS join(2..3) >>>>>>>>>> /note="2 bp deleted from 3' end" >>>>>>>>>> >>>>>>>>>> Fuzzy locations are intended to represent features which have boundaries >>>>>>>>>> spanning outside of the sequence. For a defined deletion that's not the >>>>>>>>>> case, the boundaries of the feature aren't unknown, they have been >>>>>>>>>> specifically altered. >>>>>>>>>> >>>>>>>>>> Hope this is clearer. >>>>>>>>>> Cheers, >>>>>>>>>> Roy. 
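Roy's suggestion (keep exact coordinates and record the loss in a note, rather than turning the cut ends fuzzy) can be illustrated without BioPerl. The sketch below is a hypothetical helper, `adjust_for_deletion`, written only for this illustration: it handles simple start..end locations, assumes the plus strand, and is not the code that went into Bio::SeqUtils. The note wording simply mimics the GenBank output Frank posted in his 11/01 reply.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SeqUtils): adjust a simple,
# plus-strand location start..end (1-based, inclusive) after deleting
# bases del_start..del_end. Returns (location string, note), or
# (undef, undef) if the feature lies entirely inside the deleted region.
sub adjust_for_deletion {
    my ($start, $end, $del_start, $del_end) = @_;
    my $len = $del_end - $del_start + 1;    # number of deleted bases

    # Entirely upstream of the deletion: unchanged.
    return ("$start..$end", undef) if $end < $del_start;

    # Entirely downstream: shift left by the deletion length.
    return (($start - $len) . '..' . ($end - $len), undef) if $start > $del_end;

    # Entirely inside the deletion: the feature is lost.
    return (undef, undef) if $start >= $del_start && $end <= $del_end;

    # Spans the deletion: split into two exact (non-fuzzy) sublocations.
    if ($start < $del_start && $end > $del_end) {
        my $left = $del_start - 1;
        return ("join($start..$left,$del_start.." . ($end - $len) . ')',
                "${len}bp internal deletion between pos $left and $del_start");
    }

    # Overlaps the deletion at its 3' end: trim the end, keep it exact.
    if ($start < $del_start) {
        my $lost = $end - $del_start + 1;
        return ("$start.." . ($del_start - 1), "${lost}bp deleted from feature end");
    }

    # Otherwise it overlaps at its 5' end: trim the start and shift the end.
    my $lost = $del_end - $start + 1;
    return ("$del_start.." . ($end - $len), "${lost}bp deleted from feature start");
}

# Roy's example: features 2..9 and 2..5 on a 10 bp sequence, delete 4..6.
# Expected locations: join(2..3,4..6) and 2..3
for my $feature ([2, 9], [2, 5]) {
    my ($loc, $note) = adjust_for_deletion(@$feature, 4, 6);
    printf "%s\t%s\n", $loc // 'deleted', $note // '';
}
```

A minus-strand feature would need the "end"/"start" wording swapped (Frank's output reports the 5' end for `complement(2..3)`), which this sketch deliberately leaves out.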
> >>>>>>>>>> > >>>>>>>>>> On 10/01/2012 16:47, Frank Schwach wrote: > >>>>>>>>>>> Hi Roy, > >>>>>>>>>>> > >>>>>>>>>>> Sorry, I hadn't explained that very well: it's not the outer boundaries > >>>>>>>>>>> of the feature that become fuzzy but the "inner" ones of the split > >>>>>>>>>>> locations: > >>>>>>>>>>> > >>>>>>>>>>> -------------------- a feature's location > >>>>>>>>>>> ==========xxxx================= sequence > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> --------- sublocation 1 > >>>>>>>>>>> -------- sublocation 2 > >>>>>>>>>>> =============================== > >>>>>>>>>>> > >>>>>>>>>>> x= sequence to delete > >>>>>>>>>>> The feature's location has changed from Simple to Split. > >>>>>>>>>>> > >>>>>>>>>>> Sublocation 1: > >>>>>>>>>>> start is still EXACT and has not changed > >>>>>>>>>>> end is now AFTER because this is not a true end of the feature > >>>>>>>>>>> > >>>>>>>>>>> Sublocation 2: > >>>>>>>>>>> start is BEFORE > >>>>>>>>>>> end is EXACT (but shifted) > >>>>>>>>>>> > >>>>>>>>>>> I hope this makes more sense(?) > >>>>>>>>>>> > >>>>>>>>>>> Cheers, > >>>>>>>>>>> > >>>>>>>>>>> Frank > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote: > >>>>>>>>>>>> Hi Frank, > >>>>>>>>>>>> > >>>>>>>>>>>> Looks good to me. One thing I'm not sure about - why do features > >>>>>>>>>>>> overlapping a deletion become fuzzy? That behaviour is in > >>>>>>>>>>>> trunc_with_features because it's intended to represent taking a > >>>>>>>>>>>> subregion of a larger sequence, but if you're representing an internal > >>>>>>>>>>>> deletion then the boundaries of the overlapping feature aren't > >>>>>>>>>>>> unknown, > >>>>>>>>>>>> they have been specifically altered. Maybe you could give absolute > >>>>>>>>>>>> coordinates, but add a note indicating that the 5' or 3' end has been > >>>>>>>>>>>> truncated by however many bases. > >>>>>>>>>>>> > >>>>>>>>>>>> Cheers, > >>>>>>>>>>>> Roy.
> >>>>>>>>>>>> > >>>>>>>>>>>> On 10/01/2012 13:10, Frank Schwach wrote: > >>>>>>>>>>>>> Hi Chris, > >>>>>>>>>>>>> > >>>>>>>>>>>>> I have made the changes in a Git fork and made the pull request now. > >>>>>>>>>>>>> If this is accepted into BioPerl I can also write a little SeqUtils > >>>>>>>>>>>>> HOWTO for the BioPerl wiki. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Frank > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote: > >>>>>>>>>>>>>> Sounds very promising! The easiest way to contribute is via a > >>>>>>>>>>>>>> fork of the code on Github with a pull request (as you already > >>>>>>>>>>>>>> know, being a contributor to the Primer3 modules). > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> chris > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi all, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and > >>>>>>>>>>>>>>> sequence > >>>>>>>>>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a > >>>>>>>>>>>>>>> vector > >>>>>>>>>>>>>>> and insert a fragment into it while preserving all the > >>>>>>>>>>>>>>> annotations and > >>>>>>>>>>>>>>> moving the features accordingly. > >>>>>>>>>>>>>>> My main aim was to split features that span deletion/insertion > >>>>>>>>>>>>>>> sites in > >>>>>>>>>>>>>>> a meaningful way, which cannot be done with the currently available > >>>>>>>>>>>>>>> methods. > >>>>>>>>>>>>>>> I have modified Bio::SeqUtils so that I have the following new > >>>>>>>>>>>>>>> methods: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> delete > >>>>>>>>>>>>>>> ====== > >>>>>>>>>>>>>>> removes a segment from a sequence object and adjusts positions > >>>>>>>>>>>>>>> and types > >>>>>>>>>>>>>>> of locations of sequence features: > >>>>>>>>>>>>>>> - locations of features that span the deletion sites are turned > >>>>>>>>>>>>>>> into > >>>>>>>>>>>>>>> Splits.
>>>>>>>>>>>>>>> - locations that extend into the deleted region are turned to >>>>>>>>>>>>>>> Fuzzy to >>>>>>>>>>>>>>> indicate that their true start/end was lost. >>>>>>>>>>>>>>> - locations contained inside the deleted regions are lost. >>>>>>>>>>>>>>> - other features are shifted according to the length of the >>>>>>>>>>>>>>> deletion. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> insert >>>>>>>>>>>>>>> ====== >>>>>>>>>>>>>>> adds a Bio::Seq object into another one between specified insertion >>>>>>>>>>>>>>> sites. This also affects the features on the recipient sequence: >>>>>>>>>>>>>>> - locations of features that span the insertion site are split but >>>>>>>>>>>>>>> position types are not turned to Fuzzy because no part of the >>>>>>>>>>>>>>> original >>>>>>>>>>>>>>> feature is lost. >>>>>>>>>>>>>>> - other features are shifted according to the length of the >>>>>>>>>>>>>>> insertion. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> ligate >>>>>>>>>>>>>>> ====== >>>>>>>>>>>>>>> just for convenience. Supply a recipient, a fragment and one or two >>>>>>>>>>>>>>> sites to cut the recipient. Can also flip the fragment if required. >>>>>>>>>>>>>>> Simply calls delete [, reverse_complement_with_features] and >>>>>>>>>>>>>>> insert in >>>>>>>>>>>>>>> turn. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> One situation I haven't handled yet is a deletion that spans the >>>>>>>>>>>>>>> origin >>>>>>>>>>>>>>> of a circular molecule but that should be a rare thing to do >>>>>>>>>>>>>>> anyway. The >>>>>>>>>>>>>>> code currently throws an error if this is attempted. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I'm happy to contribute the code on Github if there is interest? >>>>>>>>>>>>>>> Comments on the handling of feature locations highly welcome! 
> >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Frank > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>> > >>>>> > >>>>> > >>>> > >>>> > >>>> > >>> > >>> > >>> > >> > > > > > > > > From fs5 at sanger.ac.uk Wed Jan 18 12:46:13 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Wed, 18 Jan 2012 17:46:13 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <159E3D6F-F08C-4D93-B31F-742AF93EF20E@illinois.edu> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> <1326903677.3563.15.camel@deskpro15336.internal.sanger.ac.uk> <4F16F82C.2090503@gmail.com> <1326906496.3563.31.camel@deskpro15336.internal.sanger.ac.uk> <159E3D6F-F08C-4D93-B31F-742AF93EF20E@illinois.edu> Message-ID: <1326908773.3563.36.camel@deskpro15336.internal.sanger.ac.uk> Ah yes, that's true. It should still be OK with _coord_adjust, because it trims everything <1 to a "<1" fuzzy location, so whether or not 0 is a location doesn't matter in this case. But if you want me to revert this change, I am happy to do that.
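The point that everything below 1 collapses to "<1", whether or not position 0 counts, can be checked by running the two substitutions quoted earlier in the thread on plain coordinate strings. The wrappers `adjust_old` and `adjust_new` and their argument order are hypothetical, made up for this illustration; only the `s///ge` expressions come from the thread, and this is not the real `_coord_adjust`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrappers around the two substitutions quoted in the thread.
# $add is the coordinate offset, $length the optional sequence length,
# @coords the coordinate strings to rewrite (modified copies are returned).
sub adjust_old {
    my ($add, $length, @coords) = @_;
    # Original pattern: only unsigned digit runs are matched.
    s/(\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add+$1>$length) {">$length"} else {$add+$1}/ge for @coords;
    return @coords;
}

sub adjust_new {
    my ($add, $length, @coords) = @_;
    # Proposed pattern: an optional leading "-" is captured with the digits.
    s/(-?\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add+$1>$length) {">$length"} else {$add+$1}/ge for @coords;
    return @coords;
}

# A feature starting 10 nt before the reference feature (start -10, end 50).
print join(',', adjust_old(0, 100, '-10', '50')), "\n";   # "-" survives: -10,50
print join(',', adjust_new(0, 100, '-10', '50')), "\n";   # trimmed: <1,50
# Zero is clamped to "<1" as well, and values past $length become ">$length".
print join(',', adjust_new(0, 100, '0', '120')), "\n";    # <1,>100
```

With the old pattern the minus sign is left in place and only the digits are adjusted, so an already-negative start passes through untrimmed; with `-?` the whole value is compared against 1.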
Frank On Wed, 2012-01-18 at 17:24 +0000, Fields, Christopher J wrote: > From Bio::RangeI: "The behaviour of a range is undefined if ranges with negative numbers or zero are used." This is left ambiguous b/c the implementation may define this more specifically, but SeqFeatures AFAIK do not clarify this any more than Bio::RangeI does, so I don't think you can rely on any particular consistent behavior. > > Re: why this is so, the ambiguities pertain to how length, contains, overlaps, etc. are calculated (these all assume positive 1-based coords). For example, since we use 1-based coords, do we include a 0 position with negative coordinates, or do negative coordinates start at -1? The current interface and implementations all adhere to how locations are defined in the DDBJ/EMBL/GenBank feature table definition, hence Roy's question. > > chris > > On Jan 18, 2012, at 11:08 AM, Frank Schwach wrote: > > > Thanks Roy! > > I have pushed that to master now and it's part of my pending pull > > request. > > I just wanted to ask you because this is code you had written and there > > might be a good reason not to do this. > > Yes, you can create a SeqFeature object with negative coordinates. > > > > Cheers, > > > > Frank > > > > > > On Wed, 2012-01-18 at 16:49 +0000, Roy Chaudhuri wrote: > >> Ok, if the tests pass and it's useful then it's fine by me (not that you > >> need my permission, of course). Just out of interest, do negative > >> feature coordinates work with the rest of BioPerl? I don't think they > >> are covered in the DDBJ/EMBL/GenBank feature table definition. > >> > >> Roy. > >> > >> On 18/01/2012 16:21, Frank Schwach wrote: > >>> Hi Roy, > >>> > >>> I have a use-case for Bio::SeqUtils truncating a feature with negative > >>> start location.
In my case, this is happening because I transform > >>> genomic coordinates of a feature to the coordinate frame of another > >>> feature, so feature A that starts 10nt before the reference feature now > >>> has a start of -10. I want to trim that to a fuzzy start of "<1". > >>> > >>> Bio::SeqUtils::_coord_adjust can be used to trim feature A accordingly > >>> but we need to change the regex that manipulates the coordinates > >>> slightly by adding an optional "-": > >>> > >>> map s/(\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add+$1>$length) {">$length"} else {$add+$1}/ge, @coords; > >>> > >>> becomes > >>> > >>> map s/(-?\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add+$1>$length) {">$length"} else {$add+$1}/ge, @coords; > >>> > >>> It doesn't change anything else as far as I can tell and adds some more > >>> flexibility to the method. If you are ok with it I can push this to my > >>> queued pull request for Bio::SeqUtils. > >>> > >>> Frank > >>> > >>> > >>> > >>> > >>> > >>> On Thu, 2012-01-12 at 14:13 +0000, Frank Schwach wrote: > >>>> I have now created a version that gives the option to create the > >>>> products of 'delete' and 'insert' via Bio::Root::Root::clone instead of > >>>> calling 'new' on the input seq object class. Seems to be working fine > >>>> for me so far. > >>>> > >>>> 'delete' and 'insert' can now take a hashref of options. > >>>> The only option so far is to set 'clone_obj' to true, to use cloning > >>>> instead of creating objects via 'new'. > >>>> Setting this parameter to false or not supplying the options hashref at > >>>> all will give you the old behaviour (call 'new').
> >>>> Example: > >>>> > >>>> my $product = Bio::SeqUtils->delete( > >>>> $seq_obj, > >>>> 11, > >>>> 20, > >>>> { clone_obj => 1} > >>>> ); > >>>> > >>>> The ligate method takes clone_obj as a named parameter: > >>>> > >>>> my $new_molecule = Bio::Sequtils::Pbrtools->ligate( > >>>> -recipient => $vector, > >>>> -fragment => $fragment, > >>>> -left => 1000, > >>>> -right => 1100, > >>>> -flip => 1, > >>>> -clone_obj => 1 > >>>> ); > >>>> > >>>> This is in a branch of my GitHub repo if you would like to have a look: > >>>> > >>>> https://github.com/fschwach/bioperl-live/tree/sequtils_clone > >>>> > >>>> Unfortunately, I can't add this option to trunc_with_features because > >>>> the creation of the new object is delegated to 'trunc'. I guess I could > >>>> implement 'trunc' in Bio::SeqUtils itself(?) > >>>> > >>>> What do you think, could this be merged into bioperl-live? > >>>> > >>>> > >>>> Frank > >>>> > >>>> > >>>> > >>>> On Wed, 2012-01-11 at 21:03 +0000, Frank Schwach wrote: > >>>>> Great, I'll work on a branch that gives the user the option to use clone > >>>>> instead of new and then we can see if we want to use that in the end. In > >>>>> the meantime, what do you think about pulling this into bioperl-live? > >>>>> When I have some time again I can work on the HOWTO for these new > >>>>> features for the BioPerl wiki > >>>>> > >>>>> Frank > >>>>> > >>>>> > >>>>> On 11/01/12 18:42, Fields, Christopher J wrote: > >>>>>> Note that Bio::Root::Root now has a clone() method that one can take advantage of for this purpose; if Storable or Clone is available, it will pick one of the two, preferably Clone over Storable. It's fairly untested, but we haven't run into problems with it yet (I think it was in the last CPAN release). > >>>>>> > >>>>>> chris > >>>>>> > >>>>>> On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote: > >>>>>> > >>>>>>> Hi Frank, > >>>>>>> > >>>>>>> Looks great, I like the use of between locations, didn't think of that. 
> >>>>>>> > >>>>>>> It was suggested that I avoid using Clone for cat, trunc_with_features etc. to avoid adding a dependency (which may no longer be an issue) and because it would cause problems for Bio::Seq implementations that use a database as the back-end. Maybe you could add it as an option, but keep the default as is? > >>>>>>> > >>>>>>> Cheers, > >>>>>>> Roy. > >>>>>>> > >>>>>>> On 11/01/2012 18:16, Frank Schwach wrote: > >>>>>>>> Hi Roy and Chris, > >>>>>>>> > >>>>>>>> I have made the changes to the code now. As you suggested, feature ends > >>>>>>>> no longer change type and I insert a note instead to inform about the > >>>>>>>> deletion (or insertion), showing the length and position. > >>>>>>>> I have also added a feature to annotate deletion sites themselves (with > >>>>>>>> IN-BETWEEN locations). > >>>>>>>> > >>>>>>>> Roy's test script now prints: > >>>>>>>> > >>>>>>>> LOCUS seq-accession_number 7 bp dna linear UNK > >>>>>>>> ACCESSION unknown > >>>>>>>> FEATURES Location/Qualifiers > >>>>>>>> CDS join(2..3,4..6) > >>>>>>>> /note="3bp internal deletion between pos 3 and 4" > >>>>>>>> CDS 2..3 > >>>>>>>> /note="2bp deleted from feature end" > >>>>>>>> misc_feature 3^4 > >>>>>>>> /note="deletion of 3bp" > >>>>>>>> ORIGIN > >>>>>>>> 1 aaaaaaa > >>>>>>>> // > >>>>>>>> > >>>>>>>> > >>>>>>>> or, if you add strand information (-1 in this case) to the second feature: > >>>>>>>> > >>>>>>>> LOCUS seq-accession_number 7 bp dna linear UNK > >>>>>>>> ACCESSION unknown > >>>>>>>> FEATURES Location/Qualifiers > >>>>>>>> CDS join(2..3,4..6) > >>>>>>>> /note="3bp internal deletion between pos 3 and 4" > >>>>>>>> CDS complement(2..3) > >>>>>>>> /note="2bp deleted from feature 5' end" > >>>>>>>> misc_feature 3^4 > >>>>>>>> /note="deletion of 3bp" > >>>>>>>> ORIGIN > >>>>>>>> 1 aaaaaaa > >>>>>>>> // > >>>>>>>> > >>>>>>>> I have committed this along with some bugfixes to my master branch on GitHub > >>>>>>>> https://github.com/fschwach/bioperl-live > >>>>>>>> so it's now
also in my existing pull request. > >>>>>>>> > >>>>>>>> I'm still wondering if cloning the sequence objects rather than calling > >>>>>>>> 'new' on their respective classes would be an option inside 'delete' and > >>>>>>>> 'insert'? > >>>>>>>> I'm experimenting with this for my own purposes because I have to work > >>>>>>>> with custom sub-classes of Bio::Seq which have additional attributes and > >>>>>>>> therefore set 'can_call_new' to false. > >>>>>>>> Without cloning the objects, I first have to convert the custom > >>>>>>>> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid. > >>>>>>>> Is there any reason why something like Clone::Fast should not be used in > >>>>>>>> this case? It seems to work for me but there may be situations where > >>>>>>>> this is going to blow up which I am not aware of. > >>>>>>>> Cloning rather than calling new could be made an option in > >>>>>>>> Bio::SeqUtils. I have most of the code for that already. > >>>>>>>> > >>>>>>>> Frank > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> On 10/01/12 17:31, Roy Chaudhuri wrote: > >>>>>>>>> Or without the typo: > >>>>>>>>> > >>>>>>>>> CDS join(2..3,4..6) > >>>>>>>>> /note="3 bp internal deletion" > >>>>>>>>> CDS 2..3 > >>>>>>>>> /note="2 bp deleted from 3' end" > >>>>>>>>> > >>>>>>>>> On 10/01/2012 17:27, Roy Chaudhuri wrote: > >>>>>>>>>> I think it's me that didn't explain very well - I was talking about > >>>>>>>>>> overlapping (rather than spanning) a deletion, although I think the same > >>>>>>>>>> principle applies to the spanning example you gave. 
Here's some test > >>>>>>>>>> code: > >>>>>>>>>> > >>>>>>>>>> #!/usr/bin/perl > >>>>>>>>>> use warnings FATAL=>qw(all); > >>>>>>>>>> use strict; > >>>>>>>>>> use Bio::Seq; > >>>>>>>>>> use Bio::SeqIO; > >>>>>>>>>> use Bio::SeqUtils; > >>>>>>>>>> use Bio::SeqFeature::Generic; > >>>>>>>>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA'); > >>>>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', > >>>>>>>>>> -start=>2, > >>>>>>>>>> -end=>9)); > >>>>>>>>>> > >>>>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', > >>>>>>>>>> -start=>2, > >>>>>>>>>> -end=>5)); > >>>>>>>>>> my $out=Bio::SeqIO->newFh(-format=>'genbank'); > >>>>>>>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6); > >>>>>>>>>> print $out $trunc; > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> This currently outputs: > >>>>>>>>>> LOCUS seq-accession_number 7 bp dna linear UNK > >>>>>>>>>> ACCESSION unknown > >>>>>>>>>> FEATURES Location/Qualifiers > >>>>>>>>>> CDS join(2..>3,<4..6) > >>>>>>>>>> CDS 2..>3 > >>>>>>>>>> ORIGIN > >>>>>>>>>> 1 aaaaaaa > >>>>>>>>>> // > >>>>>>>>>> > >>>>>>>>>> However, I was suggesting that the feature table should be something > >>>>>>>>>> like: > >>>>>>>>>> CDS join(2..3,4..6) > >>>>>>>>>> /note="3 bp internal deletion" > >>>>>>>>>> CDS join(2..3) > >>>>>>>>>> /note="2 bp deleted from 3' end" > >>>>>>>>>> > >>>>>>>>>> Fuzzy locations are intended to represent features which have boundaries > >>>>>>>>>> spanning outside of the sequence. For a defined deletion that's not the > >>>>>>>>>> case, the boundaries of the feature aren't unknown, they have been > >>>>>>>>>> specifically altered. > >>>>>>>>>> > >>>>>>>>>> Hope this is clearer. > >>>>>>>>>> Cheers, > >>>>>>>>>> Roy. 
> >>>>>>>>>> > >>>>>>>>>> On 10/01/2012 16:47, Frank Schwach wrote: > >>>>>>>>>>> Hi Roy, > >>>>>>>>>>> > >>>>>>>>>>> Sorry, I hadn't explained that very well: it's not the outer boundaries > >>>>>>>>>>> of the feature that become fuzzy but the "inner" ones of the split > >>>>>>>>>>> locations: > >>>>>>>>>>> > >>>>>>>>>>> -------------------- a feature's location > >>>>>>>>>>> ==========xxxx================= sequence > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> --------- sublocation 1 > >>>>>>>>>>> -------- sublocation 2 > >>>>>>>>>>> =============================== > >>>>>>>>>>> > >>>>>>>>>>> x= sequence to delete > >>>>>>>>>>> The feature's location has changed from Simple to Split. > >>>>>>>>>>> > >>>>>>>>>>> Sublocation 1: > >>>>>>>>>>> start is still EXACT and has not changed > >>>>>>>>>>> end is now AFTER because this is not a true end of the feature > >>>>>>>>>>> > >>>>>>>>>>> Sublocation 2: > >>>>>>>>>>> start is BEFORE > >>>>>>>>>>> end is EXACT (but shifted) > >>>>>>>>>>> > >>>>>>>>>>> I hope this makes more sense(?) > >>>>>>>>>>> > >>>>>>>>>>> Cheers, > >>>>>>>>>>> > >>>>>>>>>>> Frank > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote: > >>>>>>>>>>>> Hi Frank, > >>>>>>>>>>>> > >>>>>>>>>>>> Looks good to me. One thing I'm not sure about - why do features > >>>>>>>>>>>> overlapping a deletion become fuzzy? That behaviour is in > >>>>>>>>>>>> trunc_with_features because it's intended to represent taking a > >>>>>>>>>>>> subregion of a larger sequence, but if you're representing an internal > >>>>>>>>>>>> deletion then the boundaries of the overlapping feature aren't > >>>>>>>>>>>> unknown, > >>>>>>>>>>>> they have been specifically altered. Maybe you could give absolute > >>>>>>>>>>>> coordinates, but add a note indicating that the 5' or 3' end has been > >>>>>>>>>>>> truncated by however many bases. > >>>>>>>>>>>> > >>>>>>>>>>>> Cheers, > >>>>>>>>>>>> Roy.
> >>>>>>>>>>>> > >>>>>>>>>>>> On 10/01/2012 13:10, Frank Schwach wrote: > >>>>>>>>>>>>> Hi Chris, > >>>>>>>>>>>>> > >>>>>>>>>>>>> I have made the changes in a Git fork and made the pull request now. > >>>>>>>>>>>>> If this is accepted into BioPerl I can also write a little SeqUtils > >>>>>>>>>>>>> HOWTO for the BioPerl wiki. > >>>>>>>>>>>>> > >>>>>>>>>>>>> Frank > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote: > >>>>>>>>>>>>>> Sounds very promising! The easiest way to contribute is via a > >>>>>>>>>>>>>> fork of the code on Github with a pull request (as you already > >>>>>>>>>>>>>> know, being a contributor to the Primer3 modules). > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> chris > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote: > >>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Hi all, > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and > >>>>>>>>>>>>>>> sequence > >>>>>>>>>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a > >>>>>>>>>>>>>>> vector > >>>>>>>>>>>>>>> and insert a fragment into it while preserving all the > >>>>>>>>>>>>>>> annotations and > >>>>>>>>>>>>>>> moving the features accordingly. > >>>>>>>>>>>>>>> My main aim was to split features that span deletion/insertion > >>>>>>>>>>>>>>> sites in > >>>>>>>>>>>>>>> a meaningful way, which cannot be done with the currently available > >>>>>>>>>>>>>>> methods. > >>>>>>>>>>>>>>> I have modified Bio::SeqUtils so that I have the following new > >>>>>>>>>>>>>>> methods: > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> delete > >>>>>>>>>>>>>>> ====== > >>>>>>>>>>>>>>> removes a segment from a sequence object and adjusts positions > >>>>>>>>>>>>>>> and types > >>>>>>>>>>>>>>> of locations of sequence features: > >>>>>>>>>>>>>>> - locations of features that span the deletion sites are turned > >>>>>>>>>>>>>>> into > >>>>>>>>>>>>>>> Splits.
> >>>>>>>>>>>>>>> - locations that extend into the deleted region are turned to > >>>>>>>>>>>>>>> Fuzzy to > >>>>>>>>>>>>>>> indicate that their true start/end was lost. > >>>>>>>>>>>>>>> - locations contained inside the deleted regions are lost. > >>>>>>>>>>>>>>> - other features are shifted according to the length of the > >>>>>>>>>>>>>>> deletion. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> insert > >>>>>>>>>>>>>>> ====== > >>>>>>>>>>>>>>> adds a Bio::Seq object into another one between specified insertion > >>>>>>>>>>>>>>> sites. This also affects the features on the recipient sequence: > >>>>>>>>>>>>>>> - locations of features that span the insertion site are split but > >>>>>>>>>>>>>>> position types are not turned to Fuzzy because no part of the > >>>>>>>>>>>>>>> original > >>>>>>>>>>>>>>> feature is lost. > >>>>>>>>>>>>>>> - other features are shifted according to the length of the > >>>>>>>>>>>>>>> insertion. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> ligate > >>>>>>>>>>>>>>> ====== > >>>>>>>>>>>>>>> just for convenience. Supply a recipient, a fragment and one or two > >>>>>>>>>>>>>>> sites to cut the recipient. Can also flip the fragment if required. > >>>>>>>>>>>>>>> Simply calls delete [, reverse_complement_with_features] and > >>>>>>>>>>>>>>> insert in > >>>>>>>>>>>>>>> turn. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> One situation I haven't handled yet is a deletion that spans the > >>>>>>>>>>>>>>> origin > >>>>>>>>>>>>>>> of a circular molecule but that should be a rare thing to do > >>>>>>>>>>>>>>> anyway. The > >>>>>>>>>>>>>>> code currently throws an error if this is attempted. > >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> I'm happy to contribute the code on Github if there is interest? > >>>>>>>>>>>>>>> Comments on the handling of feature locations highly welcome! 
> >>>>>>>>>>>>>>> > >>>>>>>>>>>>>>> Frank > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>> > >>>>> > >>>>> > >>>> > >>>> > >>>> > >>> > >>> > >>> > >> > > > > > > > > -- > > The Wellcome Trust Sanger Institute is operated by Genome Research > > Limited, a charity registered in England with number 1021457 and a > > company registered in England with number 2742969, whose registered > > office is 215 Euston Road, London, NW1 2BE. > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From cjfields at illinois.edu Wed Jan 18 13:11:32 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 18 Jan 2012 18:11:32 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <1326908773.3563.36.camel@deskpro15336.internal.sanger.ac.uk> References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> <1326903677.3563.15.camel@deskpro15336.internal.sanger.ac.uk> <4F16F82C.2090503@gmail.com> <1326906496.3563.31.camel@deskpro15336.internal.sanger.ac.uk> <159E3D6F-F08C-4D93-B31F-742AF93EF20E@illinois.edu> <1326908773.3563.36.camel@deskpro15336.internal.sanger.ac.uk> 
Message-ID: You could probably change "<1" to "<" if the start isn't meant to be defined; the former seems to imply the feature location is '1 or before the start', the latter is simply 'before the start'. Either version should work within bioperl AFAIK, though I'm not sure whether the feature table definition covers '<' as a location type. Just thinking aloud, but such behavior might be something that needs to be defined more specifically in the Bio::RangeI implementation, via trunc(). chris On Jan 18, 2012, at 11:46 AM, Frank Schwach wrote: > a yes, that's true. Still should be ok with _coord_adjust because it > trims everything <1 to a "<1" fuzzy location, so whether or not 0 is a > location doesn't matter in this case. But if you want me to revert this > change then I am happy to do that. > > Frank > > > On Wed, 2012-01-18 at 17:24 +0000, Fields, Christopher J wrote: >>> From Bio::RangeI: "The behaviour of a range is undefined if ranges with negative numbers or zero are used." This is left ambiguous b/c the implementation may define this more specifically, but SeqFeatures AFAIK do not clarify this any more than Bio::RangeI does, so I don't think you can rely on any particular consistent behavior. >> >> Re: why this is so, the ambiguities pertain to how length, contains, overlaps, etc. are calculated (these all assume positive 1-based coords). For example, since we use 1-based coords, do we include a 0 position with negative coordinates, or do negative coordinates start at -1? The current interface and implementations all adhere to how locations are defined in the DDBJ/EMBL/GenBank feature table definition, hence Roy's question. >> >> chris >> >> On Jan 18, 2012, at 11:08 AM, Frank Schwach wrote: >> >>> Thanks Roy! >>> I have pushed that to master now and it's part of my pending pull >>> request. >>> I just wanted to ask you because this is code you had written and there >>> might be a good reason not to do this. >>> Yes, you can create a SeqFeature object with negative coordinates.
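The coordinate trimming discussed in this thread can be tried outside BioPerl with a minimal standalone sketch, assuming plain integer positions as input. `clamp_pos` and `adjust_coords` are hypothetical helpers, not part of Bio::SeqUtils; they mirror the quoted `_coord_adjust` substitution with the optional minus sign, but use a small sub instead of an if/elsif chain inside the replacement. Positions below 1 (including 0, which is taken to lie left of 1) become the fuzzy "<1", positions past an optional sequence length become ">$length", and everything else is shifted by `$add`.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl API): turn a position that falls off the
# sequence into a fuzzy "<1" or ">$length" string, otherwise return it as-is.
sub clamp_pos {
    my ( $pos, $length ) = @_;
    return '<1'       if $pos < 1;
    return ">$length" if defined $length && $pos > $length;
    return $pos;
}

# Hypothetical helper: shift every coordinate by $add and clamp it. The
# optional "-" in the pattern is the proposed change; with plain (\d+) the
# digits of "-10" are shifted but the leading "-" is left behind.
sub adjust_coords {
    my ( $add, $length, @coords ) = @_;
    s/(-?\d+)/clamp_pos( $add + $1, $length )/ge for @coords;
    return @coords;
}

my @fixed = adjust_coords( 5, undef, -10, 20 );
print join( '..', @fixed ), "\n";    # prints "<1..25"

# The same shift with the original pattern ignores the sign:
( my $old = -10 ) =~ s/(\d+)/clamp_pos( 5 + $1 )/ge;
print "$old\n";                      # prints "-15"
```

This illustrates the point that, without the sign in the pattern, a negative position is silently treated as a positive one rather than being trimmed.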
>>> >>> Cheers, >>> >>> Frank >>> >>> >>> On Wed, 2012-01-18 at 16:49 +0000, Roy Chaudhuri wrote: >>>> Ok, if the tests pass and it's useful then it's fine by me (not that you >>>> need my permission, of course). Just out of interest, do negative >>>> feature coordinates work with the rest of BioPerl? I don't think they >>>> are covered in the DDBJ/EMBL/GenBank feature table definition. >>>> >>>> Roy. >>>> >>>> On 18/01/2012 16:21, Frank Schwach wrote: >>>>> Hi Roy, >>>>> >>>>> I have a use-case for Bio::SeqUtils truncating a feature with negative >>>>> start location. In my case, this is happening because I transform >>>>> genomic coordinates of a feature to the coordinate frame of another >>>>> feature, so feature A that starts 10nt before the reference feature now >>>>> has a start of -10. I want to trim that to a fuzzy start = "<1" >>>>> >>>>> Bio::SeqUtils::_coord_adjust can be used to trim feature A accordingly >>>>> but we need to change the regex that manipulates the coordinates >>>>> slightly by adding an optional "-": >>>>> >>>>> map s/(\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add >>>>> +$1>$length) {">$length"} else {$add+$1}/ge, @coords; >>>>> >>>>> becomes >>>>> >>>>> map s/(-?\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add >>>>> +$1>$length) {">$length"} else {$add+$1}/ge, @coords; >>>>> >>>>> It doesn't change anything else as far as I can tell and adds some more >>>>> flexibility to the method. If you are ok with it I can push this to my >>>>> queued pull request for Bio::SeqUtils. >>>>> >>>>> Frank >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On Thu, 2012-01-12 at 14:13 +0000, Frank Schwach wrote: >>>>>> I have now created a version that gives the option to create the >>>>>> products of 'delete' and 'insert' via Bio::Root::Root:clone instead of >>>>>> calling 'new' on the input seq object class. Seems to be working fine >>>>>> for me so far. >>>>>> >>>>>> 'delete' and 'insert' can now take a hashref of options. 
>>>>>> The only option so far is to set 'clone_obj to true, to use cloning >>>>>> instead of creating objects via 'new'. >>>>>> Setting this parameter to false or not supplying the options hashref at >>>>>> all will give you the old behaviour (call 'new'). >>>>>> Example: >>>>>> >>>>>> my $product = Bio::SeqUtils->delete( >>>>>> $seq_obj, >>>>>> 11, >>>>>> 20, >>>>>> { clone_obj => 1} >>>>>> ); >>>>>> >>>>>> The ligate method takes clone_obj as a named parameter: >>>>>> >>>>>> my $new_molecule = Bio::Sequtils::Pbrtools->ligate( >>>>>> -recipient => $vector, >>>>>> -fragment => $fragment, >>>>>> -left => 1000, >>>>>> -right => 1100, >>>>>> -flip => 1, >>>>>> -clone_obj => 1 >>>>>> ); >>>>>> >>>>>> This is in a branch of my GitHub repo if you would like to have a look: >>>>>> >>>>>> https://github.com/fschwach/bioperl-live/tree/sequtils_clone >>>>>> >>>>>> Unfortunately, I can't add this option to trunc_with_features because >>>>>> the creation of the new object is delegated to 'trunc'. I guess I could >>>>>> implement 'trunc' in Bio::SeqUtils itself(?) >>>>>> >>>>>> What do you think, could this be merged into bioperl-live? >>>>>> >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>>> On Wed, 2012-01-11 at 21:03 +0000, Frank Schwach wrote: >>>>>>> Great, I'll work on a branch that gives the user the option to use clone >>>>>>> instead of new and then we can see if we want to use that in the end. In >>>>>>> the meantime, what do you think about pulling this into bioperl-live? >>>>>>> When I have some time again I can work on the HOWTO for these new >>>>>>> features for the BioPerl wiki >>>>>>> >>>>>>> Frank >>>>>>> >>>>>>> >>>>>>> On 11/01/12 18:42, Fields, Christopher J wrote: >>>>>>>> Note that Bio::Root::Root now has a clone() method that one can take advantage of for this purpose; if Storable or Clone is available, it will pick one of the two, preferably Clone over Storable. 
It's fairly untested, but we haven't run into problems with it yet (I think it was in the last CPAN release). >>>>>>>> >>>>>>>> chris >>>>>>>> >>>>>>>> On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote: >>>>>>>> >>>>>>>>> Hi Frank, >>>>>>>>> >>>>>>>>> Looks great, I like the use of between locations, didn't think of that. >>>>>>>>> >>>>>>>>> It was suggested that I avoid using Clone for cat, trunc_with_features etc. to avoid adding a dependency (which may no longer be an issue) and because it would cause problems for Bio::Seq implementations that use a database as the back-end. Maybe you could add it as an option, but keep the default as is? >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> Roy. >>>>>>>>> >>>>>>>>> On 11/01/2012 18:16, Frank Schwach wrote: >>>>>>>>>> Hi Roy and Chris, >>>>>>>>>> >>>>>>>>>> I have made the changes to the code now. As you suggested, feature ends >>>>>>>>>> no longer change type and I insert a note instead to inform about the >>>>>>>>>> deletion (or insertion), showing the length and position. >>>>>>>>>> I have also added a feature to annotate deletion sites themselves (with >>>>>>>>>> IN-BETWEEN locations). 
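The "prefer Clone, fall back to Storable" choice described for Bio::Root::Root's clone() can be sketched as a standalone helper. This is not the actual Bio::Root::Root code; `deep_copy` is a hypothetical name. Clone is a CPAN module that may not be installed, while Storable (and its `dclone`) is a core module.

```perl
use strict;
use warnings;

# Choose a deep-copy routine once: prefer Clone (from CPAN) when it can be
# loaded, otherwise fall back to Storable::dclone (a core module).
my $DEEP_COPY = eval { require Clone; \&Clone::clone }
  || do { require Storable; \&Storable::dclone };

# Hypothetical helper: return an independent copy of a (possibly blessed)
# nested data structure.
sub deep_copy {
    my ($ref) = @_;
    return $DEEP_COPY->($ref);
}

my $orig = bless { seq => 'AAAAAAAAAA', features => [ [ 2, 9 ] ] }, 'My::Seq';
my $copy = deep_copy($orig);
$copy->{features}[0][1] = 5;    # editing the copy leaves $orig untouched
print "$orig->{features}[0][1] ", ref($copy), "\n";    # prints "9 My::Seq"
```

Both routines keep the blessing, so a copied subclass object stays in its own class, which is what makes cloning attractive for custom Bio::Seq subclasses. As Roy notes, a deep copy may be unsuitable for Bio::Seq implementations backed by a database, which is one reason to make it opt-in rather than the default.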
>>>>>>>>>> >>>>>>>>>> Roy's test script now prints: >>>>>>>>>> >>>>>>>>>> LOCUS seq-accession_number 7 bp dna linear UNK >>>>>>>>>> ACCESSION unknown >>>>>>>>>> FEATURES Location/Qualifiers >>>>>>>>>> CDS join(2..3,4..6) >>>>>>>>>> /note="3bp internal deletion between pos 3 and 4" >>>>>>>>>> CDS 2..3 >>>>>>>>>> /note="2bp deleted from feature end" >>>>>>>>>> misc_feature 3^4 >>>>>>>>>> /note="deletion of 3bp" >>>>>>>>>> ORIGIN >>>>>>>>>> 1 aaaaaaa >>>>>>>>>> // >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> or, if you add strand information (-1 in this case) to the second feature: >>>>>>>>>> >>>>>>>>>> LOCUS seq-accession_number 7 bp dna linear UNK >>>>>>>>>> ACCESSION unknown >>>>>>>>>> FEATURES Location/Qualifiers >>>>>>>>>> CDS join(2..3,4..6) >>>>>>>>>> /note="3bp internal deletion between pos 3 and 4" >>>>>>>>>> CDS complement(2..3) >>>>>>>>>> /note="2bp deleted from feature 5' end" >>>>>>>>>> misc_feature 3^4 >>>>>>>>>> /note="deletion of 3bp" >>>>>>>>>> ORIGIN >>>>>>>>>> 1 aaaaaaa >>>>>>>>>> // >>>>>>>>>> >>>>>>>>>> I have comitted this along with some bugfixes to my master branch on GitHub >>>>>>>>>> https://github.com/fschwach/bioperl-live >>>>>>>>>> so it's now also in my existing pull request. >>>>>>>>>> >>>>>>>>>> I'm still wondering if cloning the sequence objects rather than calling >>>>>>>>>> 'new' on their respective classes would be an option inside 'delete' and >>>>>>>>>> 'insert'? >>>>>>>>>> I'm experimenting with this for my own purposes because I have to work >>>>>>>>>> with custom sub-classes of Bio::Seq which have additional attributes and >>>>>>>>>> therefore set 'can_call_new' to false. >>>>>>>>>> Without cloning the objects, I first have to convert the custom >>>>>>>>>> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid. >>>>>>>>>> Is there any reason why something like Clone::Fast should not be used in >>>>>>>>>> this case? 
It seems to work for me but there may be situations where >>>>>>>>>> this is going to blow up which I am not aware of. >>>>>>>>>> Cloning rather than calling new could be made an option in >>>>>>>>>> Bio::SeqUtils. I have most of the code for that already. >>>>>>>>>> >>>>>>>>>> Frank >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 10/01/12 17:31, Roy Chaudhuri wrote: >>>>>>>>>>> Or without the typo: >>>>>>>>>>> >>>>>>>>>>> CDS join(2..3,4..6) >>>>>>>>>>> /note="3 bp internal deletion" >>>>>>>>>>> CDS 2..3 >>>>>>>>>>> /note="2 bp deleted from 3' end" >>>>>>>>>>> >>>>>>>>>>> On 10/01/2012 17:27, Roy Chaudhuri wrote: >>>>>>>>>>>> I think it's me that didn't explain very well - I was talking about >>>>>>>>>>>> overlapping (rather than spanning) a deletion, although I think the same >>>>>>>>>>>> principle applies to the spanning example you gave. Here's some test >>>>>>>>>>>> code: >>>>>>>>>>>> >>>>>>>>>>>> #!/usr/bin/perl >>>>>>>>>>>> use warnings FATAL=>qw(all); >>>>>>>>>>>> use strict; >>>>>>>>>>>> use Bio::Seq; >>>>>>>>>>>> use Bio::SeqIO; >>>>>>>>>>>> use Bio::SeqUtils; >>>>>>>>>>>> use Bio::SeqFeature::Generic; >>>>>>>>>>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA'); >>>>>>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >>>>>>>>>>>> -start=>2, >>>>>>>>>>>> -end=>9)); >>>>>>>>>>>> >>>>>>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >>>>>>>>>>>> -start=>2, >>>>>>>>>>>> -end=>5)); >>>>>>>>>>>> my $out=Bio::SeqIO->newFh(-format=>'genbank'); >>>>>>>>>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6); >>>>>>>>>>>> print $out $trunc; >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> This currently outputs: >>>>>>>>>>>> LOCUS seq-accession_number 7 bp dna linear UNK >>>>>>>>>>>> ACCESSION unknown >>>>>>>>>>>> FEATURES Location/Qualifiers >>>>>>>>>>>> CDS join(2..>3,<4..6) >>>>>>>>>>>> CDS 2..>3 >>>>>>>>>>>> ORIGIN 
>>>>>>>>>>>> 1 aaaaaaa >>>>>>>>>>>> // >>>>>>>>>>>> >>>>>>>>>>>> However, I was suggesting that the feature table should be something >>>>>>>>>>>> like: >>>>>>>>>>>> CDS join(2..3,4..6) >>>>>>>>>>>> /note="3 bp internal deletion" >>>>>>>>>>>> CDS join(2..3) >>>>>>>>>>>> /note="2 bp deleted from 3' end" >>>>>>>>>>>> >>>>>>>>>>>> Fuzzy locations are intended to represent features which have boundaries >>>>>>>>>>>> spanning outside of the sequence. For a defined deletion that's not the >>>>>>>>>>>> case, the boundaries of the feature aren't unknown, they have been >>>>>>>>>>>> specifically altered. >>>>>>>>>>>> >>>>>>>>>>>> Hope this is clearer. >>>>>>>>>>>> Cheers, >>>>>>>>>>>> Roy. >>>>>>>>>>>> >>>>>>>>>>>> On 10/01/2012 16:47, Frank Schwach wrote: >>>>>>>>>>>>> Hi Roy, >>>>>>>>>>>>> >>>>>>>>>>>>> Sorry, I hadn't explained that very well: it's not the outer boundaries >>>>>>>>>>>>> of the feature that become fuzzy but the "inner" ones of the split >>>>>>>>>>>>> locations: >>>>>>>>>>>>> >>>>>>>>>>>>> -------------------- a feature's location >>>>>>>>>>>>> ==========xxxx================= sequence >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> --------- sublocation 1 >>>>>>>>>>>>> -------- sublocation 2 >>>>>>>>>>>>> =============================== >>>>>>>>>>>>> >>>>>>>>>>>>> x= sequence to delete >>>>>>>>>>>>> The feature's location has changed from Simple to Split. >>>>>>>>>>>>> >>>>>>>>>>>>> Sublocation 1: >>>>>>>>>>>>> start is still EXACT and has not changed >>>>>>>>>>>>> end is now AFTER because this is not a true end of the feature >>>>>>>>>>>>> >>>>>>>>>>>>> Sublocation 2: >>>>>>>>>>>>> start is BEFORE >>>>>>>>>>>>> end is EXACT (but shifted) >>>>>>>>>>>>> >>>>>>>>>>>>> I hope this makes more sense(?) >>>>>>>>>>>>> >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> >>>>>>>>>>>>> Frank >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote: >>>>>>>>>>>>>> Hi Frank, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Looks good to me. 
One thing I'm not sure about - why do features >>>>>>>>>>>>>> overlapping a deletion become fuzzy? That behaviour is in >>>>>>>>>>>>>> trunc_with_features because it's intended to represent a taking a >>>>>>>>>>>>>> subregion of a larger sequence, but if you're representing an internal >>>>>>>>>>>>>> deletion then the boundaries of the overlapping feature aren't >>>>>>>>>>>>>> unknown, >>>>>>>>>>>>>> they have been specifically altered. Maybe you could give absolute >>>>>>>>>>>>>> coordinates, but add a note indicating that the 5' or 3' end has been >>>>>>>>>>>>>> truncated by however many bases. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> Roy. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 10/01/2012 13:10, Frank Schwach wrote: >>>>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I have made the changes in a Git fork and made the pull request now. >>>>>>>>>>>>>>> If this is accepted into BioPerl I can also write a little SeqUtils >>>>>>>>>>>>>>> HOWTO for the BioPerl wiki. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Frank >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote: >>>>>>>>>>>>>>>> Sounds very promising! The easiest way to contribute is via a >>>>>>>>>>>>>>>> fork of the code on Github with a pull request (as you already >>>>>>>>>>>>>>>> know, being a contributor to the Primer3 modules). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> chris >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and >>>>>>>>>>>>>>>>> sequence >>>>>>>>>>>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a >>>>>>>>>>>>>>>>> vector >>>>>>>>>>>>>>>>> and insert a fragment into it while preserving all the >>>>>>>>>>>>>>>>> annotations and >>>>>>>>>>>>>>>>> moving the features accordingly. 
>>>>>>>>>>>>>>>>> My main aim was to split features that span deletion/insertion >>>>>>>>>>>>>>>>> sites in >>>>>>>>>>>>>>>>> a meaningful way, which can not be done with the currently availble >>>>>>>>>>>>>>>>> methods. >>>>>>>>>>>>>>>>> I have modified Bio::SeqUtils so that I have the following new >>>>>>>>>>>>>>>>> methods: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> delete >>>>>>>>>>>>>>>>> ====== >>>>>>>>>>>>>>>>> removes a segment from a sequence object and adjusts positions >>>>>>>>>>>>>>>>> and types >>>>>>>>>>>>>>>>> of locations of sequence features: >>>>>>>>>>>>>>>>> - locations of features that span the deletion sites are turned >>>>>>>>>>>>>>>>> into >>>>>>>>>>>>>>>>> Splits. >>>>>>>>>>>>>>>>> - locations that extend into the deleted region are turned to >>>>>>>>>>>>>>>>> Fuzzy to >>>>>>>>>>>>>>>>> indicate that their true start/end was lost. >>>>>>>>>>>>>>>>> - locations contained inside the deleted regions are lost. >>>>>>>>>>>>>>>>> - other features are shifted according to the length of the >>>>>>>>>>>>>>>>> deletion. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> insert >>>>>>>>>>>>>>>>> ====== >>>>>>>>>>>>>>>>> adds a Bio::Seq object into another one between specified insertion >>>>>>>>>>>>>>>>> sites. This also affects the features on the recipient sequence: >>>>>>>>>>>>>>>>> - locations of features that span the insertion site are split but >>>>>>>>>>>>>>>>> position types are not turned to Fuzzy because no part of the >>>>>>>>>>>>>>>>> original >>>>>>>>>>>>>>>>> feature is lost. >>>>>>>>>>>>>>>>> - other features are shifted according to the length of the >>>>>>>>>>>>>>>>> insertion. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ligate >>>>>>>>>>>>>>>>> ====== >>>>>>>>>>>>>>>>> just for convenience. Supply a recipient, a fragment and one or two >>>>>>>>>>>>>>>>> sites to cut the recipient. Can also flip the fragment if required. >>>>>>>>>>>>>>>>> Simply calls delete [, reverse_complement_with_features] and >>>>>>>>>>>>>>>>> insert in >>>>>>>>>>>>>>>>> turn. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> One situation I haven't handled yet is a deletion that spans the >>>>>>>>>>>>>>>>> origin >>>>>>>>>>>>>>>>> of a circular molecule but that should be a rare thing to do >>>>>>>>>>>>>>>>> anyway. The >>>>>>>>>>>>>>>>> code currently throws an error if this is attempted. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'm happy to contribute the code on Github if there is interest? >>>>>>>>>>>>>>>>> Comments on the handling of feature locations highly welcome! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Frank >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>> >>> >>> >>> >>> -- >>> The Wellcome Trust Sanger Institute is operated by Genome Research >>> Limited, a charity registered in England with number 1021457 and a >>> company registered in England with number 2742969, whose registered >>> office is 215 Euston Road, London, NW1 2BE. >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. From bosborne11 at verizon.net Wed Jan 18 14:20:36 2012 From: bosborne11 at verizon.net (Brian Osborne) Date: Wed, 18 Jan 2012 14:20:36 -0500 Subject: [Bioperl-l] parse blast-xml output In-Reply-To: References: Message-ID: <44D10220-4A09-470E-BD83-EFBA3F1BA50C@verizon.net> http://www.bioperl.org/wiki/HOWTO:SearchIO#Sorting On Jan 18, 2012, at 5:57 AM, Jordi Durban wrote: > Hi all! > I'm trying to parse a xml blast output (-m 7 option) in order to get the > best hit (I mean the first one) from each result I got. 
> I've done: > my @files=<*>; > my @files2 = grep (/^454AllContigs.fna.masked*/, @files); > foreach my $blast_report(@files2){ > # Get the report > my $searchio = new Bio::SearchIO (-format => 'blastxml', > -file=>$blast_report, -best =>'true'); > while( my $result = $searchio->next_result ) { > > my $query=$result->query_name(); > #~ print @query,"\n"; ##### results query names > while (my $hits = $result->next_hit) { > #~ print $hits,"\n"; ###### the whole of hits > my $name= $hits->name(); > my $desc = $hits->description(); > print $query."\t".$name."\t".$desc,"\n"; > > But it does not work, as I get all the results for a single query. > What I mean: > contig01181 gi|63794|emb|X03832.1| Chicken mRNA 3' end for fast > skeletal troponin I (sTnI) > contig01181 gi|110293358|gb|DQ646396.1| Lama pacos troponin 1 type 2 > (Tnni2) mRNA, partial cds > contig01181 gi|298897248|emb|FQ224489.1| Rattus norvegicus > TL0ACA64YG07 mRNA sequence > contig01181 gi|298892466|emb|FQ217985.1| Rattus norvegicus > TL0ACA12YG21 mRNA sequence > contig01181 gi|298889559|emb|FQ217454.1| Rattus norvegicus > TL0ACA25YO07 mRNA sequence > contig01181 gi|298888987|emb|FQ223772.1| Rattus norvegicus > TL0ACA87YD21 mRNA sequence > > I know some perl and I think it is a really newbie question but any help > would be appreciated. > Thanks a lot.
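One way to keep only the top hit for each query is to call `next_hit` once per result instead of looping over all hits, assuming the report lists hits in rank order, as BLAST normally does. The sketch wraps that in a hypothetical `first_hits` sub that accepts any object with the Bio::SearchIO-style `next_result`, `next_hit`, `query_name`, `name` and `description` methods used in the question, so it can be exercised without a BLAST report.

```perl
use strict;
use warnings;

# Hypothetical helper: return one [query, hit name, hit description] row per
# result, keeping only the first (top-ranked) hit. $searchio is expected to
# behave like a Bio::SearchIO stream.
sub first_hits {
    my ($searchio) = @_;
    my @rows;
    while ( my $result = $searchio->next_result ) {
        # 'if' rather than 'while': take one hit, then move to the next query.
        if ( my $hit = $result->next_hit ) {
            push @rows, [ $result->query_name, $hit->name, $hit->description ];
        }
    }
    return @rows;
}

# Typical use with a BLAST XML report (requires BioPerl):
#   use Bio::SearchIO;
#   my $in = Bio::SearchIO->new( -format => 'blastxml', -file => $blast_report );
#   print join( "\t", @$_ ), "\n" for first_hits($in);
```

Queries with no hits simply produce no row.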
> -- > Jordi > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From p.j.a.cock at googlemail.com Thu Jan 19 06:04:06 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 19 Jan 2012 11:04:06 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> <1326903677.3563.15.camel@deskpro15336.internal.sanger.ac.uk> <4F16F82C.2090503@gmail.com> <1326906496.3563.31.camel@deskpro15336.internal.sanger.ac.uk> <159E3D6F-F08C-4D93-B31F-742AF93EF20E@illinois.edu> <1326908773.3563.36.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: On Wed, Jan 18, 2012 at 6:11 PM, Fields, Christopher J wrote: > You could probably change "<1" to "<" if the start isn't meant to be defined; > the former seems to imply the feature location is '1 or before the start', the > latter is simply 'before the start'. Either version should work within bioperl > AFAIK, though I'm not sure whether the feature table definition covers '<' as a location type. > > Just thinking aloud, but such behavior might be something that needs to > be defined more specifically in the Bio::RangeI implementation, via trunc().
> > chris Using '<1' or in general ' References: <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> <1326903677.3563.15.camel@deskpro15336.internal.sanger.ac.uk> <4F16F82C.2090503@gmail.com> <1326906496.3563.31.camel@deskpro15336.internal.sanger.ac.uk> <159E3D6F-F08C-4D93-B31F-742AF93EF20E@illinois.edu> <1326908773.3563.36.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <4F17FB50.4000606@sanger.ac.uk> I'd prefer "<1" still, but I don't insist on this feature if it is controversial. Either way, I think we should either allow handling negative positions or explicitly throw an error if a negative position is passed in. At the moment, the method will take a negative position but then do something unexpected with it, as it treats it like a positive value. The way I handled the negative position assumes that "0" is to the left of "1", as you get when calculating relative positions to a 1-based location. I can add something to the POD to explain this. I was thinking of using this behaviour to add another method to SeqUtils to add SeqFeatures with relative coordinates that may start/end outside of the Bio::Seq you want to annotate. Roy, Chris: what do you prefer? Frank Fields, Christopher J wrote: > You could probably change "<1" to "<" if the start isn't meant to be defined; the former seems to imply the feature location is '1 or before the start', the latter is simply 'before the start'.
Either version should work within bioperl AFAIK, though I'm not sure whether the feature table definition covers ' > Just thinking aloud, but such behavior might be something that needs to be defined more specifically in the Bio::RangeI implementation, via trunc(). > > chris > > On Jan 18, 2012, at 11:46 AM, Frank Schwach wrote: > > >> a yes, that's true. Still shoud be ok with _coord_adjust because it >> trims everything <1 to a "<1" fuzzy location, so whether or not 0 is a >> location doesn't matter in this case. But if you want me to revert this >> change then I am happy to do that. >> >> Frank >> >> >> On Wed, 2012-01-18 at 17:24 +0000, Fields, Christopher J wrote: >> >>>> From Bio::RangeI: "The behaviour of a range is undefined if ranges with negative numbers or zero are used." This is left ambiguous b/c the implementation may define this more specifically, but SeqFeatures AFAIK do not clarify this any more that Bio::RangeI does, so I don't think you can rely on any particular consistent behavior. >>>> >>> Re: why this is so, the ambiguities pertain to how length, contains, overlaps, etc are calculated (these all assume positive 1-based coords). For example, since we use 1-based coords, do we include a 0 position with negative coordinates, or do negative coordinates start at -1? The current interface and implementations all adhere to how locations are defined in the DDBJ/EMBL/GenBank feature table definition, hence Roy's question. >>> >>> chris >>> >>> On Jan 18, 2012, at 11:08 AM, Frank Schwach wrote: >>> >>> >>>> Thanks Roy! >>>> I have pushed that to master now and it's part of my pending pull >>>> request. >>>> I just wanted to ask you because this is code you had written and there >>>> might be a good reason not to do this. >>>> Yes, you can create a SeqFeature object with negative coordinates. 
>>>> >>>> Cheers, >>>> >>>> Frank >>>> >>>> >>>> On Wed, 2012-01-18 at 16:49 +0000, Roy Chaudhuri wrote: >>>> >>>>> Ok, if the tests pass and it's useful then it's fine by me (not that you >>>>> need my permission, of course). Just out of interest, do negative >>>>> feature coordinates work with the rest of BioPerl? I don't think they >>>>> are covered in the DDBJ/EMBL/GenBank feature table definition. >>>>> >>>>> Roy. >>>>> >>>>> On 18/01/2012 16:21, Frank Schwach wrote: >>>>> >>>>>> Hi Roy, >>>>>> >>>>>> I have a use-case for Bio::SeqUtils truncating a feature with negative >>>>>> start location. In my case, this is happening because I transform >>>>>> genomic coordinates of a feature to the coordinate frame of another >>>>>> feature, so feature A that starts 10nt before the reference feature now >>>>>> has a start of -10. I want to trim that to a fuzzy start = "<1" >>>>>> >>>>>> Bio::SeqUtils::_coord_adjust can be used to trim feature A accordingly >>>>>> but we need to change the regex that manipulates the coordinates >>>>>> slightly by adding an optional "-": >>>>>> >>>>>> map s/(\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add >>>>>> +$1>$length) {">$length"} else {$add+$1}/ge, @coords; >>>>>> >>>>>> becomes >>>>>> >>>>>> map s/(-?\d+)/if ($add+$1<1) {'<1'} elsif (defined $length and $add >>>>>> +$1>$length) {">$length"} else {$add+$1}/ge, @coords; >>>>>> >>>>>> It doesn't change anything else as far as I can tell and adds some more >>>>>> flexibility to the method. If you are ok with it I can push this to my >>>>>> queued pull request for Bio::SeqUtils. >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Thu, 2012-01-12 at 14:13 +0000, Frank Schwach wrote: >>>>>> >>>>>>> I have now created a version that gives the option to create the >>>>>>> products of 'delete' and 'insert' via Bio::Root::Root:clone instead of >>>>>>> calling 'new' on the input seq object class. Seems to be working fine >>>>>>> for me so far. 
>>>>>>> >>>>>>> 'delete' and 'insert' can now take a hashref of options. >>>>>>> The only option so far is to set 'clone_obj to true, to use cloning >>>>>>> instead of creating objects via 'new'. >>>>>>> Setting this parameter to false or not supplying the options hashref at >>>>>>> all will give you the old behaviour (call 'new'). >>>>>>> Example: >>>>>>> >>>>>>> my $product = Bio::SeqUtils->delete( >>>>>>> $seq_obj, >>>>>>> 11, >>>>>>> 20, >>>>>>> { clone_obj => 1} >>>>>>> ); >>>>>>> >>>>>>> The ligate method takes clone_obj as a named parameter: >>>>>>> >>>>>>> my $new_molecule = Bio::Sequtils::Pbrtools->ligate( >>>>>>> -recipient => $vector, >>>>>>> -fragment => $fragment, >>>>>>> -left => 1000, >>>>>>> -right => 1100, >>>>>>> -flip => 1, >>>>>>> -clone_obj => 1 >>>>>>> ); >>>>>>> >>>>>>> This is in a branch of my GitHub repo if you would like to have a look: >>>>>>> >>>>>>> https://github.com/fschwach/bioperl-live/tree/sequtils_clone >>>>>>> >>>>>>> Unfortunately, I can't add this option to trunc_with_features because >>>>>>> the creation of the new object is delegated to 'trunc'. I guess I could >>>>>>> implement 'trunc' in Bio::SeqUtils itself(?) >>>>>>> >>>>>>> What do you think, could this be merged into bioperl-live? >>>>>>> >>>>>>> >>>>>>> Frank >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Wed, 2012-01-11 at 21:03 +0000, Frank Schwach wrote: >>>>>>> >>>>>>>> Great, I'll work on a branch that gives the user the option to use clone >>>>>>>> instead of new and then we can see if we want to use that in the end. In >>>>>>>> the meantime, what do you think about pulling this into bioperl-live? 
>>>>>>>> When I have some time again I can work on the HOWTO for these new >>>>>>>> features for the BioPerl wiki >>>>>>>> >>>>>>>> Frank >>>>>>>> >>>>>>>> >>>>>>>> On 11/01/12 18:42, Fields, Christopher J wrote: >>>>>>>> >>>>>>>>> Note that Bio::Root::Root now has a clone() method that one can take advantage of for this purpose; if Storable or Clone is available, it will pick one of the two, preferably Clone over Storable. It's fairly untested, but we haven't run into problems with it yet (I think it was in the last CPAN release). >>>>>>>>> >>>>>>>>> chris >>>>>>>>> >>>>>>>>> On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> Hi Frank, >>>>>>>>>> >>>>>>>>>> Looks great, I like the use of between locations, didn't think of that. >>>>>>>>>> >>>>>>>>>> It was suggested that I avoid using Clone for cat, trunc_with_features etc. to avoid adding a dependency (which may no longer be an issue) and because it would cause problems for Bio::Seq implementations that use a database as the back-end. Maybe you could add it as an option, but keep the default as is? >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> Roy. >>>>>>>>>> >>>>>>>>>> On 11/01/2012 18:16, Frank Schwach wrote: >>>>>>>>>> >>>>>>>>>>> Hi Roy and Chris, >>>>>>>>>>> >>>>>>>>>>> I have made the changes to the code now. As you suggested, feature ends >>>>>>>>>>> no longer change type and I insert a note instead to inform about the >>>>>>>>>>> deletion (or insertion), showing the length and position. >>>>>>>>>>> I have also added a feature to annotate deletion sites themselves (with >>>>>>>>>>> IN-BETWEEN locations). 
>>>>>>>>>>> >>>>>>>>>>> Roy's test script now prints: >>>>>>>>>>> >>>>>>>>>>> LOCUS seq-accession_number 7 bp dna linear UNK >>>>>>>>>>> ACCESSION unknown >>>>>>>>>>> FEATURES Location/Qualifiers >>>>>>>>>>> CDS join(2..3,4..6) >>>>>>>>>>> /note="3bp internal deletion between pos 3 and 4" >>>>>>>>>>> CDS 2..3 >>>>>>>>>>> /note="2bp deleted from feature end" >>>>>>>>>>> misc_feature 3^4 >>>>>>>>>>> /note="deletion of 3bp" >>>>>>>>>>> ORIGIN >>>>>>>>>>> 1 aaaaaaa >>>>>>>>>>> // >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> or, if you add strand information (-1 in this case) to the second feature: >>>>>>>>>>> >>>>>>>>>>> LOCUS seq-accession_number 7 bp dna linear UNK >>>>>>>>>>> ACCESSION unknown >>>>>>>>>>> FEATURES Location/Qualifiers >>>>>>>>>>> CDS join(2..3,4..6) >>>>>>>>>>> /note="3bp internal deletion between pos 3 and 4" >>>>>>>>>>> CDS complement(2..3) >>>>>>>>>>> /note="2bp deleted from feature 5' end" >>>>>>>>>>> misc_feature 3^4 >>>>>>>>>>> /note="deletion of 3bp" >>>>>>>>>>> ORIGIN >>>>>>>>>>> 1 aaaaaaa >>>>>>>>>>> // >>>>>>>>>>> >>>>>>>>>>> I have committed this along with some bugfixes to my master branch on GitHub >>>>>>>>>>> https://github.com/fschwach/bioperl-live >>>>>>>>>>> so it's now also in my existing pull request. >>>>>>>>>>> >>>>>>>>>>> I'm still wondering if cloning the sequence objects rather than calling >>>>>>>>>>> 'new' on their respective classes would be an option inside 'delete' and >>>>>>>>>>> 'insert'? >>>>>>>>>>> I'm experimenting with this for my own purposes because I have to work >>>>>>>>>>> with custom sub-classes of Bio::Seq which have additional attributes and >>>>>>>>>>> therefore set 'can_call_new' to false. >>>>>>>>>>> Without cloning the objects, I first have to convert the custom >>>>>>>>>>> Bio::Seq::Foo objects to standard Bio::Seq, which I would like to avoid. >>>>>>>>>>> Is there any reason why something like Clone::Fast should not be used in >>>>>>>>>>> this case?
It seems to work for me but there may be situations where >>>>>>>>>>> this is going to blow up which I am not aware of. >>>>>>>>>>> Cloning rather than calling new could be made an option in >>>>>>>>>>> Bio::SeqUtils. I have most of the code for that already. >>>>>>>>>>> >>>>>>>>>>> Frank >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 10/01/12 17:31, Roy Chaudhuri wrote: >>>>>>>>>>> >>>>>>>>>>>> Or without the typo: >>>>>>>>>>>> >>>>>>>>>>>> CDS join(2..3,4..6) >>>>>>>>>>>> /note="3 bp internal deletion" >>>>>>>>>>>> CDS 2..3 >>>>>>>>>>>> /note="2 bp deleted from 3' end" >>>>>>>>>>>> >>>>>>>>>>>> On 10/01/2012 17:27, Roy Chaudhuri wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I think it's me that didn't explain very well - I was talking about >>>>>>>>>>>>> overlapping (rather than spanning) a deletion, although I think the same >>>>>>>>>>>>> principle applies to the spanning example you gave. Here's some test >>>>>>>>>>>>> code: >>>>>>>>>>>>> >>>>>>>>>>>>> #!/usr/bin/perl >>>>>>>>>>>>> use warnings FATAL=>qw(all); >>>>>>>>>>>>> use strict; >>>>>>>>>>>>> use Bio::Seq; >>>>>>>>>>>>> use Bio::SeqIO; >>>>>>>>>>>>> use Bio::SeqUtils; >>>>>>>>>>>>> use Bio::SeqFeature::Generic; >>>>>>>>>>>>> my $seq=Bio::Seq->new(-id=>'seq', -seq=>'AAAAAAAAAA'); >>>>>>>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >>>>>>>>>>>>> -start=>2, >>>>>>>>>>>>> -end=>9)); >>>>>>>>>>>>> >>>>>>>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >>>>>>>>>>>>> -start=>2, >>>>>>>>>>>>> -end=>5)); >>>>>>>>>>>>> my $out=Bio::SeqIO->newFh(-format=>'genbank'); >>>>>>>>>>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6); >>>>>>>>>>>>> print $out $trunc; >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> This currently outputs: >>>>>>>>>>>>> LOCUS seq-accession_number 7 bp dna linear UNK >>>>>>>>>>>>> ACCESSION unknown >>>>>>>>>>>>> FEATURES Location/Qualifiers 
>>>>>>>>>>>>> CDS join(2..>3,<4..6) >>>>>>>>>>>>> CDS 2..>3 >>>>>>>>>>>>> ORIGIN >>>>>>>>>>>>> 1 aaaaaaa >>>>>>>>>>>>> // >>>>>>>>>>>>> >>>>>>>>>>>>> However, I was suggesting that the feature table should be something >>>>>>>>>>>>> like: >>>>>>>>>>>>> CDS join(2..3,4..6) >>>>>>>>>>>>> /note="3 bp internal deletion" >>>>>>>>>>>>> CDS join(2..3) >>>>>>>>>>>>> /note="2 bp deleted from 3' end" >>>>>>>>>>>>> >>>>>>>>>>>>> Fuzzy locations are intended to represent features which have boundaries >>>>>>>>>>>>> spanning outside of the sequence. For a defined deletion that's not the >>>>>>>>>>>>> case, the boundaries of the feature aren't unknown, they have been >>>>>>>>>>>>> specifically altered. >>>>>>>>>>>>> >>>>>>>>>>>>> Hope this is clearer. >>>>>>>>>>>>> Cheers, >>>>>>>>>>>>> Roy. >>>>>>>>>>>>> >>>>>>>>>>>>> On 10/01/2012 16:47, Frank Schwach wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi Roy, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sorry, I hadn't explained that very well: it's not the outer boundaries >>>>>>>>>>>>>> of the feature that become fuzzy but the "inner" ones of the split >>>>>>>>>>>>>> locations: >>>>>>>>>>>>>> >>>>>>>>>>>>>> -------------------- a feature's location >>>>>>>>>>>>>> ==========xxxx================= sequence >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> --------- sublocation 1 >>>>>>>>>>>>>> -------- sublocation 2 >>>>>>>>>>>>>> =============================== >>>>>>>>>>>>>> >>>>>>>>>>>>>> x= sequence to delete >>>>>>>>>>>>>> The feature's location has changed from Simple to Split. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sublocation 1: >>>>>>>>>>>>>> start is still EXACT and has not changed >>>>>>>>>>>>>> end is now AFTER because this is not a true end of the feature >>>>>>>>>>>>>> >>>>>>>>>>>>>> Sublocation 2: >>>>>>>>>>>>>> start is BEFORE >>>>>>>>>>>>>> end is EXACT (but shifted) >>>>>>>>>>>>>> >>>>>>>>>>>>>> I hope this makes more sense(?) 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Frank >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy Chaudhuri wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Frank, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Looks good to me. One thing I'm not sure about - why do features >>>>>>>>>>>>>>> overlapping a deletion become fuzzy? That behaviour is in >>>>>>>>>>>>>>> trunc_with_features because it's intended to represent taking a >>>>>>>>>>>>>>> subregion of a larger sequence, but if you're representing an internal >>>>>>>>>>>>>>> deletion then the boundaries of the overlapping feature aren't >>>>>>>>>>>>>>> unknown, >>>>>>>>>>>>>>> they have been specifically altered. Maybe you could give absolute >>>>>>>>>>>>>>> coordinates, but add a note indicating that the 5' or 3' end has been >>>>>>>>>>>>>>> truncated by however many bases. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> Roy. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 10/01/2012 13:10, Frank Schwach wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I have made the changes in a Git fork and made the pull request now. >>>>>>>>>>>>>>>> If this is accepted into BioPerl I can also write a little SeqUtils >>>>>>>>>>>>>>>> HOWTO for the BioPerl wiki. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Frank >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, Fields, Christopher J wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Sounds very promising! The easiest way to contribute is via a >>>>>>>>>>>>>>>>> fork of the code on Github with a pull request (as you already >>>>>>>>>>>>>>>>> know, being a contributor to the Primer3 modules).
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> chris >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank Schwach wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I needed to manipulate Bio::Seq objects with annotations and >>>>>>>>>>>>>>>>>> sequence >>>>>>>>>>>>>>>>>> features to simulate molecular cloning techniques, e.g. to cut a >>>>>>>>>>>>>>>>>> vector >>>>>>>>>>>>>>>>>> and insert a fragment into it while preserving all the >>>>>>>>>>>>>>>>>> annotations and >>>>>>>>>>>>>>>>>> moving the features accordingly. >>>>>>>>>>>>>>>>>> My main aim was to split features that span deletion/insertion >>>>>>>>>>>>>>>>>> sites in >>>>>>>>>>>>>>>>>> a meaningful way, which cannot be done with the currently available >>>>>>>>>>>>>>>>>> methods. >>>>>>>>>>>>>>>>>> I have modified Bio::SeqUtils so that I have the following new >>>>>>>>>>>>>>>>>> methods: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> delete >>>>>>>>>>>>>>>>>> ====== >>>>>>>>>>>>>>>>>> removes a segment from a sequence object and adjusts positions >>>>>>>>>>>>>>>>>> and types >>>>>>>>>>>>>>>>>> of locations of sequence features: >>>>>>>>>>>>>>>>>> - locations of features that span the deletion sites are turned >>>>>>>>>>>>>>>>>> into >>>>>>>>>>>>>>>>>> Splits. >>>>>>>>>>>>>>>>>> - locations that extend into the deleted region are turned to >>>>>>>>>>>>>>>>>> Fuzzy to >>>>>>>>>>>>>>>>>> indicate that their true start/end was lost. >>>>>>>>>>>>>>>>>> - locations contained inside the deleted regions are lost. >>>>>>>>>>>>>>>>>> - other features are shifted according to the length of the >>>>>>>>>>>>>>>>>> deletion. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> insert >>>>>>>>>>>>>>>>>> ====== >>>>>>>>>>>>>>>>>> adds a Bio::Seq object into another one between specified insertion >>>>>>>>>>>>>>>>>> sites.
This also affects the features on the recipient sequence: >>>>>>>>>>>>>>>>>> - locations of features that span the insertion site are split but >>>>>>>>>>>>>>>>>> position types are not turned to Fuzzy because no part of the >>>>>>>>>>>>>>>>>> original >>>>>>>>>>>>>>>>>> feature is lost. >>>>>>>>>>>>>>>>>> - other features are shifted according to the length of the >>>>>>>>>>>>>>>>>> insertion. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> ligate >>>>>>>>>>>>>>>>>> ====== >>>>>>>>>>>>>>>>>> just for convenience. Supply a recipient, a fragment and one or two >>>>>>>>>>>>>>>>>> sites to cut the recipient. Can also flip the fragment if required. >>>>>>>>>>>>>>>>>> Simply calls delete [, reverse_complement_with_features] and >>>>>>>>>>>>>>>>>> insert in >>>>>>>>>>>>>>>>>> turn. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> One situation I haven't handled yet is a deletion that spans the >>>>>>>>>>>>>>>>>> origin >>>>>>>>>>>>>>>>>> of a circular molecule but that should be a rare thing to do >>>>>>>>>>>>>>>>>> anyway. The >>>>>>>>>>>>>>>>>> code currently throws an error if this is attempted. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I'm happy to contribute the code on Github if there is interest? >>>>>>>>>>>>>>>>>> Comments on the handling of feature locations highly welcome! >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Frank >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>> >>>> -- >>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>> Limited, a charity registered in England with number 1021457 and a >>>> company registered in England with number 2742969, whose registered >>>> office is 215 Euston Road, London, NW1 2BE. 
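The location rules for 'delete' and 'insert' can be sketched on plain start/end pairs. This is a simplified illustration, not the Bio::SeqUtils implementation: the helper names are hypothetical, only simple (non-split) locations on one strand are handled, and it follows the later behaviour agreed in the thread (exact coordinates, with notes added separately) rather than turning overlapping ends into Fuzzy positions. The calls at the end reproduce the locations from Roy's test case (a 10 bp sequence with bases 4..6 deleted).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers (not the Bio::SeqUtils code). Locations are simple
# [start, end] pairs, 1-based and inclusive, on a single strand.

# Deleting bases $del_start..$del_end: returns zero, one or two segments.
sub adjust_for_deletion {
    my ($start, $end, $del_start, $del_end) = @_;
    my $len = $del_end - $del_start + 1;
    return [$start, $end]               if $end < $del_start;    # upstream: unchanged
    return [$start - $len, $end - $len] if $start > $del_end;    # downstream: shifted
    return () if $start >= $del_start && $end <= $del_end;       # inside: lost
    return ([$start, $del_start - 1], [$del_start, $end - $len]) # spans: split
        if $start < $del_start && $end > $del_end;
    return [$start, $del_start - 1] if $start < $del_start;      # 3' end deleted
    return [$del_start, $end - $len];                            # 5' end deleted
}

# Inserting $ins_len bases between $pos and $pos + 1.
sub adjust_for_insertion {
    my ($start, $end, $pos, $ins_len) = @_;
    return [$start, $end] if $end <= $pos;                           # upstream
    return [$start + $ins_len, $end + $ins_len] if $start > $pos;    # downstream
    return ([$start, $pos], [$pos + 1 + $ins_len, $end + $ins_len]); # spans: split
}

# In-between location marking a deletion site (assumes $del_start > 1).
sub deletion_site {
    my ($del_start) = @_;
    return join('^', $del_start - 1, $del_start);
}

# Render segments roughly as a GenBank location; undef if nothing is left.
sub format_location {
    my @segs = map { join('..', @$_) } @_;
    return undef unless @segs;
    return @segs == 1 ? $segs[0] : 'join(' . join(',', @segs) . ')';
}

# Roy's test case: CDS 2..9 and CDS 2..5 on a 10 bp sequence, delete 4..6
printf "%s\n", format_location(adjust_for_deletion(2, 9, 4, 6));   # join(2..3,4..6)
printf "%s\n", format_location(adjust_for_deletion(2, 5, 4, 6));   # 2..3
printf "%s\n", deletion_site(4);                                   # 3^4
# Inserting 3 bases after position 4 splits a feature spanning that site
printf "%s\n", format_location(adjust_for_insertion(2, 9, 4, 3));  # join(2..4,8..12)
```

Notes such as "3bp internal deletion" and strand handling (complement) would sit on top of this; they are left out to keep the sketch short.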
>>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> From roy.chaudhuri at gmail.com Thu Jan 19 10:15:07 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Thu, 19 Jan 2012 15:15:07 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <4F17FB50.4000606@sanger.ac.uk> References: <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> <1326903677.3563.15.camel@deskpro15336.internal.sanger.ac.uk> <4F16F82C.2090503@gmail.com> <1326906496.3563.31.camel@deskpro15336.internal.sanger.ac.uk> <159E3D6F-F08C-4D93-B31F-742AF93EF20E@illinois.edu> <1326908773.3563.36.camel@deskpro15336.internal.sanger.ac.uk> <4F17FB50.4000606@sanger.ac.uk> Message-ID: <4F18337B.2010401@gmail.com> I'm not sure I understand the problem with "<1", < means "less than" not "less than or equal to", so it does not imply that the feature could start at position 1.
I can see that there would be cases where negative coordinates might be useful, but I think it is opening a can of worms and could introduce many subtle bugs, so I'd vote for throwing an error. If you were to do it, it would be better to stick with the biological convention of -1 being the base before 1 (as used for -10 and -35 elements). Roy. On 19/01/2012 11:15, Frank Schwach wrote: > I'd prefer "<1" still but I don't insist on this feature if it is > controversial. Either way, I think we should either allow handling > negative positions or explicitly throw an error if a negative > position is passed in. At the moment, the method will take a negative > position but then do something unexpected with it as it treats it > like a positive value. The way I handled the negative position > assumes that "0" is to the left of "1", as you get when calculating > relative positions to a 1-based location. I can add something to the > POD to explain this. I was thinking of using this behaviour to add > another method to SeqUtils to add seqFeatures with relative > coordinates that may start/end outside of the Bio::Seq you want to > annotate. Roy, Chris: what do you prefer? > > Frank > > > > > Fields, Christopher J wrote: >> You could probably change "<1" to "<" if the start isn't meant to >> be defined; the former seems to imply the feature location is '1 or >> before the start', the latter is simply 'before the start'. Either >> version should work within bioperl AFAIK, though I'm not sure >> whether the feature table definition covers '> type. >> >> Just thinking aloud, but such behavior might be something that >> needs to be defined more specifically in the Bio::RangeI >> implementation, via trunc(). >> >> chris >> >> On Jan 18, 2012, at 11:46 AM, Frank Schwach wrote: >> >> >>> ah yes, that's true. 
Still should be ok with _coord_adjust because >>> it trims everything <1 to a "<1" fuzzy location, so whether or not >>> 0 is a location doesn't matter in this case.
But if you want me >>> to revert this change then I am happy to do that. >>> >>> Frank >>> >>> >>> On Wed, 2012-01-18 at 17:24 +0000, Fields, Christopher J wrote: >>> >>>>> From Bio::RangeI: "The behaviour of a range is undefined if >>>>> ranges with negative numbers or zero are used." This is left >>>>> ambiguous b/c the implementation may define this more >>>>> specifically, but SeqFeatures AFAIK do not clarify this any >>>>> more than Bio::RangeI does, so I don't think you can rely on >>>>> any particular consistent behavior. >>>>> >>>> Re: why this is so, the ambiguities pertain to how length, >>>> contains, overlaps, etc. are calculated (these all assume >>>> positive 1-based coords). For example, since we use 1-based >>>> coords, do we include a 0 position with negative coordinates, >>>> or do negative coordinates start at -1? The current interface >>>> and implementations all adhere to how locations are defined in >>>> the DDBJ/EMBL/GenBank feature table definition, hence Roy's >>>> question. >>>> >>>> chris >>>> >>>> On Jan 18, 2012, at 11:08 AM, Frank Schwach wrote: >>>> >>>> >>>>> Thanks Roy! I have pushed that to master now and it's part of >>>>> my pending pull request. I just wanted to ask you because >>>>> this is code you had written and there might be a good reason >>>>> not to do this. Yes, you can create a SeqFeature object with >>>>> negative coordinates. >>>>> >>>>> Cheers, >>>>> >>>>> Frank >>>>> >>>>> >>>>> On Wed, 2012-01-18 at 16:49 +0000, Roy Chaudhuri wrote: >>>>> >>>>>> Ok, if the tests pass and it's useful then it's fine by me >>>>>> (not that you need my permission, of course). Just out of >>>>>> interest, do negative feature coordinates work with the >>>>>> rest of BioPerl? I don't think they are covered in the >>>>>> DDBJ/EMBL/GenBank feature table definition. >>>>>> >>>>>> Roy.
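The off-by-one question raised here (is there a position 0, or does counting jump from -1 to 1?) can be made concrete. A minimal sketch, assuming a hypothetical helper `relative_position` that maps an absolute base into a frame where the reference feature starts at 1; the flag selects between plain arithmetic (0 is the base immediately before 1, as in the patch) and the convention without a zero (-1 is the base before 1, as used for -10 and -35 promoter elements).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: position of absolute base $abs in a frame where the
# reference feature's first base ($ref_start) is position 1.
#  - $skip_zero false: plain arithmetic, so the base just before the
#    reference is 0 (the behaviour described for the _coord_adjust patch).
#  - $skip_zero true: there is no position 0, so that base is -1 (the
#    convention used for -10 and -35 promoter elements).
sub relative_position {
    my ($abs, $ref_start, $skip_zero) = @_;
    my $rel = $abs - $ref_start + 1;
    $rel -= 1 if $skip_zero && $rel <= 0;    # jump straight from 1 to -1
    return $rel;
}

# Reference feature starting at absolute position 100
for my $abs (90, 99, 100, 101) {
    printf "%3d -> with zero: %3d   without zero: %3d\n",
        $abs, relative_position($abs, 100, 0), relative_position($abs, 100, 1);
}
```

The two conventions agree from position 1 onwards and differ by one for every upstream base. As noted in the thread, that does not affect trimming to "<1", but it would matter for any method that keeps negative coordinates.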
>>>>>> >>>>>> On 18/01/2012 16:21, Frank Schwach wrote: >>>>>> >>>>>>> Hi Roy, >>>>>>> >>>>>>> I have a use-case for Bio::SeqUtils truncating a feature >>>>>>> with negative start location. In my case, this is >>>>>>> happening because I transform genomic coordinates of a >>>>>>> feature to the coordinate frame of another feature, so >>>>>>> feature A that starts 10nt before the reference feature >>>>>>> now has a start of -10. I want to trim that to a fuzzy >>>>>>> start = "<1" >>>>>>> >>>>>>> Bio::SeqUtils::_coord_adjust can be used to trim feature >>>>>>> A accordingly but we need to change the regex that >>>>>>> manipulates the coordinates slightly by adding an >>>>>>> optional "-": >>>>>>> >>>>>>> map s/(\d+)/if ($add+$1<1) {'<1'} elsif (defined $length >>>>>>> and $add +$1>$length) {">$length"} else {$add+$1}/ge, >>>>>>> @coords; >>>>>>> >>>>>>> becomes >>>>>>> >>>>>>> map s/(-?\d+)/if ($add+$1<1) {'<1'} elsif (defined >>>>>>> $length and $add +$1>$length) {">$length"} else >>>>>>> {$add+$1}/ge, @coords; >>>>>>> >>>>>>> It doesn't change anything else as far as I can tell and >>>>>>> adds some more flexibility to the method. If you are ok >>>>>>> with it I can push this to my queued pull request for >>>>>>> Bio::SeqUtils. >>>>>>> >>>>>>> Frank >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On Thu, 2012-01-12 at 14:13 +0000, Frank Schwach wrote: >>>>>>> >>>>>>>> I have now created a version that gives the option to >>>>>>>> create the products of 'delete' and 'insert' via >>>>>>>> Bio::Root::Root::clone instead of calling 'new' on the >>>>>>>> input seq object class. Seems to be working fine for me >>>>>>>> so far. >>>>>>>> >>>>>>>> 'delete' and 'insert' can now take a hashref of >>>>>>>> options. The only option so far is to set 'clone_obj' to >>>>>>>> true, to use cloning instead of creating objects via >>>>>>>> 'new'. 
Setting this parameter to false or not supplying >>>>>>>> the options hashref at all will give you the old >>>>>>>> behaviour (call 'new').
Example: >>>>>>>> >>>>>>>> my $product = Bio::SeqUtils->delete( $seq_obj, 11, 20, >>>>>>>> { clone_obj => 1} ); >>>>>>>> >>>>>>>> The ligate method takes clone_obj as a named >>>>>>>> parameter: >>>>>>>> >>>>>>>> my $new_molecule = Bio::Sequtils::Pbrtools->ligate( >>>>>>>> -recipient => $vector, -fragment => $fragment, >>>>>>>> -left => 1000, -right => 1100, -flip => 1, >>>>>>>> -clone_obj => 1 ); >>>>>>>> >>>>>>>> This is in a branch of my GitHub repo if you would like >>>>>>>> to have a look: >>>>>>>> >>>>>>>> https://github.com/fschwach/bioperl-live/tree/sequtils_clone >>>>>>>> >>>>>>>> >>>>>>>> Unfortunately, I can't add this option to trunc_with_features because >>>>>>>> the creation of the new object is delegated to 'trunc'. >>>>>>>> I guess I could implement 'trunc' in Bio::SeqUtils >>>>>>>> itself(?) >>>>>>>> >>>>>>>> What do you think, could this be merged into >>>>>>>> bioperl-live? >>>>>>>> >>>>>>>> >>>>>>>> Frank >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, 2012-01-11 at 21:03 +0000, Frank Schwach >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Great, I'll work on a branch that gives the user the >>>>>>>>> option to use clone instead of new and then we can >>>>>>>>> see if we want to use that in the end. In the >>>>>>>>> meantime, what do you think about pulling this into >>>>>>>>> bioperl-live? When I have some time again I can work >>>>>>>>> on the HOWTO for these new features for the BioPerl >>>>>>>>> wiki >>>>>>>>> >>>>>>>>> Frank >>>>>>>>> >>>>>>>>> >>>>>>>>> On 11/01/12 18:42, Fields, Christopher J wrote: >>>>>>>>> >>>>>>>>>> Note that Bio::Root::Root now has a clone() method >>>>>>>>>> that one can take advantage of for this purpose; if >>>>>>>>>> Storable or Clone is available, it will pick one of >>>>>>>>>> the two, preferably Clone over Storable. It's >>>>>>>>>> fairly untested, but we haven't run into problems >>>>>>>>>> with it yet (I think it was in the last CPAN >>>>>>>>>> release). 
>>>>>>>>>> >>>>>>>>>> chris >>>>>>>>>> >>>>>>>>>> On Jan 11, 2012, at 12:38 PM, Roy Chaudhuri wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Hi Frank, >>>>>>>>>>> >>>>>>>>>>> Looks great, I like the use of between locations, >>>>>>>>>>> didn't think of that. >>>>>>>>>>> >>>>>>>>>>> It was suggested that I avoid using Clone for >>>>>>>>>>> cat, trunc_with_features etc. to avoid adding a >>>>>>>>>>> dependency (which may no longer be an issue) and >>>>>>>>>>> because it would cause problems for Bio::Seq >>>>>>>>>>> implementations that use a database as the >>>>>>>>>>> back-end. Maybe you could add it as an option, >>>>>>>>>>> but keep the default as is? >>>>>>>>>>> >>>>>>>>>>> Cheers, Roy. >>>>>>>>>>> >>>>>>>>>>> On 11/01/2012 18:16, Frank Schwach wrote: >>>>>>>>>>> >>>>>>>>>>>> Hi Roy and Chris, >>>>>>>>>>>> >>>>>>>>>>>> I have made the changes to the code now. As you >>>>>>>>>>>> suggested, feature ends no longer change type >>>>>>>>>>>> and I insert a note instead to inform about >>>>>>>>>>>> the deletion (or insertion), showing the length >>>>>>>>>>>> and position. I have also added a feature to >>>>>>>>>>>> annotate deletion sites themselves (with >>>>>>>>>>>> IN-BETWEEN locations). 
>>>>>>>>>>>> >>>>>>>>>>>> Roy's test script now prints: >>>>>>>>>>>> >>>>>>>>>>>> LOCUS seq-accession_number 7 >>>>>>>>>>>> bp dna linear UNK ACCESSION unknown >>>>>>>>>>>> FEATURES Location/Qualifiers CDS >>>>>>>>>>>> join(2..3,4..6) /note="3bp internal deletion >>>>>>>>>>>> between pos 3 and 4" CDS 2..3 >>>>>>>>>>>> /note="2bp deleted from feature end" >>>>>>>>>>>> misc_feature 3^4 /note="deletion of 3bp" >>>>>>>>>>>> ORIGIN 1 aaaaaaa // >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> or, if you add strand information (-1 in this >>>>>>>>>>>> case) to the second feature: >>>>>>>>>>>> >>>>>>>>>>>> LOCUS seq-accession_number 7 >>>>>>>>>>>> bp dna linear UNK ACCESSION unknown >>>>>>>>>>>> FEATURES Location/Qualifiers CDS >>>>>>>>>>>> join(2..3,4..6) /note="3bp internal deletion >>>>>>>>>>>> between pos 3 and 4" CDS >>>>>>>>>>>> complement(2..3) /note="2bp deleted from >>>>>>>>>>>> feature 5' end" misc_feature 3^4 >>>>>>>>>>>> /note="deletion of 3bp" ORIGIN 1 aaaaaaa // >>>>>>>>>>>> >>>>>>>>>>>> I have committed this along with some bugfixes >>>>>>>>>>>> to my master branch on GitHub >>>>>>>>>>>> https://github.com/fschwach/bioperl-live so >>>>>>>>>>>> it's now also in my existing pull request. >>>>>>>>>>>> >>>>>>>>>>>> I'm still wondering if cloning the sequence >>>>>>>>>>>> objects rather than calling 'new' on their >>>>>>>>>>>> respective classes would be an option inside >>>>>>>>>>>> 'delete' and 'insert'? I'm experimenting with >>>>>>>>>>>> this for my own purposes because I have to >>>>>>>>>>>> work with custom sub-classes of Bio::Seq which >>>>>>>>>>>> have additional attributes and therefore set >>>>>>>>>>>> 'can_call_new' to false. Without cloning the >>>>>>>>>>>> objects, I first have to convert the custom >>>>>>>>>>>> Bio::Seq::Foo objects to standard Bio::Seq, >>>>>>>>>>>> which I would like to avoid. 
Is there any >>>>>>>>>>>> reason why something like Clone::Fast should >>>>>>>>>>>> not be used in this case?
It seems to work for >>>>>>>>>>>> me but there may be situations where this is >>>>>>>>>>>> going to blow up which I am not aware of. >>>>>>>>>>>> Cloning rather than calling new could be made >>>>>>>>>>>> an option in Bio::SeqUtils. I have most of the >>>>>>>>>>>> code for that already. >>>>>>>>>>>> >>>>>>>>>>>> Frank >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 10/01/12 17:31, Roy Chaudhuri wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Or without the typo: >>>>>>>>>>>>> >>>>>>>>>>>>> CDS join(2..3,4..6) /note="3 bp >>>>>>>>>>>>> internal deletion" CDS 2..3 >>>>>>>>>>>>> /note="2 bp deleted from 3' end" >>>>>>>>>>>>> >>>>>>>>>>>>> On 10/01/2012 17:27, Roy Chaudhuri wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> I think it's me that didn't explain very >>>>>>>>>>>>>> well - I was talking about overlapping >>>>>>>>>>>>>> (rather than spanning) a deletion, although >>>>>>>>>>>>>> I think the same principle applies to the >>>>>>>>>>>>>> spanning example you gave. 
Here's some >>>>>>>>>>>>>> test code: >>>>>>>>>>>>>> >>>>>>>>>>>>>> #!/usr/bin/perl use warnings >>>>>>>>>>>>>> FATAL=>qw(all); use strict; use Bio::Seq; >>>>>>>>>>>>>> use Bio::SeqIO; use Bio::SeqUtils; use >>>>>>>>>>>>>> Bio::SeqFeature::Generic; my >>>>>>>>>>>>>> $seq=Bio::Seq->new(-id=>'seq', >>>>>>>>>>>>>> -seq=>'AAAAAAAAAA'); >>>>>>>>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >>>>>>>>>>>>>> >>>>>>>>>>>>>> -start=>2, >>>>>>>>>>>>>> -end=>9)); >>>>>>>>>>>>>> >>>>>>>>>>>>>> $seq->add_SeqFeature(Bio::SeqFeature::Generic->new(-primary_tag=>'CDS', >>>>>>>>>>>>>> >>>>>>>>>>>>>> -start=>2, >>>>>>>>>>>>>> -end=>5)); my >>>>>>>>>>>>>> $out=Bio::SeqIO->newFh(-format=>'genbank'); >>>>>>>>>>>>>> >>>>>>>>>>>>>> my $trunc=Bio::SeqUtils->delete($seq, 4, 6); >>>>>>>>>>>>>> print $out $trunc; >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> This currently outputs: LOCUS >>>>>>>>>>>>>> seq-accession_number 7 bp dna >>>>>>>>>>>>>> linear UNK ACCESSION unknown FEATURES >>>>>>>>>>>>>> Location/Qualifiers CDS >>>>>>>>>>>>>> join(2..>3,<4..6) CDS 2..>3 >>>>>>>>>>>>>> ORIGIN 1 aaaaaaa // >>>>>>>>>>>>>> >>>>>>>>>>>>>> However, I was suggesting that the feature >>>>>>>>>>>>>> table should be something like: CDS >>>>>>>>>>>>>> join(2..3,4..6) /note="3 bp internal >>>>>>>>>>>>>> deletion" CDS join(2..3) >>>>>>>>>>>>>> /note="2 bp deleted from 3' end" >>>>>>>>>>>>>> >>>>>>>>>>>>>> Fuzzy locations are intended to represent >>>>>>>>>>>>>> features which have boundaries spanning >>>>>>>>>>>>>> outside of the sequence. For a defined >>>>>>>>>>>>>> deletion that's not the case, the >>>>>>>>>>>>>> boundaries of the feature aren't unknown, >>>>>>>>>>>>>> they have been specifically altered. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hope this is clearer. Cheers, Roy. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> On 10/01/2012 16:47, Frank Schwach wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Roy, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sorry, I hadn't explained that very well: >>>>>>>>>>>>>>> it's not the outer boundaries of the >>>>>>>>>>>>>>> feature that become fuzzy but the "inner" >>>>>>>>>>>>>>> ones of the split locations: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -------------------- a >>>>>>>>>>>>>>> feature's location >>>>>>>>>>>>>>> ==========xxxx================= sequence >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> --------- sublocation >>>>>>>>>>>>>>> 1 -------- sublocation 2 >>>>>>>>>>>>>>> =============================== >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> x= sequence to delete The feature's >>>>>>>>>>>>>>> location has changed from Simple to >>>>>>>>>>>>>>> Split. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sublocation 1: start is still EXACT and >>>>>>>>>>>>>>> has not changed end is now AFTER because >>>>>>>>>>>>>>> this is not a true end of the feature >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Sublocation 2: start is BEFORE end is >>>>>>>>>>>>>>> EXACT (but shifted) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I hope this makes more sense(?) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Cheers, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Frank >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, 2012-01-10 at 15:25 +0000, Roy >>>>>>>>>>>>>>> Chaudhuri wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Frank, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Looks good to me. One thing I'm not >>>>>>>>>>>>>>>> sure about - why do features >>>>>>>>>>>>>>>> overlapping a deletion become fuzzy? >>>>>>>>>>>>>>>> That behaviour is in >>>>>>>>>>>>>>>> trunc_with_features because it's >>>>>>>>>>>>>>>> intended to represent taking a >>>>>>>>>>>>>>>> subregion of a larger sequence, but if >>>>>>>>>>>>>>>> you're representing an internal >>>>>>>>>>>>>>>> deletion then the boundaries of the >>>>>>>>>>>>>>>> overlapping feature aren't unknown, >>>>>>>>>>>>>>>> they have been specifically altered.
>>>>>>>>>>>>>>>> Maybe you could give absolute >>>>>>>>>>>>>>>> coordinates, but add a note indicating >>>>>>>>>>>>>>>> that the 5' or 3' end has been >>>>>>>>>>>>>>>> truncated by however many bases. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Cheers, Roy. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 10/01/2012 13:10, Frank Schwach >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Chris, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I have made the changes in a Git fork >>>>>>>>>>>>>>>>> and made the pull request now. If >>>>>>>>>>>>>>>>> this is accepted into BioPerl I can >>>>>>>>>>>>>>>>> also write a little SeqUtils HOWTO >>>>>>>>>>>>>>>>> for the BioPerl wiki. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Frank >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Mon, 2012-01-09 at 18:29 +0000, >>>>>>>>>>>>>>>>> Fields, Christopher J wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Sounds very promising! The easiest >>>>>>>>>>>>>>>>>> way to contribute is via a fork of >>>>>>>>>>>>>>>>>> the code on Github with a pull >>>>>>>>>>>>>>>>>> request (as you already know, being >>>>>>>>>>>>>>>>>> a contributor to the Primer3 >>>>>>>>>>>>>>>>>> modules). >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> chris >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Jan 9, 2012, at 11:10 AM, Frank >>>>>>>>>>>>>>>>>> Schwach wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I needed to manipulate Bio::Seq >>>>>>>>>>>>>>>>>>> objects with annotations and >>>>>>>>>>>>>>>>>>> sequence features to simulate >>>>>>>>>>>>>>>>>>> molecular cloning techniques, >>>>>>>>>>>>>>>>>>> e.g. to cut a vector and insert a >>>>>>>>>>>>>>>>>>> fragment into it while preserving >>>>>>>>>>>>>>>>>>> all the annotations and moving >>>>>>>>>>>>>>>>>>> the features accordingly. 
My main >>>>>>>>>>>>>>>>>>>> aim was to split features that >>>>>>>>>>>>>>>>>>>> span deletion/insertion sites in >>>>>>>>>>>>>>>>>>>> a meaningful way, which cannot >>>>>>>>>>>>>>>>>>>> be done with the currently >>>>>>>>>>>>>>>>>>>> available methods. I have modified >>>>>>>>>>>>>>>>>>>> Bio::SeqUtils so that I have the >>>>>>>>>>>>>>>>>>>> following new methods: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> delete ====== removes a segment >>>>>>>>>>>>>>>>>>>> from a sequence object and >>>>>>>>>>>>>>>>>>>> adjusts positions and types of >>>>>>>>>>>>>>>>>>>> locations of sequence features: - >>>>>>>>>>>>>>>>>>>> locations of features that span >>>>>>>>>>>>>>>>>>>> the deletion sites are turned >>>>>>>>>>>>>>>>>>>> into Splits. - locations that >>>>>>>>>>>>>>>>>>>> extend into the deleted region >>>>>>>>>>>>>>>>>>>> are turned to Fuzzy to indicate >>>>>>>>>>>>>>>>>>>> that their true start/end was >>>>>>>>>>>>>>>>>>>> lost. - locations contained >>>>>>>>>>>>>>>>>>>> inside the deleted regions are >>>>>>>>>>>>>>>>>>>> lost. - other features are >>>>>>>>>>>>>>>>>>>> shifted according to the length >>>>>>>>>>>>>>>>>>>> of the deletion. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> insert ====== adds a Bio::Seq >>>>>>>>>>>>>>>>>>>> object into another one between >>>>>>>>>>>>>>>>>>>> specified insertion sites. This >>>>>>>>>>>>>>>>>>>> also affects the features on the >>>>>>>>>>>>>>>>>>>> recipient sequence: - locations >>>>>>>>>>>>>>>>>>>> of features that span the >>>>>>>>>>>>>>>>>>>> insertion site are split but >>>>>>>>>>>>>>>>>>>> position types are not turned to >>>>>>>>>>>>>>>>>>>> Fuzzy because no part of the >>>>>>>>>>>>>>>>>>>> original feature is lost. - other >>>>>>>>>>>>>>>>>>>> features are shifted according to >>>>>>>>>>>>>>>>>>>> the length of the insertion. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ligate ====== just for >>>>>>>>>>>>>>>>>>>> convenience. Supply a recipient, >>>>>>>>>>>>>>>>>>>> a fragment and one or two sites >>>>>>>>>>>>>>>>>>>> to cut the recipient.
Can also >>>>>>>>>>>>>>>>>>>> flip the fragment if required. >>>>>>>>>>>>>>>>>>>> Simply calls delete [, >>>>>>>>>>>>>>>>>>>> reverse_complement_with_features] >>>>>>>>>>>>>>>>>>>> and insert in turn. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> One situation I haven't handled >>>>>>>>>>>>>>>>>>>> yet is a deletion that spans the >>>>>>>>>>>>>>>>>>>> origin of a circular molecule but >>>>>>>>>>>>>>>>>>>> that should be a rare thing to >>>>>>>>>>>>>>>>>>>> do anyway. The code currently >>>>>>>>>>>>>>>>>>>> throws an error if this is >>>>>>>>>>>>>>>>>>>> attempted. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I'm happy to contribute the code >>>>>>>>>>>>>>>>>>>> on Github if there is interest? >>>>>>>>>>>>>>>>>>>> Comments on the handling of >>>>>>>>>>>>>>>>>>>> feature locations highly >>>>>>>>>>>>>>>>>>>> welcome! >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Frank >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> -- The Wellcome Trust Sanger Institute is operated by Genome >>>>>> Research Limited, a charity registered in England with number >>>>>> 1021457 and a company registered in England with number >>>>>> 2742969, whose registered office is 215 Euston Road, London, >>>>>> NW1 2BE. _______________________________________________ >>>>>> Bioperl-l mailing list Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>> >>>
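The deletion rules Frank describes above can be sketched in core Perl without BioPerl. This includes the split into two sublocations with fuzzy inner boundaries shown in his diagram. It is a minimal illustration under stated assumptions, not the code from the pull request. `adjust_for_deletion` and `to_ft_string` are hypothetical names. Coordinates are 1-based and inclusive. `<` and `>` mark BEFORE and AFTER positions, as in GenBank/EMBL feature-table strings.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the Bio::SeqUtils API: adjust one feature
# ($start..$end, 1-based inclusive) for the deletion of $del_start..$del_end.
# Returns a list of pieces: [start, end, start_is_before, end_is_after].
sub adjust_for_deletion {
    my ($start, $end, $del_start, $del_end) = @_;
    my $len = $del_end - $del_start + 1;

    # Entirely upstream of the deletion: unchanged and exact.
    return ([$start, $end, 0, 0]) if $end < $del_start;

    # Entirely downstream: shift left by the deletion length.
    return ([$start - $len, $end - $len, 0, 0]) if $start > $del_end;

    # Entirely inside the deleted region: the feature is lost.
    return () if $start >= $del_start && $end <= $del_end;

    my @pieces;
    # Left remnant: its end is no longer the feature's true end (AFTER, '>').
    push @pieces, [$start, $del_start - 1, 0, 1] if $start < $del_start;
    # Right remnant: its start is no longer the true start (BEFORE, '<');
    # after the deletion it begins where the deleted segment used to start.
    push @pieces, [$del_start, $end - $len, 1, 0] if $end > $del_end;
    return @pieces;
}

# Render pieces in feature-table style, e.g. "join(100..>149,<150..190)".
sub to_ft_string {
    my @pieces = @_;
    return '' unless @pieces;
    my @ranges = map {
        my ($s, $e, $before, $after) = @$_;
        ($before ? '<' : '') . $s . '..' . ($after ? '>' : '') . $e;
    } @pieces;
    return @ranges == 1 ? $ranges[0] : 'join(' . join(',', @ranges) . ')';
}

print to_ft_string(adjust_for_deletion(100, 200, 150, 159)), "\n";
```

Under this convention, a 10 bp deletion at 150..159 turns a feature at 100..200 into `join(100..>149,<150..190)`. Roy's alternative would keep exact coordinates and record the truncation in a /note. That amounts to dropping the `<` and `>` markers from the output.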
>>> >> >> > > > From cjfields at illinois.edu Thu Jan 19 10:29:32 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 19 Jan 2012 15:29:32 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: References: <1316094243.3797.669.camel@deskpro15336.internal.sanger.ac.uk> <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> <1326903677.3563.15.camel@deskpro15336.internal.sanger.ac.uk> <4F16F82C.2090503@gmail.com> <1326906496.3563.31.camel@deskpro15336.internal.sanger.ac.uk> <159E3D6F-F08C-4D93-B31F-742AF93EF20E@illinois.edu> <1326908773.3563.36.camel@deskpro15336.internal.sanger.ac.uk> Message-ID: <23542893-C264-4521-BD53-8869EFDD75DA@illinois.edu> On Jan 19, 2012, at 5:04 AM, Peter Cock wrote: > On Wed, Jan 18, 2012 at 6:11 PM, Fields, Christopher J > wrote: >> You could probably change "<1" to "<" if the start isn't meant to be defined; >> the former seems to imply the feature location is '1 or before the start', the >> latter is simply 'before the start'. Either version should work within bioperl >> AFAIK, though I'm not sure whether the feature table definition covers '> as a location type. >> >> Just thinking aloud, but such behavior might be something that needs to >> be defined more specifically in the Bio::RangeI implementation, via trunc(). >> >> chris > > Using '<1' or in general ' feature tables - not sure about '<' though, can't say I recall ever seeing that. > > There are some similar issues with UniProt/SwissProt files and features. > There I've seen '<1' used for before start (a partial peptide), but they also > have '?'
for unknown which has no analogue in the GenBank/EMBL feature > table. > > Peter Yes, after you posted this and re-reviewing the FT again I realized my mistake. Thanks for correcting. Helps to have a proper coffee before responding :P chris From cjfields at illinois.edu Thu Jan 19 10:39:42 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Thu, 19 Jan 2012 15:39:42 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <4F18337B.2010401@gmail.com> References: <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> <1326903677.3563.15.camel@deskpro15336.internal.sanger.ac.uk> <4F16F82C.2090503@gmail.com> <1326906496.3563.31.camel@deskpro15336.internal.sanger.ac.uk> <159E3D6F-F08C-4D93-B31F-742AF93EF20E@illinois.edu> <1326908773.3563.36.camel@deskpro15336.internal.sanger.ac.uk> <4F17FB50.4000606@sanger.ac.uk> <4F18337B.2010401@gmail.com> Message-ID: <9596B4F5-6FD0-48B6-9E46-70C0EE3E5738@illinois.edu> On Jan 19, 2012, at 9:15 AM, Roy Chaudhuri wrote: > I'm not sure I understand the problem with "<1", < means "less than" not "less than or equal to", so it does not imply that the feature could start at position 1. Yup, that's correct. My bad. > I can see that there would be cases where negative coordinates might be useful, but I think it is opening a can of worms and could introduce many subtle bugs, so I'd vote for throwing an error. If you were to do it, it would be better to stick with the biological convention of -1 being the base before 1 (as used for -10 and -35 elements). > > Roy. 
Agree with Roy, I think it's best to avoid using negative coords when at all possible, mainly b/c it introduces possibly inconsistent behavior. Such behavior should be defined in the feature class or its parent class(es), wherever appropriate. At the moment that doesn't hold true. As a side note, I recall Lincoln allowed negative coords with GBrowse features but I don't recall whether it's officially supported. chris From fs5 at sanger.ac.uk Thu Jan 19 10:44:42 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Thu, 19 Jan 2012 15:44:42 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <9596B4F5-6FD0-48B6-9E46-70C0EE3E5738@illinois.edu> References: <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> <1326903677.3563.15.camel@deskpro15336.internal.sanger.ac.uk> <4F16F82C.2090503@gmail.com> <1326906496.3563.31.camel@deskpro15336.internal.sanger.ac.uk> <159E3D6F-F08C-4D93-B31F-742AF93EF20E@illinois.edu> <1326908773.3563.36.camel@deskpro15336.internal.sanger.ac.uk> <4F17FB50.4000606@sanger.ac.uk> <4F18337B.2010401@gmail.com> <9596B4F5-6FD0-48B6-9E46-70C0EE3E5738@illinois.edu> Message-ID: <4F183A6A.4020409@sanger.ac.uk> OK, then let's throw an error when negative positions are supplied. I'll make the changes to the queued pull request. Cheers, Frank On 19/01/12 15:39, Fields, Christopher J wrote: > On Jan 19, 2012, at 9:15 AM, Roy Chaudhuri wrote: > >> I'm not sure I understand the problem with "<1", < means "less than" not "less than or equal to", so it does not imply that the feature could start at position 1. > > Yup, that's correct. My bad.
> >> I can see that there would be cases where negative coordinates might be useful, but I think it is opening a can of worms and could introduce many subtle bugs, so I'd vote for throwing an error. If you were to do it, it would be better to stick with the biological convention of -1 being the base before 1 (as used for -10 and -35 elements). >> >> Roy. > > Agree with Roy, I think it's best to avoid using negative coords when at all possible, mainly b/c it introduces possibly inconsistent behavior. Such behavior should be defined in the feature class or its parent class(es), wherever appropriate. At the moment that doesn't hold true. > > As a side note, I recall Lincoln allowed negative coords with GBrowse features but I don't recall whether it's officially supported. > > chris
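Frank's decision to reject negative positions could look roughly like the guard below. It also covers the origin-spanning limitation he mentions for circular molecules. This is a hypothetical sketch in core Perl, not the pull request's code. The name `check_deletion_sites` is an assumption, as is the convention that a deletion wrapping the origin is written with start > end. Inside BioPerl one would normally raise the error with `$self->throw` rather than `die`.

```perl
use strict;
use warnings;

# Hypothetical guard, not the actual Bio::SeqUtils code.
# Positions are 1-based; 0 and negative values are rejected outright.
# A deletion wrapping the origin of a circular molecule is ASSUMED to be
# written as start > end (e.g. 95..5 on a 100 bp plasmid).
sub check_deletion_sites {
    my ($seq_length, $del_start, $del_end, $is_circular) = @_;
    for my $pos ($del_start, $del_end) {
        die "Position $pos is outside 1..$seq_length\n"
            if $pos < 1 || $pos > $seq_length;
    }
    if ($del_start > $del_end) {
        die "Deletion $del_start..$del_end spans the origin; not supported\n"
            if $is_circular;
        die "Start $del_start is after end $del_end on a linear sequence\n";
    }
    return 1;
}

# Each case is [sequence length, deletion start, deletion end, is_circular].
for my $case ([100, 10, 20, 0], [100, -5, 20, 0], [100, 95, 5, 1]) {
    my $ok = eval { check_deletion_sites(@$case) };
    print $ok ? "ok\n" : "error: $@";
}
```

The guard rejects 0 as well as negative values. This follows Roy's point that biological numbering skips zero (-1 is the base before 1), so there is no valid position 0 to accept.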
From Russell.Smithies at agresearch.co.nz Thu Jan 19 15:45:25 2012 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Fri, 20 Jan 2012 09:45:25 +1300 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <9596B4F5-6FD0-48B6-9E46-70C0EE3E5738@illinois.edu> References: <1326129037.4396.64.camel@deskpro15336.internal.sanger.ac.uk> <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> <1326903677.3563.15.camel@deskpro15336.internal.sanger.ac.uk> <4F16F82C.2090503@gmail.com> <1326906496.3563.31.camel@deskpro15336.internal.sanger.ac.uk> <159E3D6F-F08C-4D93-B31F-742AF93EF20E@illinois.edu> <1326908773.3563.36.camel@deskpro15336.internal.sanger.ac.uk> <4F17FB50.4000606@sanger.ac.uk> <4F18337B.2010401@gmail.com> <9596B4F5-6FD0-48B6-9E46-70C0EE3E5738@illinois.edu> Message-ID: <18DF7D20DFEC044098A1062202F5FFF343C8B52E2F@exchsth.agresearch.co.nz> Just to throw a spanner in the works, how do we deal with circular features? I'm guessing that with an "is_circular" flag somewhere, negative coords might be useful? Is that what Lincoln did? I think there was a move to allow Gbrowse to handle circular features but don't recall the details. --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Fields, Christopher J > Sent: Friday, 20 January 2012 4:40 a.m.
> To: Roy Chaudhuri > Cc: Frank Schwach; > Subject: Re: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico > cloning > > On Jan 19, 2012, at 9:15 AM, Roy Chaudhuri wrote: > > > I'm not sure I understand the problem with "<1", < means "less than" not > "less than or equal to", so it does not imply that the feature could start at > position 1. > > Yup, that's correct. My bad. > > > I can see that there would be cases where negative coordinates might be > useful, but I think it is opening a can of worms and could introduce many > subtle bugs, so I'd vote for throwing an error. If you were to do it, it would be > better to stick with the biological convention of -1 being the base before 1 (as > used for -10 and -35 elements). > > > > Roy. > > Agree with Roy, I think it's best to avoid using negative coords when at all > possible, mainly b/c it introduces possibly inconsistent behavior. Such > behavior should be defined in the feature class or its parent class(es), > wherever appropriate. At the moment that doesn't hold true. > > As a side note, I recall Lincoln allowed negative coords with GBrowse > features but I don't recall whether it's officially supported. > > chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
======================================================================= From fs5 at sanger.ac.uk Fri Jan 20 04:44:33 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Fri, 20 Jan 2012 09:44:33 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF343C8B52E2F@exchsth.agresearch.co.nz> References: <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> <1326903677.3563.15.camel@deskpro15336.internal.sanger.ac.uk> <4F16F82C.2090503@gmail.com> <1326906496.3563.31.camel@deskpro15336.internal.sanger.ac.uk> <159E3D6F-F08C-4D93-B31F-742AF93EF20E@illinois.edu> <1326908773.3563.36.camel@deskpro15336.internal.sanger.ac.uk> <4F17FB50.4000606@sanger.ac.uk> <4F18337B.2010401@gmail.com> <9596B4F5-6FD0-48B6-9E46-70C0EE3E5738@illinois.edu> <18DF7D20DFEC044098A1062202F5FFF343C8B52E2F@exchsth.agresearch.co.nz> Message-ID: <4F193781.6050605@sanger.ac.uk> Hi Russell, As far as I can see, the previously existing methods 'cat' and 'trunc_with_features' don't need to deal with circular sequences because they wouldn't make biological sense for circular molecules - although I'm not sure if it's currently checked whether is_circular is true, which we probably should. For the new methods 'delete', 'insert' and 'ligate' it would make sense to use circular molecules as input and it should work except when a deletion spans the origin. I currently throw an error when the origin would be affected in a circular molecule. Frank On 19/01/12 20:45, Smithies, Russell wrote: > Just to throw a spanner in the works, how do we deal with circular features? > I'm guessing that with an "is_circular" flag somewhere, negative coords might be useful? > Is that what Lincoln did?
I think there was a move to allow Gbrowse to handle circular features but don't recall the details. > > --Russell > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Fields, Christopher J >> Sent: Friday, 20 January 2012 4:40 a.m. >> To: Roy Chaudhuri >> Cc: Frank Schwach; >> Subject: Re: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico >> cloning >> >> On Jan 19, 2012, at 9:15 AM, Roy Chaudhuri wrote: >> >>> I'm not sure I understand the problem with "<1", < means "less than" not >> "less than or equal to", so it does not imply that the feature could start at >> position 1. >> >> Yup, that's correct. My bad. >> >>> I can see that there would be cases where negative coordinates might be >> useful, but I think it is opening a can of worms and could introduce many >> subtle bugs, so I'd vote for throwing an error. If you were to do it, it would be >> better to stick with the biological convention of -1 being the base before 1 (as >> used for -10 and -35 elements). >>> Roy. >> Agree with Roy, I think it's best to avoid using negative coords when at all >> possible, mainly b/c it introduces possibly inconsistent behavior. Such >> behavior should be defined in the feature class or its parent class(es), >> wherever appropriate. At the moment that doesn't hold true. >> >> As a side note, I recall Lincoln allowed negative coords with GBrowse >> features but I don't recall whether it's officially supported.
>> >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l From p.j.a.cock at googlemail.com Fri Jan 20 05:46:18 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 20 Jan 2012 10:46:18 +0000 Subject: [Bioperl-l] NCBI adoption of AGP v2.0 and new qualifiers in GenBank/EMBL Message-ID: Dear all, I just spotted this via the @NCBI twitter feed, http://www.ncbi.nlm.nih.gov/projects/genome/assembly/agp/agp_spec_change.shtml In addition to the NCBI switch from AGP v1.1 to v2.0, the INSDC have recently added a new feature type called "assembly_gap", and the associated qualifiers "gap_type" and "linkage_evidence" to the INSDC Feature Table Definitions.
Quoting from version 10.0, dated Dec 2011 http://www.insdc.org/documents/feature_table.html#7.2 > Feature Key assembly_gap > > > Definition gap between two components of a CON record that is > part of a genome assembly; > > Mandatory qualifiers /estimated_length=unknown or > /gap_type="TYPE" > /linkage_evidence="TYPE" (Note: Mandatory only if the > /gap_type is "within scaffold" or "repeat within > scaffold". If there are multiple types of linkage_evidence > they will appear as multiple /linkage_evidence="TYPE" > qualifiers. For all other types of assembly_gap > features, use of the /linkage_evidence qualifier is > invalid.) > > Comment the location span of the assembly_gap feature for an > unknown gap is 100 bp, with the 100 bp indicated as > 100 "n"'s in sequence. > I.e., DDBJ, ENA & GenBank flat-files will start to use the "assembly_gap" features to display information derived from version 2.0 AGP files from 10th Feb 2012. Probably this will affect the XML variants as well. Unless any of the parsers/writers for GenBank or EMBL flat files use a white list approach, the new feature key and qualifiers shouldn't cause a problem.
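Peter's point about white lists can be shown with a simple, hypothetical feature-table reader in core Perl. It accepts any feature key and any qualifier name, so the new `assembly_gap` key and its `/gap_type` and `/linkage_evidence` qualifiers pass through without code changes. The sample record and its values are made up for illustration. This is not how Bio::SeqIO's GenBank/EMBL parsers are implemented, and it only handles single-line locations and single-line qualifier values.

```perl
use strict;
use warnings;

# Made-up feature-table fragment in INSDC layout: feature key from column 6,
# qualifiers from column 22. Values are illustrative only.
my $ft = <<'END';
     assembly_gap    101..200
                     /estimated_length=100
                     /gap_type="within scaffold"
                     /linkage_evidence="paired-ends"
END

# Permissive reader: there is no list of "known" keys or qualifiers, so new
# ones such as assembly_gap are kept as-is. Qualifiers may repeat (e.g.
# several /linkage_evidence lines), so values are collected in arrays.
my @features;
for my $line (split /\n/, $ft) {
    if ($line =~ /^ {5}(\S+)\s+(\S+)\s*$/) {
        push @features, { key => $1, location => $2, quals => {} };
    }
    elsif (@features && $line =~ m{^\s+/(\w+)(?:=(.*))?$}) {
        my ($name, $value) = ($1, $2);
        $value = '' unless defined $value;
        $value =~ s/^"(.*)"$/$1/;    # drop surrounding double quotes
        push @{ $features[-1]{quals}{$name} }, $value;
    }
}

for my $f (@features) {
    print "$f->{key} $f->{location}\n";
    for my $q (sort keys %{ $f->{quals} }) {
        print "  $q: $_\n" for @{ $f->{quals}{$q} };
    }
}
```

A reader that instead checked keys against a fixed list of known feature types would silently drop or reject `assembly_gap` until that list was updated. Real records can also wrap long qualifier values over several lines, which this sketch does not attempt to join.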
Peter From cjfields at illinois.edu Fri Jan 20 11:50:56 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 20 Jan 2012 16:50:56 +0000 Subject: [Bioperl-l] Question on SeqFeature_RelationShip In-Reply-To: <4F199A27.8030408@inria.fr> References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> <4F0EA69E.8040203@inria.fr> <49CC4A91-5E47-4866-AD63-62DC7F649CE6@drycafe.net> <4F0FF883.80109@inria.fr> <4F0FF8FF.6070706@inria.fr> <4F199A27.8030408@inria.fr> Message-ID: <054FEA92-8C5E-47D6-86AF-F71DEAFE2B63@illinois.edu> Florian, Re: patches, we can accept these, but I would like to point out the code is publicly available on github: https://github.com/bioperl/bioperl-db The fastest way to contribute is to create a github account, fork the code from that repository, check out a local copy using git, then push the changes back to your fork so they are not lost. You can then submit a pull request that should appear on the bioperl developers mailing list, where one of us can simply (via the github interface) merge your changes in. chris On Jan 20, 2012, at 10:45 AM, lajus wrote: > Hi all, > > As I have said, I've worked on SeqFeatureAdaptor to also persist and retrieve sub-features. > My code is attached: > > I have modified the last stable version of: > - Bio/DB/BioSQL/SeqFeatureAdaptor.pm > - Bio/DB/BIOSQL/SeqAdaptor.pm > I have created: > - Bio/SeqFeature/SeqFeatureRealtionship.pm > - Bio/DB/BioSQL/SeqFeatureRealtionshipAdaptor.pm > some tests: > - SeqFeatureRealtionship.t (very simple test for the also really simple SeqFeatureRealtionship class) > - SeqFeatureRealtionshipAdaptor.t (tests persistence of subfeature objects in the database. Needs a database with BioSQL to connect to (even if no commit is done)) => I have only tested with a PostgreSQL database ... > > If you have advice or questions about my implementation or about my tests, don't hesitate to tell me.
> > Is there a way to include my modifications in a future release of BioPerl? > > Florian > > On 14/01/2012 17:12, Hilmar Lapp wrote: >> Hi Florian, >> >> You could do that (and it might have advantages in terms of code separation), but you don't have to. In general, adaptor classes get instantiated by the Bioperl-DB framework when a Bioperl class that is mapped to it needs to get serialized or populated. Since there is no class in Bioperl that would correspond to a seqfeature relationship, those situations won't occur. >> >> So you could just keep it simple and expand store_children() and correspondingly their retrieval in the adaptor class for seqfeatures. But as hinted above, you may still prefer a separate adaptor class just to keep the nitty-gritty of storing/loading the relationships out of the main adaptor class. Really up to you how you feel more comfortable. >> >> -hilmar >> >> Sent with a tap. >> >> On Jan 13, 2012, at 4:27 AM, lajus wrote: >> >>> I should write >>> - a new adaptor called SeqFeatureRelationshipAdaptor in Bio/DB/BioSQL >>> of course >>> >>> On 13/01/2012 10:25, lajus wrote: >>>> Hi hilmar, >>>> >>>> Thanks for your hint, but I'm quite lost in the BioPerl architecture (and quite new to Perl programming). I'd like to use the handling of term-to-term relationships as a template but I can't find which files are related to this. >>>> >>>> As far as I understand, I should create: >>>> - a new adaptor called SeqFeatureRelationshipAdaptor in Bio/DB >>>> - a new object SeqFeatureRelationship (and its interface) in Bio/Seqfeature >>>> - modify SeqFeatureAdaptor to store children (just with a call to subSeqFeature in store_children sub and thanks to my SeqFeatureRelationshipAdaptor create new relationships) >>>> - modify SeqFeatureAdaptor to retrieve children ( thanks to my SeqFeatureRelationshipAdaptor create new relationships ) >>>> >>>> Is it the right way?
>>>> >>>> Florian >>>> >>>> On 12/01/2012 18:49, Hilmar Lapp wrote: >>>>> Hi Florian, >>>>> >>>>> Thanks for digging this up - this is what I had in memory, but I ran out of time last night in ascertaining that it is indeed still true. >>>>> >>>>> It'd be awesome if you can add the code to SeqFeatureAdaptor to also persist and retrieve sub-features. I think the object-relational mappings are all there already (in BaseDriver.pm). You could use the handling of bioentry-to-bioentry relationships (or term-to-term relationships) as a template for how to implement this. >>>>> >>>>> -hilmar >>>>> >>>>> On Jan 12, 2012, at 4:23 AM, lajus wrote: >>>>> >>>>>> Ok, I have looked in the BioPerl code and it appears that subSeqFeatures are not handled yet: >>>>>> comment in SeqFeatureAdaptor.pm for the store_children function (and attach_children too): >>>>>> "Bio::SeqFeatureI has a location, annotation, and possibly sub-seqfeatures as children. The latter is not implemented yet." >>>>>> >>>>>> So it's expected that it doesn't work. >>>>>> Have you started to implement this stuff, or should I write another SeqFeatureAdaptor which handles this? >>>>>> >>>>>> Florian >>>>>> >>>>>> On 11/01/2012 16:44, Fields, Christopher J wrote: >>>>>>> Seems like a possible bug with bioperl-db, I believe hierarchical seqfeatures are stored, but it's worth looking into. Do you have some example data (genbank file you are using, for instance)? >>>>>>> >>>>>>> chris >>>>>>> >>>>>>> On Jan 11, 2012, at 7:09 AM, lajus wrote: >>>>>>> >>>>>>>> Therefore, if I look in verbose mode, I can see that in the stack I have many: >>>>>>>> >>>>>>>> no adaptor found for class Bio::Annotation::TypeManager >>>>>>>> no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory >>>>>>>> >>>>>>>> Just warnings, no errors but... >>>>>>>> Any clues?
>>>>>>>> >>>>>>>> Thanks in advance, >>>>>>>> >>>>>>>> Florian >>>>>>>> >>>>>>>> On 11/01/2012 13:43, lajus wrote: >>>>>>>>> I have looked at the Unflattener and the magic works quite well. >>>>>>>>> Then, the $seq which is given (by side-effect) by >>>>>>>>> $unflattener->unflatten_seq(-seq=>$seq, -use_magic=>1); >>>>>>>>> has a good hierarchy for us. >>>>>>>>> So I'm asking why I can't store this Bio::Seq in my database. Now there are explicit parent/child links between the gene and CDS. >>>>>>>>> But when I create a persistent object for $seq and then create it: >>>>>>>>> my $pseq = $adaptor->create_persistent($seq); >>>>>>>>> $pseq->create(); >>>>>>>>> In my database, the bioentry and subseqFeatures are written but still no relation in the seqFeature_relationship table. >>>>>>>>> >>>>>>>>> Do you have an explanation? >>>>>>>>> >>>>>>>>> Florian >>>>>>>>> >>>>>>>>> On 10/01/2012 19:45, Fields, Christopher J wrote: >>>>>>>>>> On Jan 10, 2012, at 12:18 PM, Peter Cock wrote: >>>>>>>>>> >>>>>>>>>>> On Tue, Jan 10, 2012 at 5:06 PM, lajus wrote: >>>>>>>>>>>> Hello, >>>>>>>>>>>> I am currently working on a refactoring of the Genolevures project >>>>>>>>>>>> (http://www.genolevures.org/) >>>>>>>>>>>> We are trying to make better use of bioperl and the BioSQL schema on a PostgreSQL >>>>>>>>>>>> database. >>>>>>>>>>>> >>>>>>>>>>>> I have loaded an EMBL file into my BioSQL database (postgres). If I look in >>>>>>>>>>>> my database, my bioentries have been added and the associated seqFeatures too. >>>>>>>>>>>> But it seems that my seqfeature_relationship table is empty. >>>>>>>>>>>> I find it strange insofar as there is a relationship between a gene and its >>>>>>>>>>>> CDS, right? >>>>>>>>>>> No, not explicitly. Unlike GFF3 where there can be (and should be) >>>>>>>>>>> explicit parent/child links between the gene and CDS, in GenBank >>>>>>>>>>> and EMBL feature tables this is implicit only. I don't know if BioPerl
I don't know if BioPerl >>>>>>>>>>> attempts to infer this kind of relationship, and if it did, if that would >>>>>>>>>>> get record in the BioSQL tables. >>>>>>>>>>> >>>>>>>>>>> Peter >>>>>>>>>> BioPerl does not attempt to infer these by default (too much magic, and too many potential issues), but one can use something like the Unflattener, which does have some magic built-in: >>>>>>>>>> >>>>>>>>>> https://metacpan.org/module/Bio::SeqFeature::Tools::Unflattener >>>>>>>>>> >>>>>>>>>> chris >>>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From guhli007 at umn.edu Fri Jan 20 11:48:19 2012 From: guhli007 at umn.edu (Joseph Guhlin) Date: Fri, 20 Jan 2012 10:48:19 -0600 Subject: [Bioperl-l] Strange Error, Bio::Graphics::Feature attach_seq is not implemented Message-ID: Is this the right place to send it? I couldn't find anything in the archives. I've got a simple plugin I'm working on for gbrowse, and have been having trouble. I removed it from gbrowse completely, and am running the script. I get this error: ------------- EXCEPTION: Bio::Root::NotImplemented ------------- MSG: Abstract method "Bio::SeqFeatureI::attach_seq" is not implemented by package Bio::Graphics::Feature. This is not your fault - author of Bio::Graphics::Feature should be blamed! STACK Bio::Root::RootI::throw_not_implemented /opt/local/lib/perl5/site_perl/5.12.3/Bio/Root/RootI.pm:748 STACK Bio::SeqFeatureI::attach_seq /opt/local/lib/perl5/site_perl/5.12.3/Bio/SeqFeatureI.pm:289 STACK toplevel dnaglyphtest.pl:45 ---------------------------------------------------------------- I made use Bio::Graphics and BioPerl were up to date with CPAN. 
It works if I use it as a Bio::SeqFeature::Generic but all the references I've found say you can use it as a feature. Putting in a workaround here but not familiar enough with BioPerl to make a patch. Could it be something I'm missing as well? This is testing code to find where the problem is, ignore how terrible it looks and how redundant most of it is. It's just me ripping it out to get to the error (GBrowse doesn't seem to like to give out error messages, but that's unrelated...) http://pastebin.ca/2104560 Thanks, --Joseph From scott at scottcain.net Fri Jan 20 12:29:07 2012 From: scott at scottcain.net (Scott Cain) Date: Fri, 20 Jan 2012 12:29:07 -0500 Subject: [Bioperl-l] Strange Error, Bio::Graphics::Feature attach_seq is not implemented In-Reply-To: References: Message-ID: Hi Joseph, I agree that Bio::Graphics::Feature doesn't have an attach_seq method, nor does Bio::SeqFeature::Lite that it inherits from. I don't know why they don't--it might be a design decision, or it could just be an oversight. In the meantime, is there a reason you couldn't use Bio::SeqFeature::Generic? The only place in the Bio::Graphics or GBrowse code base that the attach_seq method is used is in Bio::Graphics::Glyph::dna, and it uses a Bio::SeqFeature::Generic. Scott On Fri, Jan 20, 2012 at 11:48 AM, Joseph Guhlin wrote: > Is this the right place to send it? I couldn't find anything in the > archives. I've got a simple plugin I'm working on for gbrowse, and have > been having trouble. I removed it from gbrowse completely, and am running > the script. I get this error: > > ------------- EXCEPTION: Bio::Root::NotImplemented ------------- > MSG: Abstract method "Bio::SeqFeatureI::attach_seq" is not implemented by > package Bio::Graphics::Feature. > This is not your fault - author of Bio::Graphics::Feature should be blamed!
> > STACK Bio::Root::RootI::throw_not_implemented > /opt/local/lib/perl5/site_perl/5.12.3/Bio/Root/RootI.pm:748 > STACK Bio::SeqFeatureI::attach_seq > /opt/local/lib/perl5/site_perl/5.12.3/Bio/SeqFeatureI.pm:289 > STACK toplevel dnaglyphtest.pl:45 > ---------------------------------------------------------------- > > I made use Bio::Graphics and BioPerl were up to date with CPAN. It works if > I use it as a Bio::SeqFeature::Generic but all the references I've found > says you can use it as a feature. Putting in a workaround here but not > familiar enough with BioPerl to make a patch. > > Could it be something I'm missing as well? > > This is testing code to find where the problem is, ignore how terrible it > looks and how redundant most of it is. It's just me ripping it out to get > to the error(GBrowse doesn't seem to like to give out error messages, but > that's unrelated...) > > http://pastebin.ca/2104560 > > > > Thanks, > --Joseph > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From florian.lajus at inria.fr Fri Jan 20 11:45:27 2012 From: florian.lajus at inria.fr (lajus) Date: Fri, 20 Jan 2012 17:45:27 +0100 Subject: [Bioperl-l] Question on SeqFeature_RelationShip In-Reply-To: References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> <4F0EA69E.8040203@inria.fr> <49CC4A91-5E47-4866-AD63-62DC7F649CE6@drycafe.net> <4F0FF883.80109@inria.fr> <4F0FF8FF.6070706@inria.fr> Message-ID: <4F199A27.8030408@inria.fr> Hi all, As I have said, I've worked on SeqFeatureAdaptor to also persist and retrieve sub-features.
My code is attached: I have modified the last stable version of: - Bio/DB/BioSQL/SeqFeatureAdaptor.pm - Bio/DB/BioSQL/SeqAdaptor.pm I have created: - Bio/SeqFeature/SeqFeatureRelationship.pm - Bio/DB/BioSQL/SeqFeatureRelationshipAdaptor.pm some tests: - SeqFeatureRelationship.t (very simple test for the also really simple SeqFeatureRelationship class) - SeqFeatureRelationshipAdaptor.t (tests persistence of sub-feature objects in the database. Needs a BioSQL database to connect to (even if no commit is done)) => I have only tested with a Postgres database ... If you have advice or questions about my implementation or about my tests, don't hesitate to tell me. Is there a way to include my modifications in a future release of BioPerl? Florian On 14/01/2012 17:12, Hilmar Lapp wrote: > Hi Florian, > > You could do that (and it might have advantages in terms of code separation), but you don't have to. In general, adaptor classes get instantiated by the Bioperl-DB framework when a Bioperl class that is mapped to it needs to get serialized or populated. Since there is no class in Bioperl that would correspond to a seqfeature relationship, those situations won't occur. > > So you could just keep it simple and expand store_children() and correspondingly their retrieval in the adaptor class for seqfeatures. But as hinted above, you may still prefer a separate adaptor class just to keep the nitty-gritty of storing/loading the relationships out of the main adaptor class. Really up to you how you feel more comfortable. > > -hilmar > > Sent with a tap. > > On Jan 13, 2012, at 4:27 AM, lajus wrote: > >> I meant to write >> - a new adaptor called SeqFeatureRelationshipAdaptor in Bio/DB/BioSQL >> of course >> >> On 13/01/2012 10:25, lajus wrote: >>> Hi Hilmar, >>> >>> Thanks for your hint, but I'm quite lost in the BioPerl architecture (and quite new in perl programming).
I'd like to use the handling of term-to-term relationships as a template but I can't find which files are related to this. >>> >>> As far as I understand, I should create: >>> - a new adaptor called SeqFeatureRelationshipAdaptor in Bio/DB >>> - a new object SeqFeatureRelationship (and its interface) in Bio/SeqFeature >>> - modify SeqFeatureAdaptor to store children (by calling subSeqFeature in the store_children sub and creating the new relationships through my SeqFeatureRelationshipAdaptor) >>> - modify SeqFeatureAdaptor to retrieve children (recreating the relationships through my SeqFeatureRelationshipAdaptor) >>> >>> Is this the right way? >>> >>> Florian >>> >>> On 12/01/2012 18:49, Hilmar Lapp wrote: >>>> Hi Florian, >>>> >>>> Thanks for digging this up - this is what I had in memory, but I ran out of time last night in ascertaining that it is indeed still true. >>>> >>>> It'd be awesome if you can add the code to SeqFeatureAdaptor to also persist and retrieve sub-features. I think the object-relational mappings are all there already (in BaseDriver.pm). You could use the handling of bioentry-to-bioentry relationships (or term-to-term relationships) as a template for how to implement this. >>>> >>>> -hilmar >>>> >>>> On Jan 12, 2012, at 4:23 AM, lajus wrote: >>>> >>>>> Ok, I have looked in the BioPerl code and it appears that subSeqFeatures are not handled yet: >>>>> comment in SeqFeatureAdaptor.pm for the store children function (and attach children too): >>>>> "Bio::SeqFeatureI has a location, annotation, and possibly sub-seqfeatures as children. The latter is not implemented yet." >>>>> >>>>> So it's totally normal that it doesn't work. >>>>> Have you started to implement this stuff, or should I rewrite another SeqFeatureAdaptor which handles this? >>>>> >>>>> Florian >>>>> >>>>> On 11/01/2012 16:44, Fields, Christopher J wrote: >>>>>> Seems like a possible bug with bioperl-db, I believe hierarchical seqfeatures are stored, but it's worth looking into.
Do you have some example data (genbank file you are using, for instance)? >>>>>> >>>>>> chris >>>>>> >>>>>> On Jan 11, 2012, at 7:09 AM, lajus wrote: >>>>>> >>>>>>> Furthermore, if I look in verbose mode, I can see that in the stack I have many: >>>>>>> >>>>>>> no adaptor found for class Bio::Annotation::TypeManager >>>>>>> no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory >>>>>>> >>>>>>> Just warnings, no errors but... >>>>>>> Any clues? >>>>>>> >>>>>>> Thanks in advance, >>>>>>> >>>>>>> Florian >>>>>>> >>>>>>> On 11/01/2012 13:43, lajus wrote: >>>>>>>> I have looked at the Unflattener and the magic works quite fine. >>>>>>>> Then, the $seq which is given (by side-effect) by >>>>>>>> $unflattener->unflatten_seq(-seq=>$seq, -use_magic=>1); >>>>>>>> has a good hierarchy for us. >>>>>>>> So I'm wondering why I can't store this Bio::Seq in my database. Now there are explicit parent/child links between the gene and CDS. >>>>>>>> But when I create a persistent object for $seq and then create it: >>>>>>>> $pseq = $adaptor->create_persistent($seq); >>>>>>>> $pseq->create(); >>>>>>>> In my database, the bioentry and subseqFeatures are written but still no relation in the seqFeature_relationship table. >>>>>>>> >>>>>>>> Do you have an explanation? >>>>>>>> >>>>>>>> Florian >>>>>>>> >>>>>>>> On 10/01/2012 19:45, Fields, Christopher J wrote: >>>>>>>>> On Jan 10, 2012, at 12:18 PM, Peter Cock wrote: >>>>>>>>> >>>>>>>>>> On Tue, Jan 10, 2012 at 5:06 PM, lajus wrote: >>>>>>>>>>> Hello, >>>>>>>>>>> I am currently working on a refactoring of the Genolevures project >>>>>>>>>>> (http://www.genolevures.org/) >>>>>>>>>>> We are trying to better use bioperl and the BioSQL schema on a PostgreSQL >>>>>>>>>>> database. >>>>>>>>>>> >>>>>>>>>>> I have loaded an EMBL file into my BioSQL database (postgres). If I look in >>>>>>>>>>> my database, my bioentry has been added, along with its associated seqFeatures. >>>>>>>>>>> But it seems that my seqfeature_relationship table is empty.
>>>>>>>>>>> I find it strange in so far as there is a relationship between gene and its >>>>>>>>>>> CDS. right? >>>>>>>>>> No, not explicitly. Unlike GFF3 where there can be (and should be) >>>>>>>>>> explicit parent/child links between the gene and CDS, in GenBank >>>>>>>>>> and EMBL feature tables this is implicit only. I don't know if BioPerl >>>>>>>>>> attempts to infer this kind of relationship, and if it did, if that would >>>>>>>>>> get record in the BioSQL tables. >>>>>>>>>> >>>>>>>>>> Peter >>>>>>>>> BioPerl does not attempt to infer these by default (too much magic, and too many potential issues), but one can use something like the Unflattener, which does have some magic built-in: >>>>>>>>> >>>>>>>>> https://metacpan.org/module/Bio::SeqFeature::Tools::Unflattener >>>>>>>>> >>>>>>>>> chris >>>>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l -------------- next part -------------- A non-text attachment was scrubbed... Name: SeqFeatureRelationship.pm Type: application/x-perl Size: 852 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: SeqFeatureRelationshipAdaptor.pm Type: application/x-perl Size: 13000 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: SeqFeatureAdaptor.pm Type: application/x-perl Size: 25746 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: SeqAdaptor.pm Type: application/x-perl Size: 16631 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: seqFeatureRelationshipAdaptor.t Type: text/troff Size: 9124 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: SeqFeatureRelationShip.t.t Type: text/troff Size: 744 bytes Desc: not available URL: From hlapp at drycafe.net Fri Jan 20 13:06:22 2012 From: hlapp at drycafe.net (Hilmar Lapp) Date: Fri, 20 Jan 2012 13:06:22 -0500 Subject: [Bioperl-l] Question on SeqFeature_RelationShip In-Reply-To: <054FEA92-8C5E-47D6-86AF-F71DEAFE2B63@illinois.edu> References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> <4F0EA69E.8040203@inria.fr> <49CC4A91-5E47-4866-AD63-62DC7F649CE6@drycafe.net> <4F0FF883.80109@inria.fr> <4F0FF8FF.6070706@inria.fr> <4F199A27.8030408@inria.fr> <054FEA92-8C5E-47D6-86AF-F71DEAFE2B63@illinois.edu> Message-ID: <6BCB94E6-E5FA-4704-BE49-CA99312E4516@drycafe.net> Florian - I'll add that aside from being better aligned with our procedures for integrating code contributions, this would also make it easier for us and more recognizable for everyone to attribute these changes to you, because the git commit logs will do this rather than it being buried in a commit message string. So there's an actual benefit for you as the contributor. -hilmar On Jan 20, 2012, at 11:50 AM, Fields, Christopher J wrote: > Florian, > > Re: patches, we can accept these, but I would like to point out the code is publicly available on github: > > https://github.com/bioperl/bioperl-db > > The fastest way to contribute is to create a github account, fork the code from that repository, checkout a local copy using git, then push the changes back to your fork so they are not lost. You can then submit a pull request that should appear on the bioperl developers mailing list, where one of us can simply (via the github interface) merge your changes in. 
> > chris > > On Jan 20, 2012, at 10:45 AM, lajus wrote: > >> Hi all, >> >> As I have said, I've worked on SeqFeatureAdaptor to also persist and retrieve sub-features. >> My code is attached: >> >> I have modified the last stable version of: >> - Bio/DB/BioSQL/SeqFeatureAdaptor.pm >> - Bio/DB/BioSQL/SeqAdaptor.pm >> I have created: >> - Bio/SeqFeature/SeqFeatureRelationship.pm >> - Bio/DB/BioSQL/SeqFeatureRelationshipAdaptor.pm >> some tests: >> - SeqFeatureRelationship.t (very simple test for the also really simple SeqFeatureRelationship class) >> - SeqFeatureRelationshipAdaptor.t (tests persistence of sub-feature objects in the database. Needs a BioSQL database to connect to (even if no commit is done)) => I have only tested with a Postgres database ... >> >> If you have advice or questions about my implementation or about my tests, don't hesitate to tell me. >> >> Is there a way to include my modifications in a future release of BioPerl? >> >> Florian >> >> On 14/01/2012 17:12, Hilmar Lapp wrote: >>> Hi Florian, >>> >>> You could do that (and it might have advantages in terms of code separation), but you don't have to. In general, adaptor classes get instantiated by the Bioperl-DB framework when a Bioperl class that is mapped to it needs to get serialized or populated. Since there is no class in Bioperl that would correspond to a seqfeature relationship, those situations won't occur. >>> >>> So you could just keep it simple and expand store_children() and correspondingly their retrieval in the adaptor class for seqfeatures. But as hinted above, you may still prefer a separate adaptor class just to keep the nitty-gritty of storing/loading the relationships out of the main adaptor class. Really up to you how you feel more comfortable. >>> >>> -hilmar >>> >>> Sent with a tap.
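Hilmar's suggestion to expand store_children() comes down to walking each feature's sub-features and recording one parent/child link per pair. Below is a minimal pure-Perl sketch of that traversal. It is not the bioperl-db adaptor API. Features are assumed to be plain hashes with hypothetical id and children keys. Real code would get sub-features via get_SeqFeatures() and pass each pair to whatever writes the seqfeature_relationship rows. The rank value (position among siblings) is also an illustrative assumption.

```perl
use strict;
use warnings;

# Collect one [parent_id, child_id, rank] triple per parent/child link.
# Features are plain hashes with hypothetical 'id' and 'children' keys,
# standing in for Bio::SeqFeatureI objects and their sub-features.
sub relationship_rows {
    my ($feature) = @_;
    my @rows;
    my $rank = 0;
    for my $child (@{ $feature->{children} || [] }) {
        # rank records the child's position among its siblings
        push @rows, [ $feature->{id}, $child->{id}, ++$rank ];
        # recurse so deeper levels (gene > mRNA > CDS) are included too
        push @rows, relationship_rows($child);
    }
    return @rows;
}

my $gene = {
    id       => 'gene1',
    children => [
        { id => 'mrna1', children => [ { id => 'cds1' } ] },
    ],
};

for my $row (relationship_rows($gene)) {
    printf "%s -> %s (rank %d)\n", @$row;
}
```

Recursing inside the loop keeps each child's own links immediately after the link that attaches it, so a gene > mRNA > CDS chain comes out in top-down order.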
>>> >>> On Jan 13, 2012, at 4:27 AM, lajus wrote: >>> >>>> I meant to write >>>> - a new adaptor called SeqFeatureRelationshipAdaptor in Bio/DB/BioSQL >>>> of course >>>> >>>> On 13/01/2012 10:25, lajus wrote: >>>>> Hi Hilmar, >>>>> >>>>> Thanks for your hint, but I'm quite lost in the BioPerl architecture (and quite new in perl programming). I'd like to use the handling of term-to-term relationships as a template but I can't find which files are related to this. >>>>> >>>>> As far as I understand, I should create: >>>>> - a new adaptor called SeqFeatureRelationshipAdaptor in Bio/DB >>>>> - a new object SeqFeatureRelationship (and its interface) in Bio/SeqFeature >>>>> - modify SeqFeatureAdaptor to store children (by calling subSeqFeature in the store_children sub and creating the new relationships through my SeqFeatureRelationshipAdaptor) >>>>> - modify SeqFeatureAdaptor to retrieve children (recreating the relationships through my SeqFeatureRelationshipAdaptor) >>>>> >>>>> Is this the right way? >>>>> >>>>> Florian >>>>> >>>>> On 12/01/2012 18:49, Hilmar Lapp wrote: >>>>>> Hi Florian, >>>>>> >>>>>> Thanks for digging this up - this is what I had in memory, but I ran out of time last night in ascertaining that it is indeed still true. >>>>>> >>>>>> It'd be awesome if you can add the code to SeqFeatureAdaptor to also persist and retrieve sub-features. I think the object-relational mappings are all there already (in BaseDriver.pm). You could use the handling of bioentry-to-bioentry relationships (or term-to-term relationships) as a template for how to implement this. >>>>>> >>>>>> -hilmar >>>>>> >>>>>> On Jan 12, 2012, at 4:23 AM, lajus wrote: >>>>>> >>>>>>> Ok, I have looked in the BioPerl code and it appears that subSeqFeatures are not handled yet: >>>>>>> comment in SeqFeatureAdaptor.pm for the store children function (and attach children too): >>>>>>> "Bio::SeqFeatureI has a location, annotation, and possibly sub-seqfeatures as children.
The latter is not implemented yet." >>>>>>> >>>>>>> So it's totally normal that it doesn't work. >>>>>>> Have you started to implement this stuff, or should I rewrite another SeqFeatureAdaptor which handles this? >>>>>>> >>>>>>> Florian >>>>>>> >>>>>>> On 11/01/2012 16:44, Fields, Christopher J wrote: >>>>>>>> Seems like a possible bug with bioperl-db, I believe hierarchical seqfeatures are stored, but it's worth looking into. Do you have some example data (genbank file you are using, for instance)? >>>>>>>> >>>>>>>> chris >>>>>>>> >>>>>>>> On Jan 11, 2012, at 7:09 AM, lajus wrote: >>>>>>>> >>>>>>>>> Furthermore, if I look in verbose mode, I can see that in the stack I have many: >>>>>>>>> >>>>>>>>> no adaptor found for class Bio::Annotation::TypeManager >>>>>>>>> no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory >>>>>>>>> >>>>>>>>> Just warnings, no errors but... >>>>>>>>> Any clues? >>>>>>>>> >>>>>>>>> Thanks in advance, >>>>>>>>> >>>>>>>>> Florian >>>>>>>>> >>>>>>>>> On 11/01/2012 13:43, lajus wrote: >>>>>>>>>> I have looked at the Unflattener and the magic works quite fine. >>>>>>>>>> Then, the $seq which is given (by side-effect) by >>>>>>>>>> $unflattener->unflatten_seq(-seq=>$seq, -use_magic=>1); >>>>>>>>>> has a good hierarchy for us. >>>>>>>>>> So I'm wondering why I can't store this Bio::Seq in my database. Now there are explicit parent/child links between the gene and CDS. >>>>>>>>>> But when I create a persistent object for $seq and then create it: >>>>>>>>>> $pseq = $adaptor->create_persistent($seq); >>>>>>>>>> $pseq->create(); >>>>>>>>>> In my database, the bioentry and subseqFeatures are written but still no relation in the seqFeature_relationship table. >>>>>>>>>> >>>>>>>>>> Do you have an explanation?
>>>>>>>>>> >>>>>>>>>> Florian >>>>>>>>>> >>>>>>>>>> On 10/01/2012 19:45, Fields, Christopher J wrote: >>>>>>>>>>> On Jan 10, 2012, at 12:18 PM, Peter Cock wrote: >>>>>>>>>>> >>>>>>>>>>>> On Tue, Jan 10, 2012 at 5:06 PM, lajus wrote: >>>>>>>>>>>>> Hello, >>>>>>>>>>>>> I am currently working on a refactoring of the Genolevures project >>>>>>>>>>>>> (http://www.genolevures.org/) >>>>>>>>>>>>> We are trying to better use bioperl and the BioSQL schema on a PostgreSQL >>>>>>>>>>>>> database. >>>>>>>>>>>>> >>>>>>>>>>>>> I have loaded an EMBL file into my BioSQL database (postgres). If I look in >>>>>>>>>>>>> my database, my bioentry has been added, along with its associated seqFeatures. >>>>>>>>>>>>> But it seems that my seqfeature_relationship table is empty. >>>>>>>>>>>>> I find it strange in so far as there is a relationship between gene and its >>>>>>>>>>>>> CDS. right? >>>>>>>>>>>> No, not explicitly. Unlike GFF3 where there can be (and should be) >>>>>>>>>>>> explicit parent/child links between the gene and CDS, in GenBank >>>>>>>>>>>> and EMBL feature tables this is implicit only. I don't know if BioPerl >>>>>>>>>>>> attempts to infer this kind of relationship, and if it did, if that would >>>>>>>>>>>> get record in the BioSQL tables.
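Peter's point is that GenBank/EMBL feature tables only imply the gene/CDS relationship, usually through a shared qualifier such as /gene. The sketch below is a deliberately simplified illustration that nests flat features under the gene whose /gene value they share. It is not the Unflattener's actual algorithm, which applies further heuristics described in its documentation. The feature hashes and the nest_by_gene_tag helper are hypothetical.

```perl
use strict;
use warnings;

# Flat features as a GenBank/EMBL feature table presents them: no explicit
# parent/child links, only a shared /gene qualifier (hypothetical data).
my @flat = (
    { type => 'gene', gene => 'ABC1' },
    { type => 'CDS',  gene => 'ABC1' },
    { type => 'gene', gene => 'XYZ2' },
    { type => 'CDS',  gene => 'XYZ2' },
    { type => 'CDS',  gene => 'ORPHAN' },    # no matching gene feature
);

# Nest each non-gene feature under the gene with the same /gene value.
# Returns the top-level features; it (re)sets a 'children' array on each
# gene hash in place.
sub nest_by_gene_tag {
    my @features = @_;
    my (%gene_by_name, @top);
    for my $f (grep { $_->{type} eq 'gene' } @features) {
        $f->{children} = [];
        $gene_by_name{ $f->{gene} } = $f;
        push @top, $f;
    }
    for my $f (grep { $_->{type} ne 'gene' } @features) {
        if (my $parent = $gene_by_name{ $f->{gene} }) {
            push @{ $parent->{children} }, $f;
        }
        else {
            push @top, $f;    # nothing to attach to: leave it top-level
        }
    }
    return @top;
}

for my $f (nest_by_gene_tag(@flat)) {
    my $n = $f->{children} ? scalar @{ $f->{children} } : 0;
    print "$f->{type} $f->{gene}: $n child feature(s)\n";
}
```

Features that match no gene are kept at the top level rather than dropped. That mirrors the caution in the thread about inferring too much "magic".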
>>>>>>>>>>>> >>>>>>>>>>>> Peter >>>>>>>>>>> BioPerl does not attempt to infer these by default (too much magic, and too many potential issues), but one can use something like the Unflattener, which does have some magic built-in: >>>>>>>>>>> >>>>>>>>>>> https://metacpan.org/module/Bio::SeqFeature::Tools::Unflattener >>>>>>>>>>> >>>>>>>>>>> chris >>>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Bioperl-l mailing list >>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net : =========================================================== From guhli007 at umn.edu Fri Jan 20 12:51:36 2012 From: guhli007 at umn.edu (Joseph Guhlin) Date: Fri, 20 Jan 2012 11:51:36 -0600 Subject: [Bioperl-l] Strange Error, Bio::Graphics::Feature attach_seq is not implemented In-Reply-To: References: Message-ID: Hi, thanks for the response. I'm trying to create a plugin that uses the dna glyph(which says you can use a B::G::Feature in its POD) and then I'm going to expand upon it. I'm guessing it is an oversight and have implemented them in another object for now and just inheriting that class. I'm still new to gbrowse2 but when I tried to attach a Bio::SeqFeature::Generic object to a featurelist it would crash at that point, so I went back to using the Bio::Graphics::Feature objects and it worked again(using other glyphs, experimenting around and learning). This might be off topic but any ideas around? I wanted to see if I did something weird that was causing it to die(something not installed, misconfigured). 
Best, --Joseph On Fri, Jan 20, 2012 at 11:29 AM, Scott Cain wrote: > Hi Joseph, > > I agree that Bio::Graphics::Feature doesn't have an attach_seq method, > nor does Bio::SeqFeature::Lite that it inherits from. I don't know > why they don't--it might be a design decision, or it could just be an > oversight. In the mean time, is there a reason you couldn't use > Bio::SeqFeature::Generic? The only place in the Bio::Graphics or > GBrowse code base that the attach_seq method is used is in > Bio::Graphics::Glyph::dna, and it uses a Bio::SeqFeature::Generic. > > Scott > > > On Fri, Jan 20, 2012 at 11:48 AM, Joseph Guhlin wrote: > > Is this the right place to send it? I couldn't find anything in the > > archives. I've got a simple plugin I'm working on for gbrowse, and have > > been having trouble. I removed it from gbrowse completely, and am running > > the script. I get this error: > > > > ------------- EXCEPTION: Bio::Root::NotImplemented ------------- > > MSG: Abstract method "Bio::SeqFeatureI::attach_seq" is not implemented by > > package Bio::Graphics::Feature. > > This is not your fault - author of Bio::Graphics::Feature should be > blamed! > > > > STACK Bio::Root::RootI::throw_not_implemented > > /opt/local/lib/perl5/site_perl/5.12.3/Bio/Root/RootI.pm:748 > > STACK Bio::SeqFeatureI::attach_seq > > /opt/local/lib/perl5/site_perl/5.12.3/Bio/SeqFeatureI.pm:289 > > STACK toplevel dnaglyphtest.pl:45 > > ---------------------------------------------------------------- > > > > I made use Bio::Graphics and BioPerl were up to date with CPAN. It works > if > > I use it as a Bio::SeqFeature::Generic but all the references I've found > > says you can use it as a feature. Putting in a workaround here but not > > familiar enough with BioPerl to make a patch. > > > > Could it be something I'm missing as well? > > > > This is testing code to find where the problem is, ignore how terrible it > > looks and how redundant most of it is. 
It's just me ripping it out to get > > to the error(GBrowse doesn't seem to like to give out error messages, but > > that's unrelated...) > > > > http://pastebin.ca/2104560 > > > > > > > > Thanks, > > --Joseph > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > From scott at scottcain.net Fri Jan 20 13:28:12 2012 From: scott at scottcain.net (Scott Cain) Date: Fri, 20 Jan 2012 13:28:12 -0500 Subject: [Bioperl-l] Strange Error, Bio::Graphics::Feature attach_seq is not implemented In-Reply-To: References: Message-ID: Hi Joseph, I'm guessing Bio::SeqFeature::Lite could be modified fairly easily to add an attach_seq method; unfortunately I don't have the time to do that today. Perhaps we should look at the nature of the crash when you're using Bio::SeqFeature::Generic; maybe that will be easier to solve. Scott On Fri, Jan 20, 2012 at 12:51 PM, Joseph Guhlin wrote: > Hi, thanks for the response. > > I'm trying to create a plugin that uses the dna glyph(which says you can use > a B::G::Feature in its POD) and then I'm going to expand upon it. I'm > guessing it is an oversight and have implemented them in another object for > now and just inheriting that class. > > I'm still new to gbrowse2 but when I tried to attach a > Bio::SeqFeature::Generic object to a featurelist it would crash at that > point, so I went back to using the Bio::Graphics::Feature objects and it > worked again(using other glyphs, experimenting around and learning). This > might be off topic but any ideas around? > > I wanted to see if I did something weird that was causing it to > die(something not installed, misconfigured). 
> > Best, > --Joseph > > > On Fri, Jan 20, 2012 at 11:29 AM, Scott Cain wrote: >> >> Hi Joseph, >> >> I agree that Bio::Graphics::Feature doesn't have an attach_seq method, >> nor does Bio::SeqFeature::Lite that it inherits from. I don't know >> why they don't--it might be a design decision, or it could just be an >> oversight. In the mean time, is there a reason you couldn't use >> Bio::SeqFeature::Generic? The only place in the Bio::Graphics or >> GBrowse code base that the attach_seq method is used is in >> Bio::Graphics::Glyph::dna, and it uses a Bio::SeqFeature::Generic. >> >> Scott >> >> >> On Fri, Jan 20, 2012 at 11:48 AM, Joseph Guhlin wrote: >> > Is this the right place to send it? I couldn't find anything in the >> > archives. I've got a simple plugin I'm working on for gbrowse, and have >> > been having trouble. I removed it from gbrowse completely, and am >> > running >> > the script. I get this error: >> > >> > ------------- EXCEPTION: Bio::Root::NotImplemented ------------- >> > MSG: Abstract method "Bio::SeqFeatureI::attach_seq" is not implemented >> > by >> > package Bio::Graphics::Feature. >> > This is not your fault - author of Bio::Graphics::Feature should be >> > blamed! >> > >> > STACK Bio::Root::RootI::throw_not_implemented >> > /opt/local/lib/perl5/site_perl/5.12.3/Bio/Root/RootI.pm:748 >> > STACK Bio::SeqFeatureI::attach_seq >> > /opt/local/lib/perl5/site_perl/5.12.3/Bio/SeqFeatureI.pm:289 >> > STACK toplevel dnaglyphtest.pl:45 >> > ---------------------------------------------------------------- >> > >> > I made use Bio::Graphics and BioPerl were up to date with CPAN. It works >> > if >> > I use it as a Bio::SeqFeature::Generic but all the references I've found >> > says you can use it as a feature. Putting in a workaround here but not >> > familiar enough with BioPerl to make a patch. >> > >> > Could it be something I'm missing as well?
>> > >> > This is testing code to find where the problem is, ignore how terrible >> > it >> > looks and how redundant most of it is. It's just me ripping it out to >> > get >> > to the error(GBrowse doesn't seem to like to give out error messages, >> > but >> > that's unrelated...) >> > >> > http://pastebin.ca/2104560 >> > >> > >> > >> > Thanks, >> > --Joseph >> > _______________________________________________ >> > Bioperl-l mailing list >> > Bioperl-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> -- >> ------------------------------------------------------------------------ >> Scott Cain, Ph. D. scott at scottcain >> dot net >> GMOD Coordinator (http://gmod.org/) 216-392-3087 >> Ontario Institute for Cancer Research > > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From guhli007 at umn.edu Fri Jan 20 13:41:15 2012 From: guhli007 at umn.edu (Joseph Guhlin) Date: Fri, 20 Jan 2012 12:41:15 -0600 Subject: [Bioperl-l] Strange Error, Bio::Graphics::Feature attach_seq is not implemented In-Reply-To: References: Message-ID: Thanks for the offer. I think since I am going to be re-working the dna glyph for my purposes I'm just going to use a simple work around for now. I do appreciate the offer of help though. If I revisit it in the future I'll try to look at it, it could easily be a crash due to my own newness to the bioperl/gbrowse system. Best, --Joseph On Fri, Jan 20, 2012 at 12:28 PM, Scott Cain wrote: > Hi Joseph, > > I'm guessing Bio::SeqFeature::Lite could be modified fairly easily to > add an attach_seq method; unfortunately I don't have the time to do > that today.
> > Perhaps we should look at the nature of the crash when you're using > Bio::SeqFeature::Generic; maybe that will be easier to solve. > > Scott > > > On Fri, Jan 20, 2012 at 12:51 PM, Joseph Guhlin wrote: > > Hi, thanks for the response. > > > > I'm trying to create a plugin that uses the dna glyph(which says you can > use > > a B::G::Feature in its POD) and then I'm going to expand upon it. I'm > > guessing it is an oversight and have implemented them in another object > for > > now and just inheriting that class. > > > > I'm still new to gbrowse2 but when I tried to attach a > > Bio::SeqFeature::Generic object to a featurelist it would crash at that > > point, so I went back to using the Bio::Graphics::Feature objects and it > > worked again(using other glyphs, experimenting around and learning). This > > might be off topic but any ideas around? > > > > I wanted to see if I did something weird that was causing it to > > die(something not installed, misconfigured). > > > > Best, > > --Joseph > > > > > > On Fri, Jan 20, 2012 at 11:29 AM, Scott Cain > wrote: > >> > >> Hi Joseph, > >> > >> I agree that Bio::Graphics::Feature doesn't have an attach_seq method, > >> nor does Bio::SeqFeature::Lite that it inherits from. I don't know > >> why they don't--it might be a design decision, or it could just be an > >> oversight. In the mean time, is there a reason you couldn't use > >> Bio::SeqFeature::Generic? The only place in the Bio::Graphics or > >> GBrowse code base that the attach_seq method is used is in > >> Bio::Graphics::Glyph::dna, and it uses a Bio::SeqFeature::Generic. > >> > >> Scott > >> > >> > >> On Fri, Jan 20, 2012 at 11:48 AM, Joseph Guhlin > wrote: > >> > Is this the right place to send it? I couldn't find anything in the > >> > archives. I've got a simple plugin I'm working on for gbrowse, and > have > >> > been having trouble. I removed it from gbrowse completely, and am > >> > running > >> > the script. 
I get this error: > >> > > >> > ------------- EXCEPTION: Bio::Root::NotImplemented ------------- > >> > MSG: Abstract method "Bio::SeqFeatureI::attach_seq" is not implemented > >> > by > >> > package Bio::Graphics::Feature. > >> > This is not your fault - author of Bio::Graphics::Feature should be > >> > blamed! > >> > > >> > STACK Bio::Root::RootI::throw_not_implemented > >> > /opt/local/lib/perl5/site_perl/5.12.3/Bio/Root/RootI.pm:748 > >> > STACK Bio::SeqFeatureI::attach_seq > >> > /opt/local/lib/perl5/site_perl/5.12.3/Bio/SeqFeatureI.pm:289 > >> > STACK toplevel dnaglyphtest.pl:45 > >> > ---------------------------------------------------------------- > >> > > >> > I made use Bio::Graphics and BioPerl were up to date with CPAN. It > works > >> > if > >> > I use it as a Bio::SeqFeature::Generic but all the references I've > found > >> > says you can use it as a feature. Putting in a workaround here but not > >> > familiar enough with BioPerl to make a patch. > >> > > >> > Could it be something I'm missing as well? > >> > > >> > This is testing code to find where the problem is, ignore how terrible > >> > it > >> > looks and how redundant most of it is. It's just me ripping it out to > >> > get > >> > to the error(GBrowse doesn't seem to like to give out error messages, > >> > but > >> > that's unrelated...) > >> > > >> > http://pastebin.ca/2104560 > >> > > >> > > >> > > >> > Thanks, > >> > --Joseph > >> > _______________________________________________ > >> > Bioperl-l mailing list > >> > Bioperl-l at lists.open-bio.org > >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > >> > >> -- > >> ------------------------------------------------------------------------ > >> Scott Cain, Ph. D. scott at scottcain > >> dot net > >> GMOD Coordinator (http://gmod.org/) 216-392-3087 > >> Ontario Institute for Cancer Research > > > > > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain > dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > From cjfields at illinois.edu Fri Jan 20 15:02:06 2012 From: cjfields at illinois.edu (Fields, Christopher J) Date: Fri, 20 Jan 2012 20:02:06 +0000 Subject: [Bioperl-l] Strange Error, Bio::Graphics::Feature attach_seq is not implemented In-Reply-To: References: Message-ID: <342D9175-21FF-4A4D-911B-5545E119D67A@illinois.edu> Scott, I think the problem is that the original attach_seq() interface method (and Bio::SF::Generic) assumed an in-memory generic Bio::Seq/PrimarySeq, but the design for Bio::SF::Lite and Bio::Graphics::Feature I believe assumed such data are not stored as an in-memory Bio::SeqI, but in a database or a file. It's possible to implement this for the basic non-persistent cases mentioned but override the method, initially as unimplemented, in child classes where a different behavior is expected (Bio::DB::SeqFeature for example). (Hope that makes sense?) chris On Jan 20, 2012, at 12:28 PM, Scott Cain wrote: > Hi Joseph, > > I'm guessing Bio::SeqFeature::Lite could be modified fairly easily to > add an attach_seq method; unfortunately I don't have the time to do > that today. > > Perhaps we should look at the nature of the crash when you're using > Bio::SeqFeature::Generic; maybe that will be easier to solve. > > Scott > > > On Fri, Jan 20, 2012 at 12:51 PM, Joseph Guhlin wrote: >> Hi, thanks for the response. >> >> I'm trying to create a plugin that uses the dna glyph(which says you can use >> a B::G::Feature in its POD) and then I'm going to expand upon it. I'm >> guessing it is an oversight and have implemented them in another object for >> now and just inheriting that class. 
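A rough sketch of the in-memory attach_seq() Chris describes, written as the kind of subclass Joseph mentions inheriting from. This is not BioPerl or GBrowse code, and several parts are assumptions. The class name My::AttachableFeature and the _attached_seq key are made up. It also assumes Bio::Graphics::Feature objects are blessed hashes. It has not been tried with the dna glyph, which may read the sequence back through another method such as seq().

```perl
package My::AttachableFeature;    # hypothetical class name
use strict;
use warnings;
use Scalar::Util qw(blessed);
# -norequire only sets @ISA; a real script is assumed to have loaded
# Bio::Graphics::Feature itself (use Bio::Graphics::Feature;).
use parent -norequire, 'Bio::Graphics::Feature';

# In-memory version of the Bio::SeqFeatureI attach_seq() contract:
# accept a Bio::PrimarySeqI object, keep it on the feature, return true.
sub attach_seq {
    my ($self, $seq) = @_;
    die "attach_seq() needs a Bio::PrimarySeqI object\n"
        unless blessed($seq) && $seq->isa('Bio::PrimarySeqI');
    $self->{_attached_seq} = $seq;    # hypothetical key; assumes a hash-based object
    return 1;
}

# Hand back whatever attach_seq() stored (undef if nothing was attached).
sub entire_seq {
    my ($self) = @_;
    return $self->{_attached_seq};
}

1;
```

With Bio::Graphics::Feature loaded, the inherited new() should presumably build these objects as usual, for example My::AttachableFeature->new(-start => 1, -end => 100). After that, attach_seq() takes a Bio::PrimarySeq object.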
>> >> I'm still new to gbrowse2 but when I tried to attach a >> Bio::SeqFeature::Generic object to a featurelist it would crash at that >> point, so I went back to using the Bio::Graphics::Feature objects and it >> worked again(using other glyphs, experimenting around and learning). This >> might be off topic but any ideas around? >> >> I wanted to see if I did something weird that was causing it to >> die(something not installed, misconfigured). >> >> Best, >> --Joseph >> >> >> On Fri, Jan 20, 2012 at 11:29 AM, Scott Cain wrote: >>> >>> Hi Joseph, >>> >>> I agree that Bio::Graphics::Feature doesn't have an attach_seq method, >>> nor does Bio::SeqFeature::Lite that it inherits from. I don't know >>> why they don't--it might be a design decision, or it could just be an >>> oversight. In the mean time, is there a reason you couldn't use >>> Bio::SeqFeature::Generic? The only place in the Bio::Graphics or >>> GBrowse code base that the attach_seq method is used is in >>> Bio::Graphics::Glyph::dna, and it uses a Bio::SeqFeature::Generic. >>> >>> Scott >>> >>> >>> On Fri, Jan 20, 2012 at 11:48 AM, Joseph Guhlin wrote: >>>> Is this the right place to send it? I couldn't find anything in the >>>> archives. I've got a simple plugin I'm working on for gbrowse, and have >>>> been having trouble. I removed it from gbrowse completely, and am >>>> running >>>> the script. I get this error: >>>> >>>> ------------- EXCEPTION: Bio::Root::NotImplemented ------------- >>>> MSG: Abstract method "Bio::SeqFeatureI::attach_seq" is not implemented >>>> by >>>> package Bio::Graphics::Feature. >>>> This is not your fault - author of Bio::Graphics::Feature should be >>>> blamed! 
>>>> >>>> STACK Bio::Root::RootI::throw_not_implemented >>>> /opt/local/lib/perl5/site_perl/5.12.3/Bio/Root/RootI.pm:748 >>>> STACK Bio::SeqFeatureI::attach_seq >>>> /opt/local/lib/perl5/site_perl/5.12.3/Bio/SeqFeatureI.pm:289 >>>> STACK toplevel dnaglyphtest.pl:45 >>>> ---------------------------------------------------------------- >>>> >>>> I made use Bio::Graphics and BioPerl were up to date with CPAN. It works >>>> if >>>> I use it as a Bio::SeqFeature::Generic but all the references I've found >>>> says you can use it as a feature. Putting in a workaround here but not >>>> familiar enough with BioPerl to make a patch. >>>> >>>> Could it be something I'm missing as well? >>>> >>>> This is testing code to find where the problem is, ignore how terrible >>>> it >>>> looks and how redundant most of it is. It's just me ripping it out to >>>> get >>>> to the error(GBrowse doesn't seem to like to give out error messages, >>>> but >>>> that's unrelated...) >>>> >>>> http://pastebin.ca/2104560 >>>> >>>> >>>> >>>> Thanks, >>>> --Joseph >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> -- >>> ------------------------------------------------------------------------ >>> Scott Cain, Ph. D. scott at scottcain >>> dot net >>> GMOD Coordinator (http://gmod.org/) 216-392-3087 >>> Ontario Institute for Cancer Research >> >> > > > > -- > ------------------------------------------------------------------------ > Scott Cain, Ph. D. 
scott at scottcain dot net > GMOD Coordinator (http://gmod.org/) 216-392-3087 > Ontario Institute for Cancer Research > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shalabh.sharma7 at gmail.com Fri Jan 20 15:43:21 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 20 Jan 2012 15:43:21 -0500 Subject: [Bioperl-l] parsing kegg 'ko' file Message-ID: Hi All, I am trying to parse kegg 'ko' file. I just need three information out of it. name, entry and description, so i just wrote this small code: #!/usr/bin/perl -w use Bio::SeqIO; my $in = Bio::SeqIO->new(-file => "ko", -format => 'KEGG'); while(my $seq = $in->next_seq){ print $seq->pimary_id, "\t" ; print $seq->display_id, "\t"; print $seq->annotation->get_Annotations('description') ,"\n"; } But when i tried to run this i got this error: Can't locate object method "pimary_id" via package "Bio::Seq::RichSeq" at parseKO_bioperl.pl line 5, line 1. I am using bioperl -v 1.006001 Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From joseph.karalius at monsanto.com Fri Jan 20 16:07:03 2012 From: joseph.karalius at monsanto.com (KARALIUS, JOSEPH (AG/2401)) Date: Fri, 20 Jan 2012 21:07:03 +0000 Subject: [Bioperl-l] parsing kegg 'ko' file In-Reply-To: References: Message-ID: 'pimary_id' should be 'primary_id'. Cheers, Joey -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of shalabh sharma Sent: Friday, January 20, 2012 12:43 PM To: bioperl-l Subject: [Bioperl-l] parsing kegg 'ko' file Hi All, I am trying to parse kegg 'ko' file. I just need three information out of it. 
name, entry and description, so i just wrote this small code: #!/usr/bin/perl -w use Bio::SeqIO; my $in = Bio::SeqIO->new(-file => "ko", -format => 'KEGG'); while(my $seq = $in->next_seq){ print $seq->pimary_id, "\t" ; print $seq->display_id, "\t"; print $seq->annotation->get_Annotations('description') ,"\n"; } But when i tried to run this i got this error: Can't locate object method "pimary_id" via package "Bio::Seq::RichSeq" at parseKO_bioperl.pl line 5, line 1. I am using bioperl -v 1.006001 Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l This e-mail message may contain privileged and/or confidential information, and is intended to be received only by persons entitled to receive such information. If you have received this e-mail in error, please notify the sender immediately. Please delete it and all attachments from any servers, hard drives or any other media. Other use of this e-mail by you is strictly prohibited. All e-mails and attachments sent and received are subject to monitoring, reading and archival by Monsanto, including its subsidiaries. The recipient of this e-mail is solely responsible for checking for the presence of "Viruses" or other "Malware". Monsanto, along with its subsidiaries, accepts no liability for any damage caused by any such code transmitted by or accompanying this e-mail or any attachment. The information contained in this email may be subject to the export control laws and regulations of the United States, potentially including but not limited to the Export Administration Regulations (EAR) and sanctions regulations issued by the U.S. Department of Treasury, Office of Foreign Asset Controls (OFAC). 
As a recipient of this information you are obligated to comply with all applicable U.S. export laws and regulations. From shalabh.sharma7 at gmail.com Fri Jan 20 16:22:30 2012 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Fri, 20 Jan 2012 16:22:30 -0500 Subject: [Bioperl-l] parsing kegg 'ko' file In-Reply-To: References: Message-ID: Hi Joseph, Thanks a lot, i am feeling so stupid right now.. But when i again ran that code i got following output: Bio::PrimarySeq=HASH(0x9be888) E3.2.1.1, amyA, malS Bio::Annotation::Comment=HASH(0x9c2484) Bio::PrimarySeq=HASH(0x9be930) E3.2.1.2 Bio::Annotation::Comment=HASH(0x9be8c4) How come i am getting object ? Only name filed is fine. I would really appreciate your help. Thanks Shalabh On Fri, Jan 20, 2012 at 4:07 PM, KARALIUS, JOSEPH (AG/2401) < joseph.karalius at monsanto.com> wrote: > 'pimary_id' should be 'primary_id'. > > Cheers, > Joey > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto: > bioperl-l-bounces at lists.open-bio.org] On Behalf Of shalabh sharma > Sent: Friday, January 20, 2012 12:43 PM > To: bioperl-l > Subject: [Bioperl-l] parsing kegg 'ko' file > > Hi All, > I am trying to parse kegg 'ko' file. I just need three information > out of it. > name, entry and description, so i just wrote this small code: > > #!/usr/bin/perl -w > use Bio::SeqIO; > my $in = Bio::SeqIO->new(-file => "ko", -format => 'KEGG'); > while(my $seq = $in->next_seq){ > print $seq->pimary_id, "\t" ; > print $seq->display_id, "\t"; > print $seq->annotation->get_Annotations('description') ,"\n"; > } > > But when i tried to run this i got this error: > Can't locate object method "pimary_id" via package "Bio::Seq::RichSeq" at > parseKO_bioperl.pl line 5, line 1. 
> I am using bioperl -v 1.006001 > > Thanks > Shalabh > > -- > Shalabh Sharma > Scientific Computing Professional Associate (Bioinformatics Specialist) > Department of Marine Sciences > University of Georgia > Athens, GA 30602-3636 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l >
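Printing the list returned by get_Annotations('description') prints the annotation objects themselves, which is why the output shows Bio::Annotation::Comment=HASH(...) strings. Similarly, Bio::PrimarySeq=HASH(...) looks like the placeholder primary_id falls back to when no value was set.

Below is a minimal sketch of pulling out the text. It assumes two things. First, that display_id carries the KEGG NAME field, as the output above suggests. Second, that the description annotations are Bio::Annotation::Comment objects, whose string is returned by text(). The thread does not settle where the ENTRY identifier ends up, so it is left out; dumping one $seq with Data::Dumper is a quick way to check. The script name parse_ko.pl and the ko_line helper are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One tab-separated line per KEGG record. Assumptions (check with
# Data::Dumper on your BioPerl version): display_id holds the NAME field,
# and each 'description' annotation is a Bio::Annotation::Comment whose
# string is returned by text().
sub ko_line {
    my ($seq) = @_;
    my @desc = map { $_->text } $seq->annotation->get_Annotations('description');
    return join "\t", $seq->display_id, join('; ', @desc);
}

# BioPerl is loaded only when a file is given: perl parse_ko.pl ko
if (@ARGV) {
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'kegg');
    while (my $seq = $in->next_seq) {
        print ko_line($seq), "\n";
    }
}
```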
> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 From joseph.karalius at monsanto.com Fri Jan 20 16:57:28 2012 From: joseph.karalius at monsanto.com (KARALIUS, JOSEPH (AG/2401)) Date: Fri, 20 Jan 2012 21:57:28 +0000 Subject: [Bioperl-l] parsing kegg 'ko' file In-Reply-To: References: Message-ID: Shalabh, Your print statement should look something like this (not tested). my $ann = $seq->annotation(); foreach my $desc ($ann->get_Annotations('description')) { print $desc->title, "\n"; } See 'perldoc Bio::Seq'. Joey From: shalabh sharma [mailto:shalabh.sharma7 at gmail.com] Sent: Friday, January 20, 2012 1:23 PM To: KARALIUS, JOSEPH [AG/2401] Cc: bioperl-l Subject: Re: [Bioperl-l] parsing kegg 'ko' file Hi Joseph, Thanks a lot, i am feeling so stupid right now.. But when i again ran that code i got following output: Bio::PrimarySeq=HASH(0x9be888) E3.2.1.1, amyA, malS Bio::Annotation::Comment=HASH(0x9c2484) Bio::PrimarySeq=HASH(0x9be930) E3.2.1.2 Bio::Annotation::Comment=HASH(0x9be8c4) How come i am getting object ? Only name filed is fine. I would really appreciate your help. Thanks Shalabh On Fri, Jan 20, 2012 at 4:07 PM, KARALIUS, JOSEPH (AG/2401) > wrote: 'pimary_id' should be 'primary_id'. Cheers, Joey -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of shalabh sharma Sent: Friday, January 20, 2012 12:43 PM To: bioperl-l Subject: [Bioperl-l] parsing kegg 'ko' file Hi All, I am trying to parse kegg 'ko' file. I just need three information out of it. 
name, entry and description, so i just wrote this small code: #!/usr/bin/perl -w use Bio::SeqIO; my $in = Bio::SeqIO->new(-file => "ko", -format => 'KEGG'); while(my $seq = $in->next_seq){ print $seq->pimary_id, "\t" ; print $seq->display_id, "\t"; print $seq->annotation->get_Annotations('description') ,"\n"; } But when i tried to run this i got this error: Can't locate object method "pimary_id" via package "Bio::Seq::RichSeq" at parseKO_bioperl.pl line 5, line 1. I am using bioperl -v 1.006001 Thanks Shalabh -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Shalabh Sharma Scientific Computing Professional Associate (Bioinformatics Specialist) Department of Marine Sciences University of Georgia Athens, GA 30602-3636
From pawan.mani2 at gmail.com Tue Jan 24 06:31:35 2012 From: pawan.mani2 at gmail.com (kakchingtabam pawankumar sharma) Date: Tue, 24 Jan 2012 17:01:35 +0530 Subject: [Bioperl-l] how to get the information of Strand = Plus / Plus from blastn report by bioperl. In-Reply-To: <4F1443A8.6000007@sanger.ac.uk> References: <4F10A59E.5040807@sanger.ac.uk> <4F13EE93.2010502@sanger.ac.uk> <4F1443A8.6000007@sanger.ac.uk> Message-ID: Hi, Thanks a lot for help Frank. It works for every Blast output. One more question is that i want to best hit only(top hit of every query). I show there is option called $obj->best_hit_only; in Bio::SearchIO module. So help to add this to my script. I could not do. Its confusing. Thanks in Advanced. With best regards, Pawan On Mon, Jan 16, 2012 at 9:05 PM, Frank Schwach wrote: > Excellent, well done! > No, this is the way to do it. In BioPerl modules that use strand information > you will find the values +1/-1 or undef. If you want to display those as > PLUS/MINUS,+/-,Watson/Crick,Laurel/Hardy whatever, you have to convert it, > but you know now how to do it. > You have a syntax error in your code where you retrieve the query name: > > > my $QueryName = $result->query_name(), my $QueryDescript = > $result->query_description(); > > should be two lines and the comma should be a semicolon. > > Good luck! > > Frank > > > > > > On 16/01/12 15:14, kakchingtabam pawankumar sharma wrote: >> >> So By using the if else conditon function, I have solve Frank.
>> I mean is there anyway in bioperl we can get directly using other >> module! I hope u got it! >> >> >> So my second Question have not replied that is >> >> i have blastn report as below: >> >> BLASTN 2.2.18 [Mar-02-2008] >> >> >> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, >> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >> "Gapped BLAST and PSI-BLAST: a new generation of protein database search >> programs", Nucleic Acids Res. 25:3389-3402. >> >> Query= ORB_1210001_hsa-miR-548aa#5_1 >> (24 letters) >> >> Database: hsa-mmu-rno_miRNA.fa >> 3524 sequences; 76,424 total letters >> >> Searching..................................................done >> >> >> >> Score E >> Sequences producing significant alignments: (bits) >> Value >> >> hsa-miR-548aa >> 48 2e-009 >> hsa-miR-548d-5p >> 36 9e-006 >> hsa-miR-548b-5p >> 36 9e-006 >> hsa-miR-548z >> 34 3e-005 >> hsa-miR-548q >> 30 5e-004 >> hsa-miR-548n >> 30 5e-004 >> hsa-miR-548ab >> 28 0.002 >> hsa-miR-548v >> 28 0.002 >> hsa-miR-548c-5p >> 28 0.002 >> hsa-miR-548ag >> 26 0.008 >> hsa-miR-548u >> 26 0.008 >> hsa-miR-548c-3p >> 26 0.008 >> hsa-miR-603 >> 26 0.008 >> hsa-miR-548a-3p >> 26 0.008 >> hsa-miR-548ac >> 24 0.033 >> hsa-miR-548an >> 22 0.13 >> hsa-miR-548aj >> 22 0.13 >> hsa-miR-548i >> 22 0.13 >> hsa-miR-548g >> 22 0.13 >> hsa-miR-548j >> 22 0.13 >> hsa-miR-548a-5p >> 22 0.13 >> >>> hsa-miR-548aa >> >> Length = 25 >> >> Score = 48.1 bits (24), Expect = 2e-009 >> Identities = 24/24 (100%) >> Strand = Plus / Minus >> >> >> Query: 1 tggtgcaaaagtaattgtggtttt 24 >> |||||||||||||||||||||||| >> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2 >> >> >>> hsa-miR-548d-5p >> >>
Length = 22 >> >> Score = 36.2 bits (18), Expect = 9e-006 >> Identities = 18/18 (100%) >> Strand = Plus / Plus >> >> >> Query: 7 aaaagtaattgtggtttt 24 >> |||||||||||||||||| >> Sbjct: 1 aaaagtaattgtggtttt 18 >> >> >> >> in this result i could not parse my code. i think my code does not >> accept the Query header that is >> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast >> output. >> >> kindly help me out. >> >> with regards, >> Pawan. >> >> On 1/16/12, Frank Schwach wrote: >>> >>> Hi Pawan , >>> >>> Please always "reply to all", so that you keep the discussion on the >>> bioperl mailing list and more people can help you. >>> What you need is a very basic Perl command. I could give you the code >>> but I think you get more out of it if you experiment with it on your own >>> because it is very fundamental. I'll point you in the right direction: >>> you want an if-then-else conditional construct. >>> >>> Perl's documentation about this is here: >>> http://perldoc.perl.org/perlintro.html#Conditional-and-looping-constructs >>> >>> if strand is 1 you want to print "PLUS" else if it is -1 you want to >>> print "MINUS", or else you might want to print "no strand" or something, >>> or even treat it as an error and make the script abort. >>> >>> Give it a go and let us know if you need help. For basic (non-bio) Perl >>> question, please also check out the community at >>> http://www.perlmonks.org/. >>> >>> Hope that helps, >>> >>> Frank >>> >>> >>> On 14/01/12 05:59, kakchingtabam pawankumar sharma wrote: >>>> >>>> Hi frank, >>>> >>>> Thanks for your kind reply. >>>> I could get the vale for query as 1 value if it is plus. >>>> and for hit = -1 if it is minus. >>>> But i would like to print out as PLUS or MINUS not 1 or -1 my friend. >>>> >>>> you can see my code as below: >>>> >>>> while ( my $result = $searchio->next_result() ) { >>>>
my $QueryName = $result->query_name(), my $QueryDescript = >>>> $result->query_description(); >>>> my $QueryLength = $result->query_length; >>>> my $NoHits = $result->num_hits; >>>> >>>> while( my $hit = $result->next_hit ) { >>>> my $HitName = $hit->name(); >>>> my $HitDescrip = $hit->description(); >>>> my $HitLength = $hit->length; >>>> my $Score = $hit->raw_score(); >>>> my $Bits = $hit->bits; >>>> >>>> my $hsp = $hit->next_hsp; # Only check first (= best) hsp >>>> my $Evalue = $hsp->evalue(); >>>> my $AlnLen = $hsp->num_identical(); >>>> my $TotalLen = $hsp->hsp_length; >>>> my $QueryStrand = $hsp->strand('query'); >>>> my $HitStrand = $hsp->strand('hit'); >>>> >>>> if($Evalue< $cutoff){ >>>> print "$QueryName $QueryDescript\t"; >>>> print "$QueryLength\t"; >>>> print "$NoHits\t"; >>>> print "$HitName $HitDescrip\t"; >>>> print "$HitLength\t"; >>>> print "$Score\t"; >>>> print "$Bits\t"; >>>> print "$Evalue\t"; >>>> print "$AlnLen\t"; >>>> print "$TotalLen\t"; >>>> print "$QueryStrand\t"; >>>> print "$HitStrand\n"; >>>> } >>>> } >>>> print "\n"; >>>> } >>>> >>>> >>>> This is a part of my code. >>>> >>>> i have blastn report as below: >>>> >>>> BLASTN 2.2.18 [Mar-02-2008] >>>> >>>> >>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>> Schaffer, >>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database search >>>> programs", Nucleic Acids Res. 25:3389-3402. >>>> >>>> Query= ORB_1210001_hsa-miR-548aa#5_1 >>>> (24 letters) >>>> >>>> Database: hsa-mmu-rno_miRNA.fa >>>>
3524 sequences; 76,424 total letters >>>> >>>> Searching..................................................done >>>> >>>> >>>> >>>> Score >>>> E >>>> Sequences producing significant alignments: (bits) >>>> Value >>>> >>>> hsa-miR-548aa >>>> 48 2e-009 >>>> hsa-miR-548d-5p >>>> 36 9e-006 >>>> hsa-miR-548b-5p >>>> 36 9e-006 >>>> hsa-miR-548z >>>> 34 3e-005 >>>> hsa-miR-548q >>>> 30 5e-004 >>>> hsa-miR-548n >>>> 30 5e-004 >>>> hsa-miR-548ab >>>> 28 0.002 >>>> hsa-miR-548v >>>> 28 0.002 >>>> hsa-miR-548c-5p >>>> 28 0.002 >>>> hsa-miR-548ag >>>> 26 0.008 >>>> hsa-miR-548u >>>> 26 0.008 >>>> hsa-miR-548c-3p >>>> 26 0.008 >>>> hsa-miR-603 >>>> 26 0.008 >>>> hsa-miR-548a-3p >>>> 26 0.008 >>>> hsa-miR-548ac >>>> 24 0.033 >>>> hsa-miR-548an >>>> 22 0.13 >>>> hsa-miR-548aj >>>> 22 0.13 >>>> hsa-miR-548i >>>> 22 0.13 >>>> hsa-miR-548g >>>> 22 0.13 >>>> hsa-miR-548j >>>> 22 0.13 >>>> hsa-miR-548a-5p >>>> 22 0.13 >>>> >>>>> hsa-miR-548aa >>>> >>>> Length = 25 >>>> >>>> Score = 48.1 bits (24), Expect = 2e-009 >>>> Identities = 24/24 (100%) >>>> Strand = Plus / Minus >>>> >>>> >>>> Query: 1 tggtgcaaaagtaattgtggtttt 24 >>>> |||||||||||||||||||||||| >>>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2 >>>> >>>> >>>>> hsa-miR-548d-5p >>>> >>>> Length = 22 >>>> >>>> Score = 36.2 bits (18), Expect = 9e-006 >>>> Identities = 18/18 (100%) >>>> Strand = Plus / Plus >>>> >>>> >>>> Query: 7 aaaagtaattgtggtttt 24 >>>> |||||||||||||||||| >>>> Sbjct: 1 aaaagtaattgtggtttt 18 >>>> >>>> >>>> >>>> in this result i could not parse my code. i think my code does not >>>> accept the Query header that is >>>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast >>>> output. >>>> >>>> kindly help me out. >>>> >>>> with regards, >>>> Pawan.
>>>> >>>> >>>> On Sat, Jan 14, 2012 at 3:13 AM, Frank Schwach >>>> wrote: >>>>> >>>>> Hi Pawan, >>>>> >>>>> Can you show your code? Is it basically following the structure shown >>>>> in >>>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO >>>>> ? >>>>> >>>>> If that is the case >>>>> >>>>> $hsp->strand('query') >>>>> >>>>> >>>>> is exactly what you need. >>>>> To check if hit and query are on different strands you can do: >>>>> >>>>> if ( $hsp->strand('query') >>>>> * $hsp->strand('hit') == -1){ >>>>> >>>>> # do whatever you need to do if they are on opposite strands >>>>> >>>>> } >>>>> >>>>> Hope that helps >>>>> >>>>> Frank >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 13/01/12 16:46, kakchingtabam pawankumar sharma wrote: >>>>>> >>>>>> Hi, >>>>>> Using Bio::SearchIO module I am parsing the following >>>>>> Blast >>>>>> result. >>>>>> I have used the option- $hsp->strand('query'). >>>>>> >>>>>> But I cannot get detail of alignment. >>>>>> >>>>>> I need to know if my hit is forward (Strand = Plus / Plus) >>>>>> or reverse ( Strand = Plus / Minus)... >>>>>> Can anyone help me to get report as Plus or Minus for query or hit. >>>>>> >>>>>> thanks in advanced. >>>>>> >>>>>> With regards, >>>>>> Pawan >>>>>> >>>>>> >>>>>> >>>>>> BLASTN 2.2.18 [Dec-23-2011] >>>>>> >>>>>> >>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>> Schaffer, >>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database >>>>>> search >>>>>> programs", Nucleic Acids Res. 25:3389-3402. >>>>>> >>>>>> Query= 000013_c10079-9984 >>>>>> (50 letters) >>>>>> >>>>>> Database: Cyano_Probe.fasta >>>>>> 4760 sequences; 238,000 total letters >>>>>> >>>>>> Searching..................................................done >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> Score >>>>>>
E >>>>>> Sequences producing significant alignments: >>>>>> (bits) >>>>>> Value >>>>>> >>>>>> 000013_c10079-9984 >>>>>> 100 7e-024 >>>>>> 002619_2689273-2690037 >>>>>> 24 0.36 >>>>>> 001126_c1123720-1123385 >>>>>> 24 0.36 >>>>>> 003211_c3326737-3326480 >>>>>> 22 1.4 >>>>>> 002415_2471082-2471420 >>>>>> 22 1.4 >>>>>> 002269_2321276-2322463 >>>>>> 22 1.4 >>>>>> 001328_c1326535-1326164 >>>>>> 22 1.4 >>>>>> >>>>>>> 000013_c10079-9984 >>>>>> >>>>>> Length = 50 >>>>>> >>>>>> Score = 99.6 bits (50), Expect = 7e-024 >>>>>> Identities = 50/50 (100%) >>>>>> Strand = Plus / Plus >>>>>> >>>>>> >>>>>> Query: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 >>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>> Sbjct: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> -- >>>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>>> Limited, >>>>> a charity registered in England with number 1021457 and a company >>>>> registered >>>>> in England with number 2742969, whose registered office is 215 Euston >>>>> Road, >>>>> London, NW1 2BE. >>> >>> >>> -- >>> The Wellcome Trust Sanger Institute is operated by Genome Research >>> Limited, a charity registered in England with number 1021457 and a >>> company registered in England with number 2742969, whose registered >>> office is 215 Euston Road, London, NW1 2BE. >>> > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research Limited, > a charity registered in England with number 1021457 and a company registered > in England with number 2742969, whose registered office is 215 Euston Road, > London, NW1 2BE.
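Frank's if/else advice can be wrapped in two small helpers: convert BioPerl's 1, -1 or undef strand values to words yourself, and test for opposite strands by checking whether the product of the two values is -1. This is a sketch, not a BioPerl API. The names strand_label and opposite_strands, and the 'none' label for a missing strand, are choices made here.

```perl
use strict;
use warnings;

# Turn the strand values BioPerl reports (1, -1, or undef when there is
# no strand) into the words BLAST prints on its "Strand = ..." line.
sub strand_label {
    my ($strand) = @_;
    return 'Plus'  if defined $strand && $strand == 1;
    return 'Minus' if defined $strand && $strand == -1;
    return 'none';    # placeholder for "no strand"; pick your own label
}

# Query and hit lie on opposite strands when the product of the two
# strand values is -1. A missing strand counts as "not opposite".
sub opposite_strands {
    my ($query_strand, $hit_strand) = @_;
    return 0 unless defined $query_strand && defined $hit_strand;
    return $query_strand * $hit_strand == -1 ? 1 : 0;
}
```

Inside Pawan's HSP loop, the calls would then be my $QueryStrand = strand_label($hsp->strand('query')); and the same for 'hit'.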
From fs5 at sanger.ac.uk Tue Jan 24 06:43:18 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Tue, 24 Jan 2012 11:43:18 +0000 Subject: [Bioperl-l] how to get the information of Strand = Plus / Plus from blastn report by bioperl. In-Reply-To: References: <4F10A59E.5040807@sanger.ac.uk> <4F13EE93.2010502@sanger.ac.uk> <4F1443A8.6000007@sanger.ac.uk> Message-ID: <4F1E9956.9010700@sanger.ac.uk> I think that's an option that you can set when you ask for a new BLAST parser: my $searchio = new Bio::SearchIO( -format => 'blast', -file => 't/data/ecolitst.bls', -best_hit_only => 1, ); now when you use the same script that you have been using so far (loop over all hits), there will only be one hit per result. Frank On 24/01/12 11:31, kakchingtabam pawankumar sharma wrote: > Hi, > Thanks a lot for help Frank. It works for every Blast output. > One more question is that i want to best hit only(top hit of every query). > I show there is option called > $obj->best_hit_only; in Bio::SearchIO module. > So help to add this to my script. > I could not do. Its confusing. > > Thanks in Advanced. > > With best regards, > Pawan > > > On Mon, Jan 16, 2012 at 9:05 PM, Frank Schwach wrote: >> Excellent, well done! >> No, this is the way to do it. In BioPerl modules that use strand information >> you will find the values +1/-1 or undef. If you want to display those as >> PLUS/MINUS,+/-,Watson/Crick,Laurel/Hardy whatever, you have to convert it, >> but you know now how to do it. >> You have a syntax error in your code where you retrieve the query name: >> >> >> my $QueryName = $result->query_name(), my $QueryDescript = >> $result->query_description(); >> >> should be two lines and the comma should be a semicolon. >> >> Good luck! >> >> Frank >> >> >> >> >> >> On 16/01/12 15:14, kakchingtabam pawankumar sharma wrote: >>> >>> So By using the if else conditon function, I have solve Frank. >>> I mean is there anyway in bioperl we can get directly using other >>> module! I hope u got it! 
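The following sketch drops Frank's -best_hit_only setting into a complete loop. It departs from his snippet in two ways: the report file comes from the command line rather than t/data/ecolitst.bls, and top_hit_row with its three output columns is an illustrative helper, not part of BioPerl. BLAST normally lists hits best-first, so reading a single next_hit() per result should also give the top hit even without the option. That is an assumption worth checking against your own reports.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Summarise the top hit of one BLAST result as query name, hit name and
# E-value of the first HSP. Returns nothing if the query had no hits.
sub top_hit_row {
    my ($result) = @_;
    my $hit = $result->next_hit or return;   # assumed best-first ordering
    my $hsp = $hit->next_hsp    or return;   # first HSP, as in Pawan's script
    return join "\t", $result->query_name, $hit->name, $hsp->evalue;
}

# BioPerl is only loaded when a report is given: perl top_hits.pl report.bls
if (@ARGV) {
    require Bio::SearchIO;
    my $searchio = Bio::SearchIO->new(
        -format        => 'blast',
        -file          => $ARGV[0],
        -best_hit_only => 1,    # the option Frank suggests
    );
    while (my $result = $searchio->next_result) {
        my $row = top_hit_row($result);
        print "$row\n" if defined $row;
    }
}
```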
>>> >>> >>> So my second Question have not replied that is >>> >>> i have blastn report as below: >>> >>> BLASTN 2.2.18 [Mar-02-2008] >>> >>> >>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, >>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >>> "Gapped BLAST and PSI-BLAST: a new generation of protein database search >>> programs", Nucleic Acids Res. 25:3389-3402. >>> >>> Query= ORB_1210001_hsa-miR-548aa#5_1 >>> (24 letters) >>> >>> Database: hsa-mmu-rno_miRNA.fa >>> 3524 sequences; 76,424 total letters >>> >>> Searching..................................................done >>> >>> >>> >>> Score E >>> Sequences producing significant alignments: (bits) >>> Value >>> >>> hsa-miR-548aa >>> 48 2e-009 >>> hsa-miR-548d-5p >>> 36 9e-006 >>> hsa-miR-548b-5p >>> 36 9e-006 >>> hsa-miR-548z >>> 34 3e-005 >>> hsa-miR-548q >>> 30 5e-004 >>> hsa-miR-548n >>> 30 5e-004 >>> hsa-miR-548ab >>> 28 0.002 >>> hsa-miR-548v >>> 28 0.002 >>> hsa-miR-548c-5p >>> 28 0.002 >>> hsa-miR-548ag >>> 26 0.008 >>> hsa-miR-548u >>> 26 0.008 >>> hsa-miR-548c-3p >>> 26 0.008 >>> hsa-miR-603 >>> 26 0.008 >>> hsa-miR-548a-3p >>> 26 0.008 >>> hsa-miR-548ac >>> 24 0.033 >>> hsa-miR-548an >>> 22 0.13 >>> hsa-miR-548aj >>> 22 0.13 >>> hsa-miR-548i >>> 22 0.13 >>> hsa-miR-548g >>> 22 0.13 >>> hsa-miR-548j >>> 22 0.13 >>> hsa-miR-548a-5p >>> 22 0.13 >>> >>>> hsa-miR-548aa >>> >>> Length = 25 >>> >>> Score = 48.1 bits (24), Expect = 2e-009 >>> Identities = 24/24 (100%) >>> Strand = Plus / Minus >>> >>> >>> Query: 1 tggtgcaaaagtaattgtggtttt 24 >>> |||||||||||||||||||||||| >>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2 >>> >>> >>>> hsa-miR-548d-5p >>> >>> Length = 22 >>> >>> Score = 36.2 bits (18), Expect = 9e-006 >>> Identities = 18/18 (100%) >>> Strand = Plus / Plus >>> >>> >>> Query: 7 aaaagtaattgtggtttt 24 >>> |||||||||||||||||| >>> Sbjct: 1 aaaagtaattgtggtttt 18 >>> >>> >>> >>> in this result i could not parse my code. 
i think my code does not >>> accept the Query header that is >>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast >>> output. >>> >>> kindly help me out. >>> >>> with regards, >>> Pawan. >>> >>> On 1/16/12, Frank Schwach wrote: >>>> >>>> Hi Pawan , >>>> >>>> Please always "reply to all", so that you keep the discussion on the >>>> bioperl mailing list and more people can help you. >>>> What you need is a very basic Perl command. I could give you the code >>>> but I think you get more out of it if you experiment with it on your own >>>> because it is very fundamental. I'll point you in the right direction: >>>> you want an if-then-else conditional construct. >>>> >>>> Perl's documentation about this is here: >>>> http://perldoc.perl.org/perlintro.html#Conditional-and-looping-constructs >>>> >>>> if strand is 1 you want to print "PLUS" else if it is -1 you want to >>>> print "MINUS", or else you might want to print "no strand" or something, >>>> or even treat it as an error and make the script abort. >>>> >>>> Give it a go and let us know if you need help. For basic (non-bio) Perl >>>> question, please also check out the community at >>>> http://www.perlmonks.org/. >>>> >>>> Hope that helps, >>>> >>>> Frank >>>> >>>> >>>> On 14/01/12 05:59, kakchingtabam pawankumar sharma wrote: >>>>> >>>>> Hi frank, >>>>> >>>>> Thanks for your kind reply. >>>>> I could get the vale for query as 1 value if it is plus. >>>>> and for hit = -1 if it is minus. >>>>> But i would like to print out as PLUS or MINUS not 1 or -1 my friend. 
>>>>> >>>>> you can see my code as below: >>>>> >>>>> while ( my $result = $searchio->next_result() ) { >>>>> my $QueryName = $result->query_name(), my $QueryDescript = >>>>> $result->query_description(); >>>>> my $QueryLength = $result->query_length; >>>>> my $NoHits = $result->num_hits; >>>>> >>>>> while( my $hit = $result->next_hit ) { >>>>> my $HitName = $hit->name(); >>>>> my $HitDescrip = $hit->description(); >>>>> my $HitLength = $hit->length; >>>>> my $Score = $hit->raw_score(); >>>>> my $Bits = $hit->bits; >>>>> >>>>> my $hsp = $hit->next_hsp; # Only check first (= best) hsp >>>>> my $Evalue = $hsp->evalue(); >>>>> my $AlnLen = $hsp->num_identical(); >>>>> my $TotalLen = $hsp->hsp_length; >>>>> my $QueryStrand = $hsp->strand('query'); >>>>> my $HitStrand = $hsp->strand('hit'); >>>>> >>>>> if($Evalue< $cutoff){ >>>>> print "$QueryName $QueryDescript\t"; >>>>> print "$QueryLength\t"; >>>>> print "$NoHits\t"; >>>>> print "$HitName $HitDescrip\t"; >>>>> print "$HitLength\t"; >>>>> print "$Score\t"; >>>>> print "$Bits\t"; >>>>> print "$Evalue\t"; >>>>> print "$AlnLen\t"; >>>>> print "$TotalLen\t"; >>>>> print "$QueryStrand\t"; >>>>> print "$HitStrand\n"; >>>>> } >>>>> } >>>>> print "\n"; >>>>> } >>>>> >>>>> >>>>> This is a part of my code. >>>>> >>>>> i have blastn report as below: >>>>> >>>>> BLASTN 2.2.18 [Mar-02-2008] >>>>> >>>>> >>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>> Schaffer, >>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database search >>>>> programs", Nucleic Acids Res. 25:3389-3402. 
>>>>> >>>>> Query= ORB_1210001_hsa-miR-548aa#5_1 >>>>> (24 letters) >>>>> >>>>> Database: hsa-mmu-rno_miRNA.fa >>>>> 3524 sequences; 76,424 total letters >>>>> >>>>> Searching..................................................done >>>>> >>>>> >>>>> >>>>> Score >>>>> E >>>>> Sequences producing significant alignments: (bits) >>>>> Value >>>>> >>>>> hsa-miR-548aa >>>>> 48 2e-009 >>>>> hsa-miR-548d-5p >>>>> 36 9e-006 >>>>> hsa-miR-548b-5p >>>>> 36 9e-006 >>>>> hsa-miR-548z >>>>> 34 3e-005 >>>>> hsa-miR-548q >>>>> 30 5e-004 >>>>> hsa-miR-548n >>>>> 30 5e-004 >>>>> hsa-miR-548ab >>>>> 28 0.002 >>>>> hsa-miR-548v >>>>> 28 0.002 >>>>> hsa-miR-548c-5p >>>>> 28 0.002 >>>>> hsa-miR-548ag >>>>> 26 0.008 >>>>> hsa-miR-548u >>>>> 26 0.008 >>>>> hsa-miR-548c-3p >>>>> 26 0.008 >>>>> hsa-miR-603 >>>>> 26 0.008 >>>>> hsa-miR-548a-3p >>>>> 26 0.008 >>>>> hsa-miR-548ac >>>>> 24 0.033 >>>>> hsa-miR-548an >>>>> 22 0.13 >>>>> hsa-miR-548aj >>>>> 22 0.13 >>>>> hsa-miR-548i >>>>> 22 0.13 >>>>> hsa-miR-548g >>>>> 22 0.13 >>>>> hsa-miR-548j >>>>> 22 0.13 >>>>> hsa-miR-548a-5p >>>>> 22 0.13 >>>>> >>>>>> hsa-miR-548aa >>>>> >>>>> Length = 25 >>>>> >>>>> Score = 48.1 bits (24), Expect = 2e-009 >>>>> Identities = 24/24 (100%) >>>>> Strand = Plus / Minus >>>>> >>>>> >>>>> Query: 1 tggtgcaaaagtaattgtggtttt 24 >>>>> |||||||||||||||||||||||| >>>>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2 >>>>> >>>>> >>>>>> hsa-miR-548d-5p >>>>> >>>>> Length = 22 >>>>> >>>>> Score = 36.2 bits (18), Expect = 9e-006 >>>>> Identities = 18/18 (100%) >>>>> Strand = Plus / Plus >>>>> >>>>> >>>>> Query: 7 aaaagtaattgtggtttt 24 >>>>> |||||||||||||||||| >>>>> Sbjct: 1 aaaagtaattgtggtttt 18 >>>>> >>>>> >>>>> >>>>> in this result i could not parse my code. i think my code does not >>>>> accept the Query header that is >>>>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast >>>>> output. >>>>> >>>>> kindly help me out. >>>>> >>>>> with regards, >>>>> Pawan. 
>>>>> >>>>> >>>>> On Sat, Jan 14, 2012 at 3:13 AM, Frank Schwach >>>>> wrote: >>>>>> >>>>>> Hi Pawan, >>>>>> >>>>>> Can you show your code? Is it basically following the structure shown >>>>>> in >>>>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO >>>>>> ? >>>>>> >>>>>> If that is the case >>>>>> >>>>>> $hsp->strand('query') >>>>>> >>>>>> >>>>>> is exactly what you need. >>>>>> To check if hit and query are on different strands you can do: >>>>>> >>>>>> if ( $hsp->strand('query') >>>>>> * $hsp->strand('hit') == -1){ >>>>>> >>>>>> # do whatever you need to do if they are on opposite strands >>>>>> >>>>>> } >>>>>> >>>>>> Hope that helps >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 13/01/12 16:46, kakchingtabam pawankumar sharma wrote: >>>>>>> >>>>>>> Hi, >>>>>>> Using Bio::SearchIO module I am parsing the following >>>>>>> Blast >>>>>>> result. >>>>>>> I have used the option- $hsp->strand('query'). >>>>>>> >>>>>>> But I cannot get detail of alignment. >>>>>>> >>>>>>> I need to know if my hit is forward (Strand = Plus / Plus) >>>>>>> or reverse ( Strand = Plus / Minus)... >>>>>>> Can anyone help me to get report as Plus or Minus for query or hit. >>>>>>> >>>>>>> thanks in advanced. >>>>>>> >>>>>>> With regards, >>>>>>> Pawan >>>>>>> >>>>>>> >>>>>>> >>>>>>> BLASTN 2.2.18 [Dec-23-2011] >>>>>>> >>>>>>> >>>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>>> Schaffer, >>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >>>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database >>>>>>> search >>>>>>> programs", Nucleic Acids Res. 25:3389-3402. 
>>>>>>> >>>>>>> Query= 000013_c10079-9984 >>>>>>> (50 letters) >>>>>>> >>>>>>> Database: Cyano_Probe.fasta >>>>>>> 4760 sequences; 238,000 total letters >>>>>>> >>>>>>> Searching..................................................done >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Score >>>>>>> E >>>>>>> Sequences producing significant alignments: >>>>>>> (bits) >>>>>>> Value >>>>>>> >>>>>>> 000013_c10079-9984 >>>>>>> 100 7e-024 >>>>>>> 002619_2689273-2690037 >>>>>>> 24 0.36 >>>>>>> 001126_c1123720-1123385 >>>>>>> 24 0.36 >>>>>>> 003211_c3326737-3326480 >>>>>>> 22 1.4 >>>>>>> 002415_2471082-2471420 >>>>>>> 22 1.4 >>>>>>> 002269_2321276-2322463 >>>>>>> 22 1.4 >>>>>>> 001328_c1326535-1326164 >>>>>>> 22 1.4 >>>>>>> >>>>>>>> 000013_c10079-9984 >>>>>>> >>>>>>> Length = 50 >>>>>>> >>>>>>> Score = 99.6 bits (50), Expect = 7e-024 >>>>>>> Identities = 50/50 (100%) >>>>>>> Strand = Plus / Plus >>>>>>> >>>>>>> >>>>>>> Query: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 >>>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>>> Sbjct: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> -- >>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>>>> Limited, >>>>>> a charity registered in England with number 1021457 and a company >>>>>> registered >>>>>> in England with number 2742969, whose registered office is 215 Euston >>>>>> Road, >>>>>> London, NW1 2BE. >>>> >>>> >>>> -- >>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>> Limited, a charity registered in England with number 1021457 and a >>>> company registered in England with number 2742969, whose registered >>>> office is 215 Euston Road, London, NW1 2BE. 
>>>> >> >> >> -- >> The Wellcome Trust Sanger Institute is operated by Genome Research Limited, >> a charity registered in England with number 1021457 and a company registered >> in England with number 2742969, whose registered office is 215 Euston Road, >> London, NW1 2BE. -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From joel.klein at wur.nl Tue Jan 24 08:38:28 2012 From: joel.klein at wur.nl (Bradyjoel) Date: Tue, 24 Jan 2012 05:38:28 -0800 (PST) Subject: [Bioperl-l] New to bioperl Message-ID: <33194882.post@talk.nabble.com> Hi all, I'm somewhat new to bioinformatics, but I need to annotate a newly sequenced bacterial genome based on the amino acid sequences of some known proteins from a neighbouring organism. I've been doing this manually with tblastn and then searching for and annotating the matches in Artemis. However, I have an entire directory full of these protein sequences and was wondering whether this could be automated, so that an input nucleotide sequence consisting of contigs is automatically translated in all reading frames, aligned with the known protein amino acid sequences, and annotated accordingly. If you could help me with writing such a script I would be very grateful. Joel -- View this message in context: http://old.nabble.com/New-to-bioperl-tp33194882p33194882.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
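Back in the strand thread above, Frank explains that BioPerl reports HSP strands as +1/-1 (or undef) and suggests converting them with an if/elsif/else. He also tests for opposite strands by multiplying the two values. The following is a minimal core-Perl sketch of both ideas. The helper names `strand_label` and `opposite_strands` are hypothetical. Treating 0 like undef is an added precaution, not something stated in the thread. The BioPerl calls appear only in a comment, so the block runs without BioPerl installed.

```perl
use strict;
use warnings;

# Convert a BioPerl-style strand value to the word blastn prints on its
# "Strand = Plus / Minus" line. undef (and, as a precaution, 0) become
# 'no strand'; anything else is treated as an error, as Frank suggested.
sub strand_label {
    my ($strand) = @_;
    return 'no strand' unless $strand;    # undef or 0
    return 'Plus'  if $strand == 1;
    return 'Minus' if $strand == -1;
    die "unexpected strand value: $strand\n";
}

# Frank's check: query and hit are on opposite strands when the product
# of their strand values is -1. A missing strand counts as "not opposite".
sub opposite_strands {
    my ($query_strand, $hit_strand) = @_;
    return 0 unless defined $query_strand && defined $hit_strand;
    return $query_strand * $hit_strand == -1 ? 1 : 0;
}

# Inside a Bio::SearchIO loop this would be used roughly as (not run here):
#   my $q = $hsp->strand('query');
#   my $h = $hsp->strand('hit');
#   print strand_label($q), ' / ', strand_label($h), "\n";
print strand_label(1), ' / ', strand_label(-1), "\n";    # Plus / Minus
```

Calling `die` on unexpected values follows Frank's suggestion to "treat it as an error and make the script abort". Returning a label instead would keep a long report running.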
From roy.chaudhuri at gmail.com Tue Jan 24 09:11:22 2012 From: roy.chaudhuri at gmail.com (Roy Chaudhuri) Date: Tue, 24 Jan 2012 14:11:22 +0000 Subject: [Bioperl-l] New to bioperl In-Reply-To: <33194882.post@talk.nabble.com> References: <33194882.post@talk.nabble.com> Message-ID: <4F1EBC0A.10201@gmail.com> Hi Joel, This is possible using BioPerl, but it may be simpler to use an online automated annotation service, e.g.: http://www.xbase.ac.uk/annotation Roy. On 24/01/2012 13:38, Bradyjoel wrote: > > Hi all, > > I'm somewhat new to the bioinformatics but I need to annotate a newly > sequenced bacterial genome based on some known proteins amino acid sequences > from a neighbouring organism. I've been doing this manually with tblastn and > then search and annotated this in artemis. However I have an entire > directory full of these protein sequences and was wondering if this could be > automated in such way that I have an input nucleotide sequence consisting of > contigs which are automatically translated in frames and then aligned and > annotated with the known protein amino acid sequences. If you could help me > with writing such a script I would be very grateful. > > Joel > > > From joel.klein at wur.nl Tue Jan 24 09:46:05 2012 From: joel.klein at wur.nl (Bradyjoel) Date: Tue, 24 Jan 2012 06:46:05 -0800 (PST) Subject: [Bioperl-l] New to bioperl In-Reply-To: <4F1EBC0A.10201@gmail.com> References: <33194882.post@talk.nabble.com> <4F1EBC0A.10201@gmail.com> Message-ID: <33195240.post@talk.nabble.com> Hi Roy, Thank you for your quick reply. I already tried xbase, and it finds some of the exopolysaccharide biosynthesis genes that I'm looking for, but only the genes that are already annotated in the other organisms. I also tried to merge these results, but it still misses some of the genes or their correct annotation, and merging also costs a lot of time.
Since I am only looking for a certain set of genes, I thought it might be easier to use a script that can BLAST these protein queries and add the annotation at the locations where it finds similarity in my sequence. I already tried to write a script myself, but I'm still struggling with how to connect the output of a BLAST search to the step of adding the gene information and writing it to a file. Joel Roy Chaudhuri-3 wrote: > > Hi Joel, > > This is possible using BioPerl, but it may be simpler to use an online > automated annotation service eg: > http://www.xbase.ac.uk/annotation > > Roy. > > On 24/01/2012 13:38, Bradyjoel wrote: >> >> Hi all, >> >> I'm somewhat new to the bioinformatics but I need to annotate a newly >> sequenced bacterial genome based on some known proteins amino acid >> sequences >> from a neighbouring organism. I've been doing this manually with tblastn >> and >> then search and annotated this in artemis. However I have an entire >> directory full of these protein sequences and was wondering if this could >> be >> automated in such way that I have an input nucleotide sequence consisting >> of >> contigs which are automatically translated in frames and then aligned and >> annotated with the known protein amino acid sequences. If you could help >> me >> with writing such a script I would be very grateful. >> >> Joel >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://old.nabble.com/New-to-bioperl-tp33194882p33195240.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
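Joel asks how to connect BLAST output to the step of adding gene information and writing it to a file. One common target is a GFF3 feature line, which Artemis and genome browsers can load. The following is a minimal core-Perl sketch that assumes the hit has already been reduced to a plain hash. The field names (`seqid`, `start`, `end`, `strand`, `score`, `id`, `name`), the source label `tblastn` and the feature type `protein_match` are illustrative choices. They are not taken from the thread. With tblastn the contig is the subject, so in BioPerl the values would come from the hit side of the HSP, e.g. `$hsp->start('hit')`, `$hsp->end('hit')` and `$hsp->strand('hit')`.

```perl
use strict;
use warnings;

# Percent-encode characters that GFF3 reserves in the attributes column.
sub gff3_escape {
    my ($text) = @_;
    $text =~ s/([;=&,%\t\r\n])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Turn one hit (a hypothetical hash with seqid/start/end/strand/score/id/name)
# into a single tab-separated GFF3 line.
sub hit_to_gff3 {
    my (%hit) = @_;
    # Sort defensively in case the coordinates arrive in reverse order.
    my ($start, $end) = sort { $a <=> $b } @hit{qw(start end)};
    my $strand = '.';
    if (defined $hit{strand}) {
        $strand = $hit{strand} > 0 ? '+' : $hit{strand} < 0 ? '-' : '.';
    }
    my $score = defined $hit{score} ? $hit{score} : '.';
    my $attributes = join ';',
        'ID='   . gff3_escape($hit{id}),
        'Name=' . gff3_escape($hit{name});
    # Columns: seqid, source, type, start, end, score, strand, phase, attributes
    return join "\t", $hit{seqid}, 'tblastn', 'protein_match',
        $start, $end, $score, $strand, '.', $attributes;
}

print "##gff-version 3\n";
print hit_to_gff3(
    seqid  => 'contig_1',         # hypothetical contig name
    start  => 900,
    end    => 100,
    strand => -1,
    score  => 55.1,
    id     => 'hit1',
    name   => 'epsA;putative',    # ';' is escaped as %3B
), "\n";
```

Writing plain GFF3 lines keeps the example dependency-free. Building a `Bio::Seq` with features, as the BioPerl Feature-Annotation HOWTO describes, is the route to EMBL or GenBank output.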
From fs5 at sanger.ac.uk Tue Jan 24 10:46:04 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Tue, 24 Jan 2012 15:46:04 +0000 Subject: [Bioperl-l] New to bioperl In-Reply-To: <33195240.post@talk.nabble.com> References: <33194882.post@talk.nabble.com> <4F1EBC0A.10201@gmail.com> <33195240.post@talk.nabble.com> Message-ID: <4F1ED23C.6030506@sanger.ac.uk> Hi Joel, This is certainly possible with BioPerl, and it would be a great way of learning Perl and BioPerl with a real application in mind, but be aware that the initial learning curve will be steep and you will need to invest quite a bit of time to get going. If you want to do it, start on the BioPerl HOWTO pages, e.g. to see how to run BLAST from a script: http://www.bioperl.org/wiki/HOWTO:BlastPlus and how to read the BLAST report with a script: http://www.bioperl.org/wiki/HOWTO:SearchIO There are examples that you can run and use as a starting point for your own scripts. Once you have your annotation (e.g. you found a good hit to a region in the genome and want to annotate the region with the gene name and description from the hit), you could construct your annotated genome using another (or even the same) BioPerl script, in which you build a Bio::Seq object, as described here: http://www.bioperl.org/wiki/HOWTO:Feature-Annotation You can then write that object to a flat text file in EMBL or GenBank format, or as a GFF file. You could even load it into a database and run a local copy of a genome browser on top of it. There are plenty of options (which can often be one of the main problems...). The best way of getting help is to start scripting and, when something fails, post your script along with a specific question here. If it is a general Perl question rather than a BioPerl one, you should also check out the really helpful community at perlmonks.org. Hope this helps. Good luck!
Frank On 24/01/12 14:46, Bradyjoel wrote: > > Hi Roy, > > Thank you for your quick reply, I already tried xbase and it finds some of > the exopolysacharide biosynthesis genes that I'm looking for but only the > genes that are already annotated in the other organisms. I also tried to > merge these results but it still misses some of the genes or the correct > annotation and merging also cost a lot of time. Since I am only looking for > a certain set of genes, I thought it might be easier to use a certain script > that can blast these these protein queries and add the annotation at the > locations were it finds simularity in my sequence. I already tried to make a > script myself but I'm still troubled how to connect the output of ablast to > the action of adding the gene information and write it to a certain file. > > Joel > > > Roy Chaudhuri-3 wrote: >> >> Hi Joel, >> >> This is possible using BioPerl, but it may be simpler to use an online >> automated annotation service eg: >> http://www.xbase.ac.uk/annotation >> >> Roy. >> >> On 24/01/2012 13:38, Bradyjoel wrote: >>> >>> Hi all, >>> >>> I'm somewhat new to the bioinformatics but I need to annotate a newly >>> sequenced bacterial genome based on some known proteins amino acid >>> sequences >>> from a neighbouring organism. I've been doing this manually with tblastn >>> and >>> then search and annotated this in artemis. However I have an entire >>> directory full of these protein sequences and was wondering if this could >>> be >>> automated in such way that I have an input nucleotide sequence consisting >>> of >>> contigs which are automatically translated in frames and then aligned and >>> annotated with the known protein amino acid sequences. If you could help >>> me >>> with writing such a script I would be very grateful.
>>> >>> Joel >>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From pawan.mani2 at gmail.com Wed Jan 25 02:12:06 2012 From: pawan.mani2 at gmail.com (kakchingtabam pawankumar sharma) Date: Wed, 25 Jan 2012 12:42:06 +0530 Subject: [Bioperl-l] how to get the information of Strand = Plus / Plus from blastn report by bioperl. In-Reply-To: References: <4F10A59E.5040807@sanger.ac.uk> <4F13EE93.2010502@sanger.ac.uk> <4F1443A8.6000007@sanger.ac.uk> <4F1E9956.9010700@sanger.ac.uk> Message-ID: Hi Frank, Thanks for your kind reply. I have done this in my script even though i could get the top hit still for every query. Still all hits are extracted by my script. So kindly help me to solve this problem. With best regards, Pawan On Tue, Jan 24, 2012 at 5:13 PM, Frank Schwach wrote: > I think that's an option that you can set when you ask for a new BLAST > parser: > my $searchio = new Bio::SearchIO( > -format => 'blast', > -file => 't/data/ecolitst.bls', > -best_hit_only => 1, > ); > > now when you use the same script that you have been using so far (loop over > all hits), there will only be one hit per result. > > Frank
From fs5 at sanger.ac.uk Wed Jan 25 04:37:51 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Wed, 25 Jan 2012 09:37:51 +0000 Subject: [Bioperl-l] how to get the information of Strand = Plus / Plus from blastn report by bioperl. In-Reply-To: References: <4F10A59E.5040807@sanger.ac.uk> <4F13EE93.2010502@sanger.ac.uk> <4F1443A8.6000007@sanger.ac.uk> <4F1E9956.9010700@sanger.ac.uk> Message-ID: <4F1FCD6F.70204@sanger.ac.uk> can you post your script please?
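Pawan reports that adding `-best_hit_only => 1` to the constructor still returned every hit. Two hedged observations may help. First, as far as I can tell, the Bio::SearchIO::blast documentation lists this constructor option as `-best`, with `best_hit_only` as the get/set method, so check that spelling against the POD of the installed version. Second, independently of any option, each result's loop can read only its first hit, because blastn lists hits in order of significance. The following core-Perl sketch shows the "best hit" selection on plain hashes, using names and E-values from the report quoted in this thread. The helper `best_hit` is hypothetical. The BioPerl form appears only as a comment.

```perl
use strict;
use warnings;

# Return the hit with the smallest E-value. On ties the earlier hit wins,
# which keeps the order in which blastn lists its hits.
sub best_hit {
    my @hits = @_;
    my $best;
    for my $hit (@hits) {
        if (!defined($best) || $hit->{evalue} < $best->{evalue}) {
            $best = $hit;
        }
    }
    return $best;    # undef when there are no hits
}

# Names and E-values copied from the blastn report quoted in this thread.
my @hits = (
    { name => 'hsa-miR-548aa',   evalue => '2e-009' },
    { name => 'hsa-miR-548d-5p', evalue => '9e-006' },
    { name => 'hsa-miR-548b-5p', evalue => '9e-006' },
);

my $top = best_hit(@hits);
print "$top->{name}\t$top->{evalue}\n";

# With Bio::SearchIO the same idea is to read one hit per result
# instead of looping with while (a sketch, not run here):
#   while (my $result = $searchio->next_result) {
#       my $hit = $result->next_hit or next;   # first listed hit only
#       my $hsp = $hit->next_hsp;
#       ...
#   }
```

Reading only the first hit also avoids parsing the remaining hits just to discard them, which matters for large reports.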
On 25/01/12 07:12, kakchingtabam pawankumar sharma wrote: > Hi Frank, > Thanks for your kind reply. I have done this in my script even though > i could get the top hit still for every query. Still all hits are > extracted by my script. So kindly help me to solve this problem. > > With best regards, > Pawan > > On Tue, Jan 24, 2012 at 5:13 PM, Frank Schwach wrote: >> I think that's an option that you can set when you ask for a new BLAST >> parser: >> my $searchio = new Bio::SearchIO( >> -format => 'blast', >> -file => 't/data/ecolitst.bls', >> -best_hit_only => 1, >> ); >> >> now when you use the same script that you have been using so far (loop over >> all hits), there will only be one hit per result. >> >> Frank >> >> >> >> On 24/01/12 11:31, kakchingtabam pawankumar sharma wrote: >>> >>> Hi, >>> Thanks a lot for help Frank. It works for every Blast output. >>> One more question is that i want to best hit only(top hit of every query). >>> I show there is option called >>> $obj->best_hit_only; in Bio::SearchIO module. >>> So help to add this to my script. >>> I could not do. Its confusing. >>> >>> Thanks in Advanced. >>> >>> With best regards, >>> Pawan >>> >>> >>> On Mon, Jan 16, 2012 at 9:05 PM, Frank Schwach wrote: >>>> >>>> Excellent, well done! >>>> No, this is the way to do it. In BioPerl modules that use strand >>>> information >>>> you will find the values +1/-1 or undef. If you want to display those as >>>> PLUS/MINUS,+/-,Watson/Crick,Laurel/Hardy whatever, you have to convert >>>> it, >>>> but you know now how to do it. >>>> You have a syntax error in your code where you retrieve the query name: >>>> >>>> >>>> my $QueryName = $result->query_name(), my $QueryDescript = >>>> $result->query_description(); >>>> >>>> should be two lines and the comma should be a semicolon. >>>> >>>> Good luck! 
>>>> >>>> Frank >>>> >>>> >>>> >>>> >>>> >>>> On 16/01/12 15:14, kakchingtabam pawankumar sharma wrote: >>>>> >>>>> >>>>> So By using the if else conditon function, I have solve Frank. >>>>> I mean is there anyway in bioperl we can get directly using other >>>>> module! I hope u got it! >>>>> >>>>> >>>>> So my second Question have not replied that is >>>>> >>>>> i have blastn report as below: >>>>> >>>>> BLASTN 2.2.18 [Mar-02-2008] >>>>> >>>>> >>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>> Schaffer, >>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database search >>>>> programs", Nucleic Acids Res. 25:3389-3402. >>>>> >>>>> Query= ORB_1210001_hsa-miR-548aa#5_1 >>>>> (24 letters) >>>>> >>>>> Database: hsa-mmu-rno_miRNA.fa >>>>> 3524 sequences; 76,424 total letters >>>>> >>>>> Searching..................................................done >>>>> >>>>> >>>>> >>>>> Score >>>>> E >>>>> Sequences producing significant alignments: (bits) >>>>> Value >>>>> >>>>> hsa-miR-548aa >>>>> 48 2e-009 >>>>> hsa-miR-548d-5p >>>>> 36 9e-006 >>>>> hsa-miR-548b-5p >>>>> 36 9e-006 >>>>> hsa-miR-548z >>>>> 34 3e-005 >>>>> hsa-miR-548q >>>>> 30 5e-004 >>>>> hsa-miR-548n >>>>> 30 5e-004 >>>>> hsa-miR-548ab >>>>> 28 0.002 >>>>> hsa-miR-548v >>>>> 28 0.002 >>>>> hsa-miR-548c-5p >>>>> 28 0.002 >>>>> hsa-miR-548ag >>>>> 26 0.008 >>>>> hsa-miR-548u >>>>> 26 0.008 >>>>> hsa-miR-548c-3p >>>>> 26 0.008 >>>>> hsa-miR-603 >>>>> 26 0.008 >>>>> hsa-miR-548a-3p >>>>> 26 0.008 >>>>> hsa-miR-548ac >>>>> 24 0.033 >>>>> hsa-miR-548an >>>>> 22 0.13 >>>>> hsa-miR-548aj >>>>> 22 0.13 >>>>> hsa-miR-548i >>>>> 22 0.13 >>>>> hsa-miR-548g >>>>> 22 0.13 >>>>> hsa-miR-548j >>>>> 22 0.13 >>>>> hsa-miR-548a-5p >>>>> 22 0.13 >>>>> >>>>>> hsa-miR-548aa >>>>> >>>>> >>>>> Length = 25 >>>>> >>>>> Score = 48.1 bits (24), Expect = 2e-009 >>>>> Identities = 24/24 (100%) >>>>> Strand = Plus / Minus >>>>> >>>>> 
>>>>> Query: 1 tggtgcaaaagtaattgtggtttt 24 >>>>> |||||||||||||||||||||||| >>>>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2 >>>>> >>>>> >>>>>> hsa-miR-548d-5p >>>>> >>>>> >>>>> Length = 22 >>>>> >>>>> Score = 36.2 bits (18), Expect = 9e-006 >>>>> Identities = 18/18 (100%) >>>>> Strand = Plus / Plus >>>>> >>>>> >>>>> Query: 7 aaaagtaattgtggtttt 24 >>>>> |||||||||||||||||| >>>>> Sbjct: 1 aaaagtaattgtggtttt 18 >>>>> >>>>> >>>>> >>>>> in this result i could not parse my code. i think my code does not >>>>> accept the Query header that is >>>>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast >>>>> output. >>>>> >>>>> kindly help me out. >>>>> >>>>> with regards, >>>>> Pawan. >>>>> >>>>> On 1/16/12, Frank Schwach wrote: >>>>>> >>>>>> >>>>>> Hi Pawan , >>>>>> >>>>>> Please always "reply to all", so that you keep the discussion on the >>>>>> bioperl mailing list and more people can help you. >>>>>> What you need is a very basic Perl command. I could give you the code >>>>>> but I think you get more out of it if you experiment with it on your >>>>>> own >>>>>> because it is very fundamental. I'll point you in the right direction: >>>>>> you want an if-then-else conditional construct. >>>>>> >>>>>> Perl's documentation about this is here: >>>>>> >>>>>> http://perldoc.perl.org/perlintro.html#Conditional-and-looping-constructs >>>>>> >>>>>> if strand is 1 you want to print "PLUS" else if it is -1 you want to >>>>>> print "MINUS", or else you might want to print "no strand" or >>>>>> something, >>>>>> or even treat it as an error and make the script abort. >>>>>> >>>>>> Give it a go and let us know if you need help. For basic (non-bio) Perl >>>>>> question, please also check out the community at >>>>>> http://www.perlmonks.org/. >>>>>> >>>>>> Hope that helps, >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> On 14/01/12 05:59, kakchingtabam pawankumar sharma wrote: >>>>>>> >>>>>>> >>>>>>> Hi frank, >>>>>>> >>>>>>> Thanks for your kind reply. 
>>>>>>> I could get the vale for query as 1 value if it is plus. >>>>>>> and for hit = -1 if it is minus. >>>>>>> But i would like to print out as PLUS or MINUS not 1 or -1 my friend. >>>>>>> >>>>>>> you can see my code as below: >>>>>>> >>>>>>> while ( my $result = $searchio->next_result() ) { >>>>>>> my $QueryName = $result->query_name(), my $QueryDescript = >>>>>>> $result->query_description(); >>>>>>> my $QueryLength = $result->query_length; >>>>>>> my $NoHits = $result->num_hits; >>>>>>> >>>>>>> while( my $hit = $result->next_hit ) { >>>>>>> my $HitName = $hit->name(); >>>>>>> my $HitDescrip = $hit->description(); >>>>>>> my $HitLength = $hit->length; >>>>>>> my $Score = $hit->raw_score(); >>>>>>> my $Bits = $hit->bits; >>>>>>> >>>>>>> my $hsp = $hit->next_hsp; # Only check first (= best) hsp >>>>>>> my $Evalue = $hsp->evalue(); >>>>>>> my $AlnLen = $hsp->num_identical(); >>>>>>> my $TotalLen = $hsp->hsp_length; >>>>>>> my $QueryStrand = $hsp->strand('query'); >>>>>>> my $HitStrand = $hsp->strand('hit'); >>>>>>> >>>>>>> if($Evalue< $cutoff){ >>>>>>> print "$QueryName $QueryDescript\t"; >>>>>>> print "$QueryLength\t"; >>>>>>> print "$NoHits\t"; >>>>>>> print "$HitName $HitDescrip\t"; >>>>>>> print "$HitLength\t"; >>>>>>> print "$Score\t"; >>>>>>> print "$Bits\t"; >>>>>>> print "$Evalue\t"; >>>>>>> print "$AlnLen\t"; >>>>>>> print "$TotalLen\t"; >>>>>>> print "$QueryStrand\t"; >>>>>>> print "$HitStrand\n"; >>>>>>> } >>>>>>> } >>>>>>> print "\n"; >>>>>>> } >>>>>>> >>>>>>> >>>>>>> This is a part of my code. >>>>>>> >>>>>>> i have blastn report as below: >>>>>>> >>>>>>> BLASTN 2.2.18 [Mar-02-2008] >>>>>>> >>>>>>> >>>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>>> Schaffer, >>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >>>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database >>>>>>> search >>>>>>> programs", Nucleic Acids Res. 25:3389-3402. 
>>>>>>> >>>>>>> Query= ORB_1210001_hsa-miR-548aa#5_1 >>>>>>> (24 letters) >>>>>>> >>>>>>> Database: hsa-mmu-rno_miRNA.fa >>>>>>> 3524 sequences; 76,424 total letters >>>>>>> >>>>>>> Searching..................................................done >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> Score >>>>>>> E >>>>>>> Sequences producing significant alignments: >>>>>>> (bits) >>>>>>> Value >>>>>>> >>>>>>> hsa-miR-548aa >>>>>>> 48 2e-009 >>>>>>> hsa-miR-548d-5p >>>>>>> 36 9e-006 >>>>>>> hsa-miR-548b-5p >>>>>>> 36 9e-006 >>>>>>> hsa-miR-548z >>>>>>> 34 3e-005 >>>>>>> hsa-miR-548q >>>>>>> 30 5e-004 >>>>>>> hsa-miR-548n >>>>>>> 30 5e-004 >>>>>>> hsa-miR-548ab >>>>>>> 28 0.002 >>>>>>> hsa-miR-548v >>>>>>> 28 0.002 >>>>>>> hsa-miR-548c-5p >>>>>>> 28 0.002 >>>>>>> hsa-miR-548ag >>>>>>> 26 0.008 >>>>>>> hsa-miR-548u >>>>>>> 26 0.008 >>>>>>> hsa-miR-548c-3p >>>>>>> 26 0.008 >>>>>>> hsa-miR-603 >>>>>>> 26 0.008 >>>>>>> hsa-miR-548a-3p >>>>>>> 26 0.008 >>>>>>> hsa-miR-548ac >>>>>>> 24 0.033 >>>>>>> hsa-miR-548an >>>>>>> 22 0.13 >>>>>>> hsa-miR-548aj >>>>>>> 22 0.13 >>>>>>> hsa-miR-548i >>>>>>> 22 0.13 >>>>>>> hsa-miR-548g >>>>>>> 22 0.13 >>>>>>> hsa-miR-548j >>>>>>> 22 0.13 >>>>>>> hsa-miR-548a-5p >>>>>>> 22 0.13 >>>>>>> >>>>>>>> hsa-miR-548aa >>>>>>> >>>>>>> >>>>>>> Length = 25 >>>>>>> >>>>>>> Score = 48.1 bits (24), Expect = 2e-009 >>>>>>> Identities = 24/24 (100%) >>>>>>> Strand = Plus / Minus >>>>>>> >>>>>>> >>>>>>> Query: 1 tggtgcaaaagtaattgtggtttt 24 >>>>>>> |||||||||||||||||||||||| >>>>>>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2 >>>>>>> >>>>>>> >>>>>>>> hsa-miR-548d-5p >>>>>>> >>>>>>> >>>>>>> Length = 22 >>>>>>> >>>>>>> Score = 36.2 bits (18), Expect = 9e-006 >>>>>>> Identities = 18/18 (100%) >>>>>>> Strand = Plus / Plus >>>>>>> >>>>>>> >>>>>>> Query: 7 aaaagtaattgtggtttt 24 >>>>>>> |||||||||||||||||| >>>>>>> Sbjct: 1 aaaagtaattgtggtttt 18 >>>>>>> >>>>>>> >>>>>>> >>>>>>> in this result i could not parse my code. 
i think my code does not >>>>>>> accept the Query header that is >>>>>>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast >>>>>>> output. >>>>>>> >>>>>>> kindly help me out. >>>>>>> >>>>>>> with regards, >>>>>>> Pawan. >>>>>>> >>>>>>> >>>>>>> On Sat, Jan 14, 2012 at 3:13 AM, Frank Schwach >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> Hi Pawan, >>>>>>>> >>>>>>>> Can you show your code? Is it basically following the structure shown >>>>>>>> in >>>>>>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO >>>>>>>> ? >>>>>>>> >>>>>>>> If that is the case >>>>>>>> >>>>>>>> $hsp->strand('query') >>>>>>>> >>>>>>>> >>>>>>>> is exactly what you need. >>>>>>>> To check if hit and query are on different strands you can do: >>>>>>>> >>>>>>>> if ( $hsp->strand('query') >>>>>>>> * $hsp->strand('hit') == -1){ >>>>>>>> >>>>>>>> # do whatever you need to do if they are on opposite strands >>>>>>>> >>>>>>>> } >>>>>>>> >>>>>>>> Hope that helps >>>>>>>> >>>>>>>> Frank >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 13/01/12 16:46, kakchingtabam pawankumar sharma wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> Using Bio::SearchIO module I am parsing the following >>>>>>>>> Blast >>>>>>>>> result. >>>>>>>>> I have used the option- $hsp->strand('query'). >>>>>>>>> >>>>>>>>> But I cannot get detail of alignment. >>>>>>>>> >>>>>>>>> I need to know if my hit is forward (Strand = Plus / Plus) >>>>>>>>> or reverse ( Strand = Plus / Minus)... >>>>>>>>> Can anyone help me to get report as Plus or Minus for query or >>>>>>>>> hit. >>>>>>>>> >>>>>>>>> thanks in advanced. >>>>>>>>> >>>>>>>>> With regards, >>>>>>>>> Pawan >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> BLASTN 2.2.18 [Dec-23-2011] >>>>>>>>> >>>>>>>>> >>>>>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>>>>> Schaffer, >>>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. 
Lipman (1997), >>>>>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database >>>>>>>>> search >>>>>>>>> programs", Nucleic Acids Res. 25:3389-3402. >>>>>>>>> >>>>>>>>> Query= 000013_c10079-9984 >>>>>>>>> (50 letters) >>>>>>>>> >>>>>>>>> Database: Cyano_Probe.fasta >>>>>>>>> 4760 sequences; 238,000 total letters >>>>>>>>> >>>>>>>>> Searching..................................................done >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Score >>>>>>>>> E >>>>>>>>> Sequences producing significant alignments: >>>>>>>>> (bits) >>>>>>>>> Value >>>>>>>>> >>>>>>>>> 000013_c10079-9984 >>>>>>>>> 100 7e-024 >>>>>>>>> 002619_2689273-2690037 >>>>>>>>> 24 0.36 >>>>>>>>> 001126_c1123720-1123385 >>>>>>>>> 24 0.36 >>>>>>>>> 003211_c3326737-3326480 >>>>>>>>> 22 1.4 >>>>>>>>> 002415_2471082-2471420 >>>>>>>>> 22 1.4 >>>>>>>>> 002269_2321276-2322463 >>>>>>>>> 22 1.4 >>>>>>>>> 001328_c1326535-1326164 >>>>>>>>> 22 1.4 >>>>>>>>> >>>>>>>>>> 000013_c10079-9984 >>>>>>>>> >>>>>>>>> >>>>>>>>> Length = 50 >>>>>>>>> >>>>>>>>> Score = 99.6 bits (50), Expect = 7e-024 >>>>>>>>> Identities = 50/50 (100%) >>>>>>>>> Strand = Plus / Plus >>>>>>>>> >>>>>>>>> >>>>>>>>> Query: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 >>>>>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>>>>> Sbjct: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Bioperl-l mailing list >>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>>>>>> Limited, >>>>>>>> a charity registered in England with number 1021457 and a company >>>>>>>> registered >>>>>>>> in England with number 2742969, whose registered office is 215 Euston >>>>>>>> Road, >>>>>>>> London, NW1 2BE. 
>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>>>> Limited, a charity registered in England with number 1021457 and a >>>>>> company registered in England with number 2742969, whose registered >>>>>> office is 215 Euston Road, London, NW1 2BE. >>>>>> >>>> >>>> >>>> -- >>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>> Limited, >>>> a charity registered in England with number 1021457 and a company >>>> registered >>>> in England with number 2742969, whose registered office is 215 Euston >>>> Road, >>>> London, NW1 2BE. >> >> >> >> -- >> The Wellcome Trust Sanger Institute is operated by Genome Research Limited, >> a charity registered in England with number 1021457 and a company registered >> in England with number 2742969, whose registered office is 215 Euston Road, >> London, NW1 2BE. -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From pawan.mani2 at gmail.com Wed Jan 25 09:41:39 2012 From: pawan.mani2 at gmail.com (kakchingtabam pawankumar sharma) Date: Wed, 25 Jan 2012 20:11:39 +0530 Subject: [Bioperl-l] how to get the information of Strand = Plus / Plus from blastn report by bioperl. 
In-Reply-To: <4F1FCD6F.70204@sanger.ac.uk> References: <4F10A59E.5040807@sanger.ac.uk> <4F13EE93.2010502@sanger.ac.uk> <4F1443A8.6000007@sanger.ac.uk> <4F1E9956.9010700@sanger.ac.uk> <4F1FCD6F.70204@sanger.ac.uk> Message-ID: Hi, This is my script:- my $searchio = new Bio::SearchIO( -format => 'blast', -file => $blast_report, -best_hit_only => 1); while ( my $result = $searchio->next_result() ) { my $QueryName = $result->query_name(); my $QueryDescript = $result->query_description(); my $QueryLength = $result->query_length; my $NoHits = $result->num_hits; while( my $hit = $result->next_hit ) { my $HitName = $hit->name(); my $HitDescrip = $hit->description(); my $HitLength = $hit->length; my $Score = $hit->raw_score(); my $Bits = $hit->bits; my $hsp = $hit->next_hsp; # Only check first (= best) hsp my $Evalue = $hsp->evalue(); my $AlnLen = $hsp->num_identical(); my $TotalLen = $hsp->hsp_length; my $QueryStrand = $hsp->strand('query'); my $HitStrand = $hsp->strand('hit'); #if($Evalue < $cutoff){ print "$QueryName $QueryDescript\t"; print "$QueryLength\t"; print "$NoHits\t"; print "$HitName $HitDescrip\t"; print "$HitLength\t"; print "$Score\t"; print "$Bits\t"; print "$Evalue\n"; print "$AlnLen\t"; print "$TotalLen\t"; print "$QueryStrand\t"; print "$HitStrand\n"; #} } #print "\n"; } So Can You Predict where I need to modify This Script To get only tophit of every Query. With Regards, Pawan On 1/25/12, Frank Schwach wrote: > can you post your script please? > > On 25/01/12 07:12, kakchingtabam pawankumar sharma wrote: >> Hi Frank, >> Thanks for your kind reply. I have done this in my script even though >> i could get the top hit still for every query. Still all hits are >> extracted by my script. So kindly help me to solve this problem. 
>> >> With best regards, >> Pawan >> >> On Tue, Jan 24, 2012 at 5:13 PM, Frank Schwach wrote: >>> I think that's an option that you can set when you ask for a new BLAST >>> parser: >>> my $searchio = new Bio::SearchIO( >>> -format => 'blast', >>> -file => 't/data/ecolitst.bls', >>> -best_hit_only => 1, >>> ); >>> >>> now when you use the same script that you have been using so far (loop >>> over >>> all hits), there will only be one hit per result. >>> >>> Frank >>> >>> >>> >>> On 24/01/12 11:31, kakchingtabam pawankumar sharma wrote: >>>> >>>> Hi, >>>> Thanks a lot for help Frank. It works for every Blast output. >>>> One more question is that i want to best hit only(top hit of every >>>> query). >>>> I show there is option called >>>> $obj->best_hit_only; in Bio::SearchIO module. >>>> So help to add this to my script. >>>> I could not do. Its confusing. >>>> >>>> Thanks in Advanced. >>>> >>>> With best regards, >>>> Pawan >>>> >>>> >>>> On Mon, Jan 16, 2012 at 9:05 PM, Frank Schwach >>>> wrote: >>>>> >>>>> Excellent, well done! >>>>> No, this is the way to do it. In BioPerl modules that use strand >>>>> information >>>>> you will find the values +1/-1 or undef. If you want to display those >>>>> as >>>>> PLUS/MINUS,+/-,Watson/Crick,Laurel/Hardy whatever, you have to convert >>>>> it, >>>>> but you know now how to do it. >>>>> You have a syntax error in your code where you retrieve the query name: >>>>> >>>>> >>>>> my $QueryName = $result->query_name(), my $QueryDescript = >>>>> $result->query_description(); >>>>> >>>>> should be two lines and the comma should be a semicolon. >>>>> >>>>> Good luck! >>>>> >>>>> Frank >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> On 16/01/12 15:14, kakchingtabam pawankumar sharma wrote: >>>>>> >>>>>> >>>>>> So By using the if else conditon function, I have solve Frank. >>>>>> I mean is there anyway in bioperl we can get directly using other >>>>>> module! I hope u got it! 
>>>>>> >>>>>> >>>>>> So my second Question have not replied that is >>>>>> >>>>>> i have blastn report as below: >>>>>> >>>>>> BLASTN 2.2.18 [Mar-02-2008] >>>>>> >>>>>> >>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>> Schaffer, >>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database >>>>>> search >>>>>> programs", Nucleic Acids Res. 25:3389-3402. >>>>>> >>>>>> Query= ORB_1210001_hsa-miR-548aa#5_1 >>>>>> (24 letters) >>>>>> >>>>>> Database: hsa-mmu-rno_miRNA.fa >>>>>> 3524 sequences; 76,424 total letters >>>>>> >>>>>> Searching..................................................done >>>>>> >>>>>> >>>>>> >>>>>> Score >>>>>> E >>>>>> Sequences producing significant alignments: >>>>>> (bits) >>>>>> Value >>>>>> >>>>>> hsa-miR-548aa >>>>>> 48 2e-009 >>>>>> hsa-miR-548d-5p >>>>>> 36 9e-006 >>>>>> hsa-miR-548b-5p >>>>>> 36 9e-006 >>>>>> hsa-miR-548z >>>>>> 34 3e-005 >>>>>> hsa-miR-548q >>>>>> 30 5e-004 >>>>>> hsa-miR-548n >>>>>> 30 5e-004 >>>>>> hsa-miR-548ab >>>>>> 28 0.002 >>>>>> hsa-miR-548v >>>>>> 28 0.002 >>>>>> hsa-miR-548c-5p >>>>>> 28 0.002 >>>>>> hsa-miR-548ag >>>>>> 26 0.008 >>>>>> hsa-miR-548u >>>>>> 26 0.008 >>>>>> hsa-miR-548c-3p >>>>>> 26 0.008 >>>>>> hsa-miR-603 >>>>>> 26 0.008 >>>>>> hsa-miR-548a-3p >>>>>> 26 0.008 >>>>>> hsa-miR-548ac >>>>>> 24 0.033 >>>>>> hsa-miR-548an >>>>>> 22 0.13 >>>>>> hsa-miR-548aj >>>>>> 22 0.13 >>>>>> hsa-miR-548i >>>>>> 22 0.13 >>>>>> hsa-miR-548g >>>>>> 22 0.13 >>>>>> hsa-miR-548j >>>>>> 22 0.13 >>>>>> hsa-miR-548a-5p >>>>>> 22 0.13 >>>>>> >>>>>>> hsa-miR-548aa >>>>>> >>>>>> >>>>>> Length = 25 >>>>>> >>>>>> Score = 48.1 bits (24), Expect = 2e-009 >>>>>> Identities = 24/24 (100%) >>>>>> Strand = Plus / Minus >>>>>> >>>>>> >>>>>> Query: 1 tggtgcaaaagtaattgtggtttt 24 >>>>>> |||||||||||||||||||||||| >>>>>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2 >>>>>> >>>>>> >>>>>>> hsa-miR-548d-5p >>>>>> >>>>>> >>>>>> Length = 22 
>>>>>> >>>>>> Score = 36.2 bits (18), Expect = 9e-006 >>>>>> Identities = 18/18 (100%) >>>>>> Strand = Plus / Plus >>>>>> >>>>>> >>>>>> Query: 7 aaaagtaattgtggtttt 24 >>>>>> |||||||||||||||||| >>>>>> Sbjct: 1 aaaagtaattgtggtttt 18 >>>>>> >>>>>> >>>>>> >>>>>> in this result i could not parse my code. i think my code does not >>>>>> accept the Query header that is >>>>>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast >>>>>> output. >>>>>> >>>>>> kindly help me out. >>>>>> >>>>>> with regards, >>>>>> Pawan. >>>>>> >>>>>> On 1/16/12, Frank Schwach wrote: >>>>>>> >>>>>>> >>>>>>> Hi Pawan , >>>>>>> >>>>>>> Please always "reply to all", so that you keep the discussion on the >>>>>>> bioperl mailing list and more people can help you. >>>>>>> What you need is a very basic Perl command. I could give you the code >>>>>>> but I think you get more out of it if you experiment with it on your >>>>>>> own >>>>>>> because it is very fundamental. I'll point you in the right >>>>>>> direction: >>>>>>> you want an if-then-else conditional construct. >>>>>>> >>>>>>> Perl's documentation about this is here: >>>>>>> >>>>>>> http://perldoc.perl.org/perlintro.html#Conditional-and-looping-constructs >>>>>>> >>>>>>> if strand is 1 you want to print "PLUS" else if it is -1 you want to >>>>>>> print "MINUS", or else you might want to print "no strand" or >>>>>>> something, >>>>>>> or even treat it as an error and make the script abort. >>>>>>> >>>>>>> Give it a go and let us know if you need help. For basic (non-bio) >>>>>>> Perl >>>>>>> question, please also check out the community at >>>>>>> http://www.perlmonks.org/. >>>>>>> >>>>>>> Hope that helps, >>>>>>> >>>>>>> Frank >>>>>>> >>>>>>> >>>>>>> On 14/01/12 05:59, kakchingtabam pawankumar sharma wrote: >>>>>>>> >>>>>>>> >>>>>>>> Hi frank, >>>>>>>> >>>>>>>> Thanks for your kind reply. >>>>>>>> I could get the vale for query as 1 value if it is plus. >>>>>>>> and for hit = -1 if it is minus. 
>>>>>>>> But i would like to print out as PLUS or MINUS not 1 or -1 my >>>>>>>> friend. >>>>>>>> >>>>>>>> you can see my code as below: >>>>>>>> >>>>>>>> while ( my $result = $searchio->next_result() ) { >>>>>>>> my $QueryName = $result->query_name(), my $QueryDescript = >>>>>>>> $result->query_description(); >>>>>>>> my $QueryLength = $result->query_length; >>>>>>>> my $NoHits = $result->num_hits; >>>>>>>> >>>>>>>> while( my $hit = $result->next_hit ) { >>>>>>>> my $HitName = $hit->name(); >>>>>>>> my $HitDescrip = $hit->description(); >>>>>>>> my $HitLength = $hit->length; >>>>>>>> my $Score = $hit->raw_score(); >>>>>>>> my $Bits = $hit->bits; >>>>>>>> >>>>>>>> my $hsp = $hit->next_hsp; # Only check first (= best) hsp >>>>>>>> my $Evalue = $hsp->evalue(); >>>>>>>> my $AlnLen = $hsp->num_identical(); >>>>>>>> my $TotalLen = $hsp->hsp_length; >>>>>>>> my $QueryStrand = $hsp->strand('query'); >>>>>>>> my $HitStrand = $hsp->strand('hit'); >>>>>>>> >>>>>>>> if($Evalue< $cutoff){ >>>>>>>> print "$QueryName $QueryDescript\t"; >>>>>>>> print "$QueryLength\t"; >>>>>>>> print "$NoHits\t"; >>>>>>>> print "$HitName $HitDescrip\t"; >>>>>>>> print "$HitLength\t"; >>>>>>>> print "$Score\t"; >>>>>>>> print "$Bits\t"; >>>>>>>> print "$Evalue\t"; >>>>>>>> print "$AlnLen\t"; >>>>>>>> print "$TotalLen\t"; >>>>>>>> print "$QueryStrand\t"; >>>>>>>> print "$HitStrand\n"; >>>>>>>> } >>>>>>>> } >>>>>>>> print "\n"; >>>>>>>> } >>>>>>>> >>>>>>>> >>>>>>>> This is a part of my code. >>>>>>>> >>>>>>>> i have blastn report as below: >>>>>>>> >>>>>>>> BLASTN 2.2.18 [Mar-02-2008] >>>>>>>> >>>>>>>> >>>>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>>>> Schaffer, >>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >>>>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database >>>>>>>> search >>>>>>>> programs", Nucleic Acids Res. 25:3389-3402. 
>>>>>>>> >>>>>>>> Query= ORB_1210001_hsa-miR-548aa#5_1 >>>>>>>> (24 letters) >>>>>>>> >>>>>>>> Database: hsa-mmu-rno_miRNA.fa >>>>>>>> 3524 sequences; 76,424 total letters >>>>>>>> >>>>>>>> Searching..................................................done >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Score >>>>>>>> E >>>>>>>> Sequences producing significant alignments: >>>>>>>> (bits) >>>>>>>> Value >>>>>>>> >>>>>>>> hsa-miR-548aa >>>>>>>> 48 2e-009 >>>>>>>> hsa-miR-548d-5p >>>>>>>> 36 9e-006 >>>>>>>> hsa-miR-548b-5p >>>>>>>> 36 9e-006 >>>>>>>> hsa-miR-548z >>>>>>>> 34 3e-005 >>>>>>>> hsa-miR-548q >>>>>>>> 30 5e-004 >>>>>>>> hsa-miR-548n >>>>>>>> 30 5e-004 >>>>>>>> hsa-miR-548ab >>>>>>>> 28 0.002 >>>>>>>> hsa-miR-548v >>>>>>>> 28 0.002 >>>>>>>> hsa-miR-548c-5p >>>>>>>> 28 0.002 >>>>>>>> hsa-miR-548ag >>>>>>>> 26 0.008 >>>>>>>> hsa-miR-548u >>>>>>>> 26 0.008 >>>>>>>> hsa-miR-548c-3p >>>>>>>> 26 0.008 >>>>>>>> hsa-miR-603 >>>>>>>> 26 0.008 >>>>>>>> hsa-miR-548a-3p >>>>>>>> 26 0.008 >>>>>>>> hsa-miR-548ac >>>>>>>> 24 0.033 >>>>>>>> hsa-miR-548an >>>>>>>> 22 0.13 >>>>>>>> hsa-miR-548aj >>>>>>>> 22 0.13 >>>>>>>> hsa-miR-548i >>>>>>>> 22 0.13 >>>>>>>> hsa-miR-548g >>>>>>>> 22 0.13 >>>>>>>> hsa-miR-548j >>>>>>>> 22 0.13 >>>>>>>> hsa-miR-548a-5p >>>>>>>> 22 0.13 >>>>>>>> >>>>>>>>> hsa-miR-548aa >>>>>>>> >>>>>>>> >>>>>>>> Length = 25 >>>>>>>> >>>>>>>> Score = 48.1 bits (24), Expect = 2e-009 >>>>>>>> Identities = 24/24 (100%) >>>>>>>> Strand = Plus / Minus >>>>>>>> >>>>>>>> >>>>>>>> Query: 1 tggtgcaaaagtaattgtggtttt 24 >>>>>>>> |||||||||||||||||||||||| >>>>>>>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2 >>>>>>>> >>>>>>>> >>>>>>>>> hsa-miR-548d-5p >>>>>>>> >>>>>>>> >>>>>>>> Length = 22 >>>>>>>> >>>>>>>> Score = 36.2 bits (18), Expect = 9e-006 >>>>>>>> Identities = 18/18 (100%) >>>>>>>> Strand = Plus / Plus >>>>>>>> >>>>>>>> >>>>>>>> Query: 7 aaaagtaattgtggtttt 24 >>>>>>>> |||||||||||||||||| >>>>>>>> Sbjct: 1 aaaagtaattgtggtttt 18 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> in this result i 
could not parse my code. i think my code does not >>>>>>>> accept the Query header that is >>>>>>>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast >>>>>>>> output. >>>>>>>> >>>>>>>> kindly help me out. >>>>>>>> >>>>>>>> with regards, >>>>>>>> Pawan. >>>>>>>> >>>>>>>> >>>>>>>> On Sat, Jan 14, 2012 at 3:13 AM, Frank Schwach >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi Pawan, >>>>>>>>> >>>>>>>>> Can you show your code? Is it basically following the structure >>>>>>>>> shown >>>>>>>>> in >>>>>>>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO >>>>>>>>> ? >>>>>>>>> >>>>>>>>> If that is the case >>>>>>>>> >>>>>>>>> $hsp->strand('query') >>>>>>>>> >>>>>>>>> >>>>>>>>> is exactly what you need. >>>>>>>>> To check if hit and query are on different strands you can do: >>>>>>>>> >>>>>>>>> if ( $hsp->strand('query') >>>>>>>>> * $hsp->strand('hit') == -1){ >>>>>>>>> >>>>>>>>> # do whatever you need to do if they are on opposite strands >>>>>>>>> >>>>>>>>> } >>>>>>>>> >>>>>>>>> Hope that helps >>>>>>>>> >>>>>>>>> Frank >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On 13/01/12 16:46, kakchingtabam pawankumar sharma wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> Using Bio::SearchIO module I am parsing the >>>>>>>>>> following >>>>>>>>>> Blast >>>>>>>>>> result. >>>>>>>>>> I have used the option- $hsp->strand('query'). >>>>>>>>>> >>>>>>>>>> But I cannot get detail of alignment. >>>>>>>>>> >>>>>>>>>> I need to know if my hit is forward (Strand = Plus / Plus) >>>>>>>>>> or reverse ( Strand = Plus / Minus)... >>>>>>>>>> Can anyone help me to get report as Plus or Minus for query or >>>>>>>>>> hit. >>>>>>>>>> >>>>>>>>>> thanks in advanced. >>>>>>>>>> >>>>>>>>>> With regards, >>>>>>>>>> Pawan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> BLASTN 2.2.18 [Dec-23-2011] >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. 
>>>>>>>>>> Schaffer, >>>>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman >>>>>>>>>> (1997), >>>>>>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database >>>>>>>>>> search >>>>>>>>>> programs", Nucleic Acids Res. 25:3389-3402. >>>>>>>>>> >>>>>>>>>> Query= 000013_c10079-9984 >>>>>>>>>> (50 letters) >>>>>>>>>> >>>>>>>>>> Database: Cyano_Probe.fasta >>>>>>>>>> 4760 sequences; 238,000 total letters >>>>>>>>>> >>>>>>>>>> Searching..................................................done >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Score >>>>>>>>>> E >>>>>>>>>> Sequences producing significant alignments: >>>>>>>>>> (bits) >>>>>>>>>> Value >>>>>>>>>> >>>>>>>>>> 000013_c10079-9984 >>>>>>>>>> 100 7e-024 >>>>>>>>>> 002619_2689273-2690037 >>>>>>>>>> 24 0.36 >>>>>>>>>> 001126_c1123720-1123385 >>>>>>>>>> 24 0.36 >>>>>>>>>> 003211_c3326737-3326480 >>>>>>>>>> 22 1.4 >>>>>>>>>> 002415_2471082-2471420 >>>>>>>>>> 22 1.4 >>>>>>>>>> 002269_2321276-2322463 >>>>>>>>>> 22 1.4 >>>>>>>>>> 001328_c1326535-1326164 >>>>>>>>>> 22 1.4 >>>>>>>>>> >>>>>>>>>>> 000013_c10079-9984 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Length = 50 >>>>>>>>>> >>>>>>>>>> Score = 99.6 bits (50), Expect = 7e-024 >>>>>>>>>> Identities = 50/50 (100%) >>>>>>>>>> Strand = Plus / Plus >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Query: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 >>>>>>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>>>>>> Sbjct: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioperl-l mailing list >>>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>>>>>>> Limited, >>>>>>>>> a charity registered in England with number 1021457 and a company >>>>>>>>> registered >>>>>>>>> in England with number 
2742969, whose registered office is 215 >>>>>>>>> Euston >>>>>>>>> Road, >>>>>>>>> London, NW1 2BE. >>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>>>>> Limited, a charity registered in England with number 1021457 and a >>>>>>> company registered in England with number 2742969, whose registered >>>>>>> office is 215 Euston Road, London, NW1 2BE. >>>>>>> >>>>> >>>>> >>>>> -- >>>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>>> Limited, >>>>> a charity registered in England with number 1021457 and a company >>>>> registered >>>>> in England with number 2742969, whose registered office is 215 Euston >>>>> Road, >>>>> London, NW1 2BE. >>> >>> >>> >>> -- >>> The Wellcome Trust Sanger Institute is operated by Genome Research >>> Limited, >>> a charity registered in England with number 1021457 and a company >>> registered >>> in England with number 2742969, whose registered office is 215 Euston >>> Road, >>> London, NW1 2BE. > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > From fs5 at sanger.ac.uk Wed Jan 25 10:15:27 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Wed, 25 Jan 2012 15:15:27 +0000 Subject: [Bioperl-l] how to get the information of Strand = Plus / Plus from blastn report by bioperl. 
In-Reply-To: References: <4F10A59E.5040807@sanger.ac.uk> <4F13EE93.2010502@sanger.ac.uk> <4F1443A8.6000007@sanger.ac.uk> <4F1E9956.9010700@sanger.ac.uk> <4F1FCD6F.70204@sanger.ac.uk> Message-ID: <4F201C8F.307@sanger.ac.uk> ah, sorry, my bad: the parameter name for the constructor appears to actually be just 'best' so you can try: my $searchio = new Bio::SearchIO( -format => 'blast', -file => $blast_report, -best => 1); but to set this flag later or ask if it is set you do for example print "analysing best hit only " if $searchio->best_hit_only; bit confusing but should work. Frank On 25/01/12 14:41, kakchingtabam pawankumar sharma wrote: > Hi, > > This is my script:- > > my $searchio = new Bio::SearchIO( -format => 'blast', -file => > $blast_report, -best_hit_only => 1); > > while ( my $result = $searchio->next_result() ) { > > my $QueryName = $result->query_name(); my $QueryDescript = > $result->query_description(); > my $QueryLength = $result->query_length; > my $NoHits = $result->num_hits; > > while( my $hit = $result->next_hit ) { > my $HitName = $hit->name(); > my $HitDescrip = $hit->description(); > my $HitLength = $hit->length; > my $Score = $hit->raw_score(); > my $Bits = $hit->bits; > > my $hsp = $hit->next_hsp; # Only check first (= best) hsp > my $Evalue = $hsp->evalue(); > my $AlnLen = $hsp->num_identical(); > my $TotalLen = $hsp->hsp_length; > my $QueryStrand = $hsp->strand('query'); > my $HitStrand = $hsp->strand('hit'); > > #if($Evalue< $cutoff){ > print "$QueryName $QueryDescript\t"; > print "$QueryLength\t"; > print "$NoHits\t"; > print "$HitName $HitDescrip\t"; > print "$HitLength\t"; > print "$Score\t"; > print "$Bits\t"; > print "$Evalue\n"; > print "$AlnLen\t"; > print "$TotalLen\t"; > print "$QueryStrand\t"; > print "$HitStrand\n"; > #} > } > #print "\n"; > } > > > So Can You Predict where I need to modify This Script To get only > tophit of every Query. 
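A minimal sketch of a top-hit-only loop, assuming BioPerl is installed and the first command-line argument names a plain-text BLAST report. It passes the `-best => 1` constructor flag named in the reply above and, so as not to rely on that flag alone, reads just the first hit and first HSP of each result instead of looping over every hit; BLAST lists hits in order of significance, so the first one is the top hit. It also ends each record with a single newline (the quoted script prints `\n` after `$Evalue`, which splits every record across two lines). The sub name `top_hit_rows` and the chosen columns are illustrative, not taken from the thread.

```perl
use strict;
use warnings;

# Build one tab-separated line per query from its first (top-ranked) hit and
# that hit's first HSP. Works with any object offering BioPerl's
# next_result / next_hit / next_hsp iterator methods.
sub top_hit_rows {
    my ($searchio) = @_;
    my @rows;
    while ( my $result = $searchio->next_result ) {
        my $hit = $result->next_hit or next;    # query with no hits: skip it
        my $hsp = $hit->next_hsp    or next;
        push @rows, join "\t",
            $result->query_name, $hit->name, $hit->bits, $hsp->evalue,
            $hsp->strand('query'), $hsp->strand('hit');
    }
    return @rows;
}

# BioPerl is loaded only when a report file is given on the command line.
if (@ARGV) {
    require Bio::SearchIO;
    my $searchio = Bio::SearchIO->new(
        -format => 'blast',
        -file   => $ARGV[0],
        -best   => 1,    # constructor flag per the reply; accessor is best_hit_only()
    );
    print "$_\n" for top_hit_rows($searchio);
}
```

Each output line then holds the query name, hit name, bit score, E-value, query strand and hit strand for the single best hit of that query.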
> > With Regards, > Pawan > > On 1/25/12, Frank Schwach wrote: >> can you post your script please? >> >> On 25/01/12 07:12, kakchingtabam pawankumar sharma wrote: >>> Hi Frank, >>> Thanks for your kind reply. I have done this in my script even though >>> i could get the top hit still for every query. Still all hits are >>> extracted by my script. So kindly help me to solve this problem. >>> >>> With best regards, >>> Pawan >>> >>> On Tue, Jan 24, 2012 at 5:13 PM, Frank Schwach wrote: >>>> I think that's an option that you can set when you ask for a new BLAST >>>> parser: >>>> my $searchio = new Bio::SearchIO( >>>> -format => 'blast', >>>> -file => 't/data/ecolitst.bls', >>>> -best_hit_only => 1, >>>> ); >>>> >>>> now when you use the same script that you have been using so far (loop >>>> over >>>> all hits), there will only be one hit per result. >>>> >>>> Frank >>>> >>>> >>>> >>>> On 24/01/12 11:31, kakchingtabam pawankumar sharma wrote: >>>>> >>>>> Hi, >>>>> Thanks a lot for help Frank. It works for every Blast output. >>>>> One more question is that i want to best hit only(top hit of every >>>>> query). >>>>> I show there is option called >>>>> $obj->best_hit_only; in Bio::SearchIO module. >>>>> So help to add this to my script. >>>>> I could not do. Its confusing. >>>>> >>>>> Thanks in Advanced. >>>>> >>>>> With best regards, >>>>> Pawan >>>>> >>>>> >>>>> On Mon, Jan 16, 2012 at 9:05 PM, Frank Schwach >>>>> wrote: >>>>>> >>>>>> Excellent, well done! >>>>>> No, this is the way to do it. In BioPerl modules that use strand >>>>>> information >>>>>> you will find the values +1/-1 or undef. If you want to display those >>>>>> as >>>>>> PLUS/MINUS,+/-,Watson/Crick,Laurel/Hardy whatever, you have to convert >>>>>> it, >>>>>> but you know now how to do it. 
>>>>>> You have a syntax error in your code where you retrieve the query name: >>>>>> >>>>>> >>>>>> my $QueryName = $result->query_name(), my $QueryDescript = >>>>>> $result->query_description(); >>>>>> >>>>>> should be two lines and the comma should be a semicolon. >>>>>> >>>>>> Good luck! >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 16/01/12 15:14, kakchingtabam pawankumar sharma wrote: >>>>>>> >>>>>>> >>>>>>> So By using the if else conditon function, I have solve Frank. >>>>>>> I mean is there anyway in bioperl we can get directly using other >>>>>>> module! I hope u got it! >>>>>>> >>>>>>> >>>>>>> So my second Question have not replied that is >>>>>>> >>>>>>> i have blastn report as below: >>>>>>> >>>>>>> BLASTN 2.2.18 [Mar-02-2008] >>>>>>> >>>>>>> >>>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>>> Schaffer, >>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >>>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database >>>>>>> search >>>>>>> programs", Nucleic Acids Res. 25:3389-3402. 
>>>>>>> >>>>>>> Query= ORB_1210001_hsa-miR-548aa#5_1 >>>>>>> (24 letters) >>>>>>> >>>>>>> Database: hsa-mmu-rno_miRNA.fa >>>>>>> 3524 sequences; 76,424 total letters >>>>>>> >>>>>>> Searching..................................................done >>>>>>> >>>>>>> >>>>>>> >>>>>>> Score >>>>>>> E >>>>>>> Sequences producing significant alignments: >>>>>>> (bits) >>>>>>> Value >>>>>>> >>>>>>> hsa-miR-548aa >>>>>>> 48 2e-009 >>>>>>> hsa-miR-548d-5p >>>>>>> 36 9e-006 >>>>>>> hsa-miR-548b-5p >>>>>>> 36 9e-006 >>>>>>> hsa-miR-548z >>>>>>> 34 3e-005 >>>>>>> hsa-miR-548q >>>>>>> 30 5e-004 >>>>>>> hsa-miR-548n >>>>>>> 30 5e-004 >>>>>>> hsa-miR-548ab >>>>>>> 28 0.002 >>>>>>> hsa-miR-548v >>>>>>> 28 0.002 >>>>>>> hsa-miR-548c-5p >>>>>>> 28 0.002 >>>>>>> hsa-miR-548ag >>>>>>> 26 0.008 >>>>>>> hsa-miR-548u >>>>>>> 26 0.008 >>>>>>> hsa-miR-548c-3p >>>>>>> 26 0.008 >>>>>>> hsa-miR-603 >>>>>>> 26 0.008 >>>>>>> hsa-miR-548a-3p >>>>>>> 26 0.008 >>>>>>> hsa-miR-548ac >>>>>>> 24 0.033 >>>>>>> hsa-miR-548an >>>>>>> 22 0.13 >>>>>>> hsa-miR-548aj >>>>>>> 22 0.13 >>>>>>> hsa-miR-548i >>>>>>> 22 0.13 >>>>>>> hsa-miR-548g >>>>>>> 22 0.13 >>>>>>> hsa-miR-548j >>>>>>> 22 0.13 >>>>>>> hsa-miR-548a-5p >>>>>>> 22 0.13 >>>>>>> >>>>>>>> hsa-miR-548aa >>>>>>> >>>>>>> >>>>>>> Length = 25 >>>>>>> >>>>>>> Score = 48.1 bits (24), Expect = 2e-009 >>>>>>> Identities = 24/24 (100%) >>>>>>> Strand = Plus / Minus >>>>>>> >>>>>>> >>>>>>> Query: 1 tggtgcaaaagtaattgtggtttt 24 >>>>>>> |||||||||||||||||||||||| >>>>>>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2 >>>>>>> >>>>>>> >>>>>>>> hsa-miR-548d-5p >>>>>>> >>>>>>> >>>>>>> Length = 22 >>>>>>> >>>>>>> Score = 36.2 bits (18), Expect = 9e-006 >>>>>>> Identities = 18/18 (100%) >>>>>>> Strand = Plus / Plus >>>>>>> >>>>>>> >>>>>>> Query: 7 aaaagtaattgtggtttt 24 >>>>>>> |||||||||||||||||| >>>>>>> Sbjct: 1 aaaagtaattgtggtttt 18 >>>>>>> >>>>>>> >>>>>>> >>>>>>> in this result i could not parse my code. 
i think my code does not >>>>>>> accept the Query header that is >>>>>>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast >>>>>>> output. >>>>>>> >>>>>>> kindly help me out. >>>>>>> >>>>>>> with regards, >>>>>>> Pawan. >>>>>>> >>>>>>> On 1/16/12, Frank Schwach wrote: >>>>>>>> >>>>>>>> >>>>>>>> Hi Pawan , >>>>>>>> >>>>>>>> Please always "reply to all", so that you keep the discussion on the >>>>>>>> bioperl mailing list and more people can help you. >>>>>>>> What you need is a very basic Perl command. I could give you the code >>>>>>>> but I think you get more out of it if you experiment with it on your >>>>>>>> own >>>>>>>> because it is very fundamental. I'll point you in the right >>>>>>>> direction: >>>>>>>> you want an if-then-else conditional construct. >>>>>>>> >>>>>>>> Perl's documentation about this is here: >>>>>>>> >>>>>>>> http://perldoc.perl.org/perlintro.html#Conditional-and-looping-constructs >>>>>>>> >>>>>>>> if strand is 1 you want to print "PLUS" else if it is -1 you want to >>>>>>>> print "MINUS", or else you might want to print "no strand" or >>>>>>>> something, >>>>>>>> or even treat it as an error and make the script abort. >>>>>>>> >>>>>>>> Give it a go and let us know if you need help. For basic (non-bio) >>>>>>>> Perl >>>>>>>> question, please also check out the community at >>>>>>>> http://www.perlmonks.org/. >>>>>>>> >>>>>>>> Hope that helps, >>>>>>>> >>>>>>>> Frank >>>>>>>> >>>>>>>> >>>>>>>> On 14/01/12 05:59, kakchingtabam pawankumar sharma wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi frank, >>>>>>>>> >>>>>>>>> Thanks for your kind reply. >>>>>>>>> I could get the vale for query as 1 value if it is plus. >>>>>>>>> and for hit = -1 if it is minus. >>>>>>>>> But i would like to print out as PLUS or MINUS not 1 or -1 my >>>>>>>>> friend. 
>>>>>>>>> >>>>>>>>> you can see my code as below: >>>>>>>>> >>>>>>>>> while ( my $result = $searchio->next_result() ) { >>>>>>>>> my $QueryName = $result->query_name(), my $QueryDescript = >>>>>>>>> $result->query_description(); >>>>>>>>> my $QueryLength = $result->query_length; >>>>>>>>> my $NoHits = $result->num_hits; >>>>>>>>> >>>>>>>>> while( my $hit = $result->next_hit ) { >>>>>>>>> my $HitName = $hit->name(); >>>>>>>>> my $HitDescrip = $hit->description(); >>>>>>>>> my $HitLength = $hit->length; >>>>>>>>> my $Score = $hit->raw_score(); >>>>>>>>> my $Bits = $hit->bits; >>>>>>>>> >>>>>>>>> my $hsp = $hit->next_hsp; # Only check first (= best) hsp >>>>>>>>> my $Evalue = $hsp->evalue(); >>>>>>>>> my $AlnLen = $hsp->num_identical(); >>>>>>>>> my $TotalLen = $hsp->hsp_length; >>>>>>>>> my $QueryStrand = $hsp->strand('query'); >>>>>>>>> my $HitStrand = $hsp->strand('hit'); >>>>>>>>> >>>>>>>>> if($Evalue< $cutoff){ >>>>>>>>> print "$QueryName $QueryDescript\t"; >>>>>>>>> print "$QueryLength\t"; >>>>>>>>> print "$NoHits\t"; >>>>>>>>> print "$HitName $HitDescrip\t"; >>>>>>>>> print "$HitLength\t"; >>>>>>>>> print "$Score\t"; >>>>>>>>> print "$Bits\t"; >>>>>>>>> print "$Evalue\t"; >>>>>>>>> print "$AlnLen\t"; >>>>>>>>> print "$TotalLen\t"; >>>>>>>>> print "$QueryStrand\t"; >>>>>>>>> print "$HitStrand\n"; >>>>>>>>> } >>>>>>>>> } >>>>>>>>> print "\n"; >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> This is a part of my code. >>>>>>>>> >>>>>>>>> i have blastn report as below: >>>>>>>>> >>>>>>>>> BLASTN 2.2.18 [Mar-02-2008] >>>>>>>>> >>>>>>>>> >>>>>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>>>>> Schaffer, >>>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >>>>>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database >>>>>>>>> search >>>>>>>>> programs", Nucleic Acids Res. 25:3389-3402. 
>>>>>>>>> >>>>>>>>> Query= ORB_1210001_hsa-miR-548aa#5_1 >>>>>>>>> (24 letters) >>>>>>>>> >>>>>>>>> Database: hsa-mmu-rno_miRNA.fa >>>>>>>>> 3524 sequences; 76,424 total letters >>>>>>>>> >>>>>>>>> Searching..................................................done >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Score >>>>>>>>> E >>>>>>>>> Sequences producing significant alignments: >>>>>>>>> (bits) >>>>>>>>> Value >>>>>>>>> >>>>>>>>> hsa-miR-548aa >>>>>>>>> 48 2e-009 >>>>>>>>> hsa-miR-548d-5p >>>>>>>>> 36 9e-006 >>>>>>>>> hsa-miR-548b-5p >>>>>>>>> 36 9e-006 >>>>>>>>> hsa-miR-548z >>>>>>>>> 34 3e-005 >>>>>>>>> hsa-miR-548q >>>>>>>>> 30 5e-004 >>>>>>>>> hsa-miR-548n >>>>>>>>> 30 5e-004 >>>>>>>>> hsa-miR-548ab >>>>>>>>> 28 0.002 >>>>>>>>> hsa-miR-548v >>>>>>>>> 28 0.002 >>>>>>>>> hsa-miR-548c-5p >>>>>>>>> 28 0.002 >>>>>>>>> hsa-miR-548ag >>>>>>>>> 26 0.008 >>>>>>>>> hsa-miR-548u >>>>>>>>> 26 0.008 >>>>>>>>> hsa-miR-548c-3p >>>>>>>>> 26 0.008 >>>>>>>>> hsa-miR-603 >>>>>>>>> 26 0.008 >>>>>>>>> hsa-miR-548a-3p >>>>>>>>> 26 0.008 >>>>>>>>> hsa-miR-548ac >>>>>>>>> 24 0.033 >>>>>>>>> hsa-miR-548an >>>>>>>>> 22 0.13 >>>>>>>>> hsa-miR-548aj >>>>>>>>> 22 0.13 >>>>>>>>> hsa-miR-548i >>>>>>>>> 22 0.13 >>>>>>>>> hsa-miR-548g >>>>>>>>> 22 0.13 >>>>>>>>> hsa-miR-548j >>>>>>>>> 22 0.13 >>>>>>>>> hsa-miR-548a-5p >>>>>>>>> 22 0.13 >>>>>>>>> >>>>>>>>>> hsa-miR-548aa >>>>>>>>> >>>>>>>>> >>>>>>>>> Length = 25 >>>>>>>>> >>>>>>>>> Score = 48.1 bits (24), Expect = 2e-009 >>>>>>>>> Identities = 24/24 (100%) >>>>>>>>> Strand = Plus / Minus >>>>>>>>> >>>>>>>>> >>>>>>>>> Query: 1 tggtgcaaaagtaattgtggtttt 24 >>>>>>>>> |||||||||||||||||||||||| >>>>>>>>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2 >>>>>>>>> >>>>>>>>> >>>>>>>>>> hsa-miR-548d-5p >>>>>>>>> >>>>>>>>> >>>>>>>>> Length = 22 >>>>>>>>> >>>>>>>>> Score = 36.2 bits (18), Expect = 9e-006 >>>>>>>>> Identities = 18/18 (100%) >>>>>>>>> Strand = Plus / Plus >>>>>>>>> >>>>>>>>> >>>>>>>>> Query: 7 aaaagtaattgtggtttt 24 >>>>>>>>> |||||||||||||||||| 
>>>>>>>>> Sbjct: 1 aaaagtaattgtggtttt 18 >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> in this result i could not parse my code. i think my code does not >>>>>>>>> accept the Query header that is >>>>>>>>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example blast >>>>>>>>> output. >>>>>>>>> >>>>>>>>> kindly help me out. >>>>>>>>> >>>>>>>>> with regards, >>>>>>>>> Pawan. >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sat, Jan 14, 2012 at 3:13 AM, Frank Schwach >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi Pawan, >>>>>>>>>> >>>>>>>>>> Can you show your code? Is it basically following the structure >>>>>>>>>> shown >>>>>>>>>> in >>>>>>>>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO >>>>>>>>>> ? >>>>>>>>>> >>>>>>>>>> If that is the case >>>>>>>>>> >>>>>>>>>> $hsp->strand('query') >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> is exactly what you need. >>>>>>>>>> To check if hit and query are on different strands you can do: >>>>>>>>>> >>>>>>>>>> if ( $hsp->strand('query') >>>>>>>>>> * $hsp->strand('hit') == -1){ >>>>>>>>>> >>>>>>>>>> # do whatever you need to do if they are on opposite strands >>>>>>>>>> >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> Hope that helps >>>>>>>>>> >>>>>>>>>> Frank >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 13/01/12 16:46, kakchingtabam pawankumar sharma wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> Using Bio::SearchIO module I am parsing the >>>>>>>>>>> following >>>>>>>>>>> Blast >>>>>>>>>>> result. >>>>>>>>>>> I have used the option- $hsp->strand('query'). >>>>>>>>>>> >>>>>>>>>>> But I cannot get detail of alignment. >>>>>>>>>>> >>>>>>>>>>> I need to know if my hit is forward (Strand = Plus / Plus) >>>>>>>>>>> or reverse ( Strand = Plus / Minus)... >>>>>>>>>>> Can anyone help me to get report as Plus or Minus for query or >>>>>>>>>>> hit. >>>>>>>>>>> >>>>>>>>>>> thanks in advanced. 
>>>>>>>>>>> >>>>>>>>>>> With regards, >>>>>>>>>>> Pawan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> BLASTN 2.2.18 [Dec-23-2011] >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>>>>>>> Schaffer, >>>>>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman >>>>>>>>>>> (1997), >>>>>>>>>>> "Gapped BLAST and PSI-BLAST: a new generation of protein database >>>>>>>>>>> search >>>>>>>>>>> programs", Nucleic Acids Res. 25:3389-3402. >>>>>>>>>>> >>>>>>>>>>> Query= 000013_c10079-9984 >>>>>>>>>>> (50 letters) >>>>>>>>>>> >>>>>>>>>>> Database: Cyano_Probe.fasta >>>>>>>>>>> 4760 sequences; 238,000 total letters >>>>>>>>>>> >>>>>>>>>>> Searching..................................................done >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Score >>>>>>>>>>> E >>>>>>>>>>> Sequences producing significant alignments: >>>>>>>>>>> (bits) >>>>>>>>>>> Value >>>>>>>>>>> >>>>>>>>>>> 000013_c10079-9984 >>>>>>>>>>> 100 7e-024 >>>>>>>>>>> 002619_2689273-2690037 >>>>>>>>>>> 24 0.36 >>>>>>>>>>> 001126_c1123720-1123385 >>>>>>>>>>> 24 0.36 >>>>>>>>>>> 003211_c3326737-3326480 >>>>>>>>>>> 22 1.4 >>>>>>>>>>> 002415_2471082-2471420 >>>>>>>>>>> 22 1.4 >>>>>>>>>>> 002269_2321276-2322463 >>>>>>>>>>> 22 1.4 >>>>>>>>>>> 001328_c1326535-1326164 >>>>>>>>>>> 22 1.4 >>>>>>>>>>> >>>>>>>>>>>> 000013_c10079-9984 >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Length = 50 >>>>>>>>>>> >>>>>>>>>>> Score = 99.6 bits (50), Expect = 7e-024 >>>>>>>>>>> Identities = 50/50 (100%) >>>>>>>>>>> Strand = Plus / Plus >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Query: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 >>>>>>>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>>>>>>> Sbjct: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50 >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Bioperl-l mailing list >>>>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>>>> 
http://lists.open-bio.org/mailman/listinfo/bioperl-l -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
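Frank's advice in this thread can be combined into a small helper. He suggests mapping 1 to "PLUS" and -1 to "MINUS", and reporting "no strand" or aborting otherwise. He also suggests treating a product of -1 between query and hit strands as "opposite strands". The following is a minimal plain-Perl sketch, assuming the numeric values returned by $hsp->strand('query') and $hsp->strand('hit'). The subroutine names strand_label and opposite_strands are hypothetical. Treating 0 the same as undef is also an assumption of this sketch: BioPerl ranges use 0 for "unknown / not applicable".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a BioPerl-style strand value (1, -1, 0 or undef)
# into a BLAST-style label. Assumption: 0 and undef both mean "no strand".
sub strand_label {
    my ($strand) = @_;
    return 'NONE' if !defined $strand || $strand == 0;
    return 'PLUS'  if $strand == 1;
    return 'MINUS' if $strand == -1;
    die "Unexpected strand value: $strand\n";    # abort on anything else
}

# Hypothetical helper: true (1) when query and hit are on opposite strands,
# i.e. the product of the two strand values is -1.
sub opposite_strands {
    my ($query_strand, $hit_strand) = @_;
    return 0 unless defined $query_strand && defined $hit_strand;
    return $query_strand * $hit_strand == -1 ? 1 : 0;
}

# Example values as they might come from an HSP reported as "Strand = Plus / Minus".
my ($q, $h) = (1, -1);
printf "Strand = %s / %s%s\n",
    strand_label($q), strand_label($h),
    opposite_strands($q, $h) ? ' (opposite strands)' : '';
# prints: Strand = PLUS / MINUS (opposite strands)
```

Inside the SearchIO loop, you would print strand_label($QueryStrand) and strand_label($HitStrand) instead of the raw 1/-1 values.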
From fs5 at sanger.ac.uk Wed Jan 25 13:08:10 2012 From: fs5 at sanger.ac.uk (Frank Schwach) Date: Wed, 25 Jan 2012 18:08:10 +0000 Subject: [Bioperl-l] additional methods for Bio::SeqUtils for in-silico cloning In-Reply-To: <4F183A6A.4020409@sanger.ac.uk> References: <1326201010.4396.75.camel@deskpro15336.internal.sanger.ac.uk> <4F0C5855.1070907@gmail.com> <1326214061.4396.98.camel@deskpro15336.internal.sanger.ac.uk> <4F0C74E9.6020804@gmail.com> <4F0C75DA.9050607@gmail.com> <4F0DD215.9070100@sanger.ac.uk> <4F0DD72A.90309@gmail.com> <4F0DF93B.5020505@sanger.ac.uk> <1326377597.4396.125.camel@deskpro15336.internal.sanger.ac.uk> <1326903677.3563.15.camel@deskpro15336.internal.sanger.ac.uk> <4F16F82C.2090503@gmail.com> <1326906496.3563.31.camel@deskpro15336.internal.sanger.ac.uk> <159E3D6F-F08C-4D93-B31F-742AF93EF20E@illinois.edu> <1326908773.3563.36.camel@deskpro15336.internal.sanger.ac.uk> <4F17FB50.4000606@sanger.ac.uk> <4F18337B.2010401@gmail.com> <9596B4F5-6FD0-48B6-9E46-70C0EE3E5738@illinois.edu> <4F183A6A.4020409@sanger.ac.uk> Message-ID: <4F20450A.3060103@sanger.ac.uk> Just wanted to say that I have updated Bio::SeqUtils in my git master branch so that it now throws an exception if negative coordinates are encountered. There is also a bugfix for incorrectly adjusted positions following deletions in some situations. I have made a new pull request to bioperl-live. Thanks Roy and Chris! Frank On 19/01/12 15:44, Frank Schwach wrote: > ok, then let's throw an error when negative positions are supplied. I'll > make the changes to the queued pull request. > > Cheers, > > Frank > > > On 19/01/12 15:39, Fields, Christopher J wrote: >> On Jan 19, 2012, at 9:15 AM, Roy Chaudhuri wrote: >> >>> I'm not sure I understand the problem with "<1", < means "less than" >>> not "less than or equal to", so it does not imply that the feature >>> could start at position 1. >> >> Yup, that's correct. My bad.
>> >>> I can see that there would be cases where negative coordinates might >>> be useful, but I think it is opening a can of worms and could >>> introduce many subtle bugs, so I'd vote for throwing an error. If you >>> were to do it, it would be better to stick with the biological >>> convention of -1 being the base before 1 (as used for -10 and -35 >>> elements). >>> >>> Roy. >> >> Agree with Roy, I think it's best to avoid using negative coords when >> at all possible, mainly b/c it introduces possibly inconsistent >> behavior. Such behavior should be defined in the feature class or its >> parent class(es), wherever appropriate. At the moment that doesn't >> hold true. >> >> As a side note, I recall Lincoln allowed negative coords with GBrowse >> features but I don't recall whether it's officially supported. >> >> chris > > -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From fossandonc at hotmail.com Wed Jan 25 14:41:33 2012 From: fossandonc at hotmail.com (Francisco J. Ossandón) Date: Wed, 25 Jan 2012 16:41:33 -0300 Subject: [Bioperl-l] how to get the information of Strand = Plus / Plus from blastn report by bioperl. In-Reply-To: <4F201C8F.307@sanger.ac.uk> References: <4F10A59E.5040807@sanger.ac.uk> <4F13EE93.2010502@sanger.ac.uk> <4F1443A8.6000007@sanger.ac.uk> <4F1E9956.9010700@sanger.ac.uk> <4F1FCD6F.70204@sanger.ac.uk> <4F201C8F.307@sanger.ac.uk> Message-ID: Hello, It is possible to get just the best hit by putting a label on the results loop and then using a 'next LABEL' command. http://perldoc.perl.org/perlsyn.html#Loop-Control http://perldoc.perl.org/functions/next.html Below is the script with the added label, tested and working. I added back the $hsp loop so you can optionally see everything again just by commenting out the "next RESULT;" line.
#####
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SearchIO;

my $blast_report = 'blast.out.txt';
my $searchio = Bio::SearchIO->new(-file => $blast_report, -format => 'blast');

RESULT: while ( my $result = $searchio->next_result ) {
    my $QueryName     = $result->query_name;
    my $QueryDescript = $result->query_description;
    my $QueryLength   = $result->query_length;
    my $NoHits        = $result->num_hits;

    while ( my $hit = $result->next_hit ) {
        my $HitName    = $hit->name;
        my $HitDescrip = $hit->description;
        my $HitLength  = $hit->length;
        my $Score      = $hit->raw_score;
        my $Bits       = $hit->bits;

        while ( my $hsp = $hit->next_hsp ) {
            my $Evalue      = $hsp->evalue;
            my $AlnLen      = $hsp->num_identical;
            my $TotalLen    = $hsp->hsp_length;
            my $QueryStrand = $hsp->strand('query');
            my $HitStrand   = $hsp->strand('hit');

            #if ($Evalue < $cutoff) {
            print "$QueryName $QueryDescript\t";
            print "$QueryLength\t";
            print "$NoHits\t";
            print "$HitName $HitDescrip\t";
            print "$HitLength\t";
            print "$Score\t";
            print "$Bits\t";
            print "$Evalue\t";
            print "$AlnLen\t";
            print "$TotalLen\t";
            print "$QueryStrand\t";
            print "$HitStrand\n";
            #}

            # Jump to the next query once the first (best) HSP of the first
            # (best) hit has been printed; comment this out to print all hits.
            next RESULT;
        }
    }
    #print "\n";
}
exit;
#####

Cheers,
Francisco J. Ossandon

-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Frank Schwach Sent: Wednesday, January 25, 2012 12:15 To: kakchingtabam pawankumar sharma CC: Subject: Re: [Bioperl-l] how to get the information of Strand = Plus / Plus from blastn report by bioperl. ah, sorry, my bad: the parameter name for the constructor appears to actually be just 'best' so you can try: my $searchio = new Bio::SearchIO( -format => 'blast', -file => $blast_report, -best => 1); but to set this flag later or ask if it is set you do for example print "analysing best hit only " if $searchio->best_hit_only; bit confusing but should work.
Frank On 25/01/12 14:41, kakchingtabam pawankumar sharma wrote: > Hi, > > This is my script:- > > my $searchio = new Bio::SearchIO( -format => 'blast', -file => > $blast_report, -best_hit_only => 1); > > while ( my $result = $searchio->next_result() ) { > > my $QueryName = $result->query_name(); my $QueryDescript = > $result->query_description(); > my $QueryLength = $result->query_length; > my $NoHits = $result->num_hits; > > while( my $hit = $result->next_hit ) { > my $HitName = $hit->name(); > my $HitDescrip = $hit->description(); > my $HitLength = $hit->length; > my $Score = $hit->raw_score(); > my $Bits = $hit->bits; > > my $hsp = $hit->next_hsp; # Only check first (= best) hsp > my $Evalue = $hsp->evalue(); > my $AlnLen = $hsp->num_identical(); > my $TotalLen = $hsp->hsp_length; > my $QueryStrand = $hsp->strand('query'); > my $HitStrand = $hsp->strand('hit'); > > #if($Evalue< $cutoff){ > print "$QueryName $QueryDescript\t"; > print "$QueryLength\t"; > print "$NoHits\t"; > print "$HitName $HitDescrip\t"; > print "$HitLength\t"; > print "$Score\t"; > print "$Bits\t"; > print "$Evalue\n"; > print "$AlnLen\t"; > print "$TotalLen\t"; > print "$QueryStrand\t"; > print "$HitStrand\n"; > #} > } > #print "\n"; > } > > > So Can You Predict where I need to modify This Script To get only > tophit of every Query. > > With Regards, > Pawan > > On 1/25/12, Frank Schwach wrote: >> can you post your script please? >> >> On 25/01/12 07:12, kakchingtabam pawankumar sharma wrote: >>> Hi Frank, >>> Thanks for your kind reply. I have done this in my script even >>> though i could get the top hit still for every query. Still all hits >>> are extracted by my script. So kindly help me to solve this problem. 
>>> >>> With best regards, >>> Pawan >>> >>> On Tue, Jan 24, 2012 at 5:13 PM, Frank Schwach wrote: >>>> I think that's an option that you can set when you ask for a new >>>> BLAST >>>> parser: >>>> my $searchio = new Bio::SearchIO( >>>> -format => 'blast', >>>> -file => 't/data/ecolitst.bls', >>>> -best_hit_only => 1, >>>> ); >>>> >>>> now when you use the same script that you have been using so far >>>> (loop over all hits), there will only be one hit per result. >>>> >>>> Frank >>>> >>>> >>>> >>>> On 24/01/12 11:31, kakchingtabam pawankumar sharma wrote: >>>>> >>>>> Hi, >>>>> Thanks a lot for help Frank. It works for every Blast output. >>>>> One more question is that i want to best hit only(top hit of every >>>>> query). >>>>> I show there is option called >>>>> $obj->best_hit_only; in Bio::SearchIO module. >>>>> So help to add this to my script. >>>>> I could not do. Its confusing. >>>>> >>>>> Thanks in Advanced. >>>>> >>>>> With best regards, >>>>> Pawan >>>>> >>>>> >>>>> On Mon, Jan 16, 2012 at 9:05 PM, Frank Schwach >>>>> wrote: >>>>>> >>>>>> Excellent, well done! >>>>>> No, this is the way to do it. In BioPerl modules that use strand >>>>>> information you will find the values +1/-1 or undef. If you want >>>>>> to display those as PLUS/MINUS,+/-,Watson/Crick,Laurel/Hardy >>>>>> whatever, you have to convert it, but you know now how to do it. >>>>>> You have a syntax error in your code where you retrieve the query name: >>>>>> >>>>>> >>>>>> my $QueryName = $result->query_name(), my $QueryDescript = >>>>>> $result->query_description(); >>>>>> >>>>>> should be two lines and the comma should be a semicolon. >>>>>> >>>>>> Good luck! >>>>>> >>>>>> Frank >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 16/01/12 15:14, kakchingtabam pawankumar sharma wrote: >>>>>>> >>>>>>> >>>>>>> So By using the if else conditon function, I have solve Frank. >>>>>>> I mean is there anyway in bioperl we can get directly using >>>>>>> other module! I hope u got it! 
>>>>>>> >>>>>>> >>>>>>> So my second Question have not replied that is >>>>>>> >>>>>>> i have blastn report as below: >>>>>>> >>>>>>> BLASTN 2.2.18 [Mar-02-2008] >>>>>>> >>>>>>> >>>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>>> Schaffer, >>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman >>>>>>> (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein >>>>>>> database search programs", Nucleic Acids Res. 25:3389-3402. >>>>>>> >>>>>>> Query= ORB_1210001_hsa-miR-548aa#5_1 >>>>>>> (24 letters) >>>>>>> >>>>>>> Database: hsa-mmu-rno_miRNA.fa >>>>>>> 3524 sequences; 76,424 total letters >>>>>>> >>>>>>> Searching..................................................done >>>>>>> >>>>>>> >>>>>>> >>>>>>> Score >>>>>>> E >>>>>>> Sequences producing significant alignments: >>>>>>> (bits) >>>>>>> Value >>>>>>> >>>>>>> hsa-miR-548aa >>>>>>> 48 2e-009 >>>>>>> hsa-miR-548d-5p >>>>>>> 36 9e-006 >>>>>>> hsa-miR-548b-5p >>>>>>> 36 9e-006 >>>>>>> hsa-miR-548z >>>>>>> 34 3e-005 >>>>>>> hsa-miR-548q >>>>>>> 30 5e-004 >>>>>>> hsa-miR-548n >>>>>>> 30 5e-004 >>>>>>> hsa-miR-548ab >>>>>>> 28 0.002 >>>>>>> hsa-miR-548v >>>>>>> 28 0.002 >>>>>>> hsa-miR-548c-5p >>>>>>> 28 0.002 >>>>>>> hsa-miR-548ag >>>>>>> 26 0.008 >>>>>>> hsa-miR-548u >>>>>>> 26 0.008 >>>>>>> hsa-miR-548c-3p >>>>>>> 26 0.008 >>>>>>> hsa-miR-603 >>>>>>> 26 0.008 >>>>>>> hsa-miR-548a-3p >>>>>>> 26 0.008 >>>>>>> hsa-miR-548ac >>>>>>> 24 0.033 >>>>>>> hsa-miR-548an >>>>>>> 22 0.13 >>>>>>> hsa-miR-548aj >>>>>>> 22 0.13 >>>>>>> hsa-miR-548i >>>>>>> 22 0.13 >>>>>>> hsa-miR-548g >>>>>>> 22 0.13 >>>>>>> hsa-miR-548j >>>>>>> 22 0.13 >>>>>>> hsa-miR-548a-5p >>>>>>> 22 0.13 >>>>>>> >>>>>>>> hsa-miR-548aa >>>>>>> >>>>>>> >>>>>>> Length = 25 >>>>>>> >>>>>>> Score = 48.1 bits (24), Expect = 2e-009 >>>>>>> Identities = 24/24 (100%) >>>>>>> Strand = Plus / Minus >>>>>>> >>>>>>> >>>>>>> Query: 1 tggtgcaaaagtaattgtggtttt 24 >>>>>>> |||||||||||||||||||||||| >>>>>>> Sbjct: 25 
tggtgcaaaagtaattgtggtttt 2 >>>>>>> >>>>>>> >>>>>>>> hsa-miR-548d-5p >>>>>>> >>>>>>> >>>>>>> Length = 22 >>>>>>> >>>>>>> Score = 36.2 bits (18), Expect = 9e-006 >>>>>>> Identities = 18/18 (100%) >>>>>>> Strand = Plus / Plus >>>>>>> >>>>>>> >>>>>>> Query: 7 aaaagtaattgtggtttt 24 >>>>>>> |||||||||||||||||| >>>>>>> Sbjct: 1 aaaagtaattgtggtttt 18 >>>>>>> >>>>>>> >>>>>>> >>>>>>> in this result i could not parse my code. i think my code does >>>>>>> not accept the Query header that is >>>>>>> "ORB_1210001_hsa-miR-548aa#5_1" as it is in the above example >>>>>>> blast output. >>>>>>> >>>>>>> kindly help me out. >>>>>>> >>>>>>> with regards, >>>>>>> Pawan. >>>>>>> >>>>>>> On 1/16/12, Frank Schwach wrote: >>>>>>>> >>>>>>>> >>>>>>>> Hi Pawan , >>>>>>>> >>>>>>>> Please always "reply to all", so that you keep the discussion >>>>>>>> on the bioperl mailing list and more people can help you. >>>>>>>> What you need is a very basic Perl command. I could give you >>>>>>>> the code but I think you get more out of it if you experiment >>>>>>>> with it on your own because it is very fundamental. I'll point >>>>>>>> you in the right >>>>>>>> direction: >>>>>>>> you want an if-then-else conditional construct. >>>>>>>> >>>>>>>> Perl's documentation about this is here: >>>>>>>> >>>>>>>> http://perldoc.perl.org/perlintro.html#Conditional-and-looping- >>>>>>>> constructs >>>>>>>> >>>>>>>> if strand is 1 you want to print "PLUS" else if it is -1 you >>>>>>>> want to print "MINUS", or else you might want to print "no >>>>>>>> strand" or something, or even treat it as an error and make the >>>>>>>> script abort. >>>>>>>> >>>>>>>> Give it a go and let us know if you need help. For basic >>>>>>>> (non-bio) Perl question, please also check out the community at >>>>>>>> http://www.perlmonks.org/. 
>>>>>>>> >>>>>>>> Hope that helps, >>>>>>>> >>>>>>>> Frank >>>>>>>> >>>>>>>> >>>>>>>> On 14/01/12 05:59, kakchingtabam pawankumar sharma wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> Hi frank, >>>>>>>>> >>>>>>>>> Thanks for your kind reply. >>>>>>>>> I could get the vale for query as 1 value if it is plus. >>>>>>>>> and for hit = -1 if it is minus. >>>>>>>>> But i would like to print out as PLUS or MINUS not 1 or -1 my >>>>>>>>> friend. >>>>>>>>> >>>>>>>>> you can see my code as below: >>>>>>>>> >>>>>>>>> while ( my $result = $searchio->next_result() ) { >>>>>>>>> my $QueryName = $result->query_name(), my >>>>>>>>> $QueryDescript = $result->query_description(); >>>>>>>>> my $QueryLength = $result->query_length; >>>>>>>>> my $NoHits = $result->num_hits; >>>>>>>>> >>>>>>>>> while( my $hit = $result->next_hit ) { >>>>>>>>> my $HitName = $hit->name(); >>>>>>>>> my $HitDescrip = $hit->description(); >>>>>>>>> my $HitLength = $hit->length; >>>>>>>>> my $Score = $hit->raw_score(); >>>>>>>>> my $Bits = $hit->bits; >>>>>>>>> >>>>>>>>> my $hsp = $hit->next_hsp; # Only check first (= best) hsp >>>>>>>>> my $Evalue = $hsp->evalue(); >>>>>>>>> my $AlnLen = $hsp->num_identical(); >>>>>>>>> my $TotalLen = $hsp->hsp_length; >>>>>>>>> my $QueryStrand = $hsp->strand('query'); >>>>>>>>> my $HitStrand = $hsp->strand('hit'); >>>>>>>>> >>>>>>>>> if($Evalue< $cutoff){ >>>>>>>>> print "$QueryName $QueryDescript\t"; >>>>>>>>> print "$QueryLength\t"; >>>>>>>>> print "$NoHits\t"; >>>>>>>>> print "$HitName $HitDescrip\t"; >>>>>>>>> print "$HitLength\t"; >>>>>>>>> print "$Score\t"; >>>>>>>>> print "$Bits\t"; >>>>>>>>> print "$Evalue\t"; >>>>>>>>> print "$AlnLen\t"; >>>>>>>>> print "$TotalLen\t"; >>>>>>>>> print "$QueryStrand\t"; >>>>>>>>> print "$HitStrand\n"; >>>>>>>>> } >>>>>>>>> } >>>>>>>>> print "\n"; >>>>>>>>> } >>>>>>>>> >>>>>>>>> >>>>>>>>> This is a part of my code. 
>>>>>>>>> >>>>>>>>> i have blastn report as below: >>>>>>>>> >>>>>>>>> BLASTN 2.2.18 [Mar-02-2008] >>>>>>>>> >>>>>>>>> >>>>>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>>>>> Schaffer, >>>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman >>>>>>>>> (1997), "Gapped BLAST and PSI-BLAST: a new generation of >>>>>>>>> protein database search programs", Nucleic Acids Res. >>>>>>>>> 25:3389-3402. >>>>>>>>> >>>>>>>>> Query= ORB_1210001_hsa-miR-548aa#5_1 >>>>>>>>> (24 letters) >>>>>>>>> >>>>>>>>> Database: hsa-mmu-rno_miRNA.fa >>>>>>>>> 3524 sequences; 76,424 total letters >>>>>>>>> >>>>>>>>> Searching..................................................don >>>>>>>>> e >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Score >>>>>>>>> E >>>>>>>>> Sequences producing significant alignments: >>>>>>>>> (bits) >>>>>>>>> Value >>>>>>>>> >>>>>>>>> hsa-miR-548aa >>>>>>>>> 48 2e-009 >>>>>>>>> hsa-miR-548d-5p >>>>>>>>> 36 9e-006 >>>>>>>>> hsa-miR-548b-5p >>>>>>>>> 36 9e-006 >>>>>>>>> hsa-miR-548z >>>>>>>>> 34 3e-005 >>>>>>>>> hsa-miR-548q >>>>>>>>> 30 5e-004 >>>>>>>>> hsa-miR-548n >>>>>>>>> 30 5e-004 >>>>>>>>> hsa-miR-548ab >>>>>>>>> 28 0.002 >>>>>>>>> hsa-miR-548v >>>>>>>>> 28 0.002 >>>>>>>>> hsa-miR-548c-5p >>>>>>>>> 28 0.002 >>>>>>>>> hsa-miR-548ag >>>>>>>>> 26 0.008 >>>>>>>>> hsa-miR-548u >>>>>>>>> 26 0.008 >>>>>>>>> hsa-miR-548c-3p >>>>>>>>> 26 0.008 >>>>>>>>> hsa-miR-603 >>>>>>>>> 26 0.008 >>>>>>>>> hsa-miR-548a-3p >>>>>>>>> 26 0.008 >>>>>>>>> hsa-miR-548ac >>>>>>>>> 24 0.033 >>>>>>>>> hsa-miR-548an >>>>>>>>> 22 0.13 >>>>>>>>> hsa-miR-548aj >>>>>>>>> 22 0.13 >>>>>>>>> hsa-miR-548i >>>>>>>>> 22 0.13 >>>>>>>>> hsa-miR-548g >>>>>>>>> 22 0.13 >>>>>>>>> hsa-miR-548j >>>>>>>>> 22 0.13 >>>>>>>>> hsa-miR-548a-5p >>>>>>>>> 22 0.13 >>>>>>>>> >>>>>>>>>> hsa-miR-548aa >>>>>>>>> >>>>>>>>> >>>>>>>>> Length = 25 >>>>>>>>> >>>>>>>>> Score = 48.1 bits (24), Expect = 2e-009 >>>>>>>>> Identities = 24/24 (100%) >>>>>>>>> Strand = Plus / Minus >>>>>>>>> 
>>>>>>>>>
>>>>>>>>> Query: 1  tggtgcaaaagtaattgtggtttt 24
>>>>>>>>>           ||||||||||||||||||||||||
>>>>>>>>> Sbjct: 25 tggtgcaaaagtaattgtggtttt 2
>>>>>>>>>
>>>>>>>>>> hsa-miR-548d-5p
>>>>>>>>>
>>>>>>>>> Length = 22
>>>>>>>>>
>>>>>>>>> Score = 36.2 bits (18), Expect = 9e-006
>>>>>>>>> Identities = 18/18 (100%)
>>>>>>>>> Strand = Plus / Plus
>>>>>>>>>
>>>>>>>>> Query: 7 aaaagtaattgtggtttt 24
>>>>>>>>>          ||||||||||||||||||
>>>>>>>>> Sbjct: 1 aaaagtaattgtggtttt 18
>>>>>>>>>
>>>>>>>>> My code could not parse this result. I think my code does
>>>>>>>>> not accept the query header
>>>>>>>>> "ORB_1210001_hsa-miR-548aa#5_1" shown in the example
>>>>>>>>> BLAST output above.
>>>>>>>>>
>>>>>>>>> Kindly help me out.
>>>>>>>>>
>>>>>>>>> With regards,
>>>>>>>>> Pawan.
>>>>>>>>>
>>>>>>>>> On Sat, Jan 14, 2012 at 3:13 AM, Frank Schwach wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Pawan,
>>>>>>>>>>
>>>>>>>>>> Can you show your code? Is it basically following the
>>>>>>>>>> structure shown in
>>>>>>>>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO ?
>>>>>>>>>>
>>>>>>>>>> If that is the case,
>>>>>>>>>>
>>>>>>>>>> $hsp->strand('query')
>>>>>>>>>>
>>>>>>>>>> is exactly what you need.
>>>>>>>>>> To check if hit and query are on different strands you can do:
>>>>>>>>>>
>>>>>>>>>> if ( $hsp->strand('query') * $hsp->strand('hit') == -1 ) {
>>>>>>>>>>
>>>>>>>>>>     # do whatever you need to do if they are on opposite strands
>>>>>>>>>>
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Hope that helps
>>>>>>>>>>
>>>>>>>>>> Frank
>>>>>>>>>>
>>>>>>>>>> On 13/01/12 16:46, kakchingtabam pawankumar sharma wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>> Using the Bio::SearchIO module I am parsing the
>>>>>>>>>>> following BLAST result.
>>>>>>>>>>> I have used $hsp->strand('query').
>>>>>>>>>>>
>>>>>>>>>>> But I cannot get the details of the alignment.
>>>>>>>>>>>
>>>>>>>>>>> I need to know if my hit is forward (Strand = Plus / Plus)
>>>>>>>>>>> or reverse (Strand = Plus / Minus).
>>>>>>>>>>> Can anyone help me report Plus or Minus for the
>>>>>>>>>>> query or hit?
>>>>>>>>>>>
>>>>>>>>>>> Thanks in advance.
>>>>>>>>>>>
>>>>>>>>>>> With regards,
>>>>>>>>>>> Pawan
>>>>>>>>>>>
>>>>>>>>>>> BLASTN 2.2.18 [Dec-23-2011]
>>>>>>>>>>>
>>>>>>>>>>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
>>>>>>>>>>> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
>>>>>>>>>>> (1997), "Gapped BLAST and PSI-BLAST: a new generation of
>>>>>>>>>>> protein database search programs", Nucleic Acids Res.
>>>>>>>>>>> 25:3389-3402.
>>>>>>>>>>>
>>>>>>>>>>> Query= 000013_c10079-9984
>>>>>>>>>>> (50 letters)
>>>>>>>>>>>
>>>>>>>>>>> Database: Cyano_Probe.fasta
>>>>>>>>>>> 4760 sequences; 238,000 total letters
>>>>>>>>>>>
>>>>>>>>>>> Searching..................................................done
>>>>>>>>>>>
>>>>>>>>>>> Sequences producing significant alignments:   Score (bits)   E Value
>>>>>>>>>>>
>>>>>>>>>>> 000013_c10079-9984        100   7e-024
>>>>>>>>>>> 002619_2689273-2690037     24   0.36
>>>>>>>>>>> 001126_c1123720-1123385    24   0.36
>>>>>>>>>>> 003211_c3326737-3326480    22   1.4
>>>>>>>>>>> 002415_2471082-2471420     22   1.4
>>>>>>>>>>> 002269_2321276-2322463     22   1.4
>>>>>>>>>>> 001328_c1326535-1326164    22   1.4
>>>>>>>>>>>
>>>>>>>>>>>> 000013_c10079-9984
>>>>>>>>>>>
>>>>>>>>>>> Length = 50
>>>>>>>>>>>
>>>>>>>>>>> Score = 99.6 bits (50), Expect = 7e-024
>>>>>>>>>>> Identities = 50/50 (100%)
>>>>>>>>>>> Strand = Plus / Plus
>>>>>>>>>>>
>>>>>>>>>>> Query: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca 50
>>>>>>>>>>> >>>>>>>>>>> |||||||||||||||||||||||||||||||||||||||||||||||||| >>>>>>>>>>> Sbjct: 1 agtcaacaccaatctgagtttaatcactatcttgatcatgttagatatca >>>>>>>>>>> 50 >>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Bioperl-l mailing list >>>>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- >>>>>>>>>> The Wellcome Trust Sanger Institute is operated by Genome >>>>>>>>>> Research Limited, a charity registered in England with number >>>>>>>>>> 1021457 and a company registered in England with number >>>>>>>>>> 2742969, whose registered office is 215 Euston Road, London, >>>>>>>>>> NW1 2BE. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>>>>>> Limited, a charity registered in England with number 1021457 and a >>>>>>>> company registered in England with number 2742969, whose registered >>>>>>>> office is 215 Euston Road, London, NW1 2BE. >>>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> The Wellcome Trust Sanger Institute is operated by Genome >>>>>> Research Limited, a charity registered in England with number >>>>>> 1021457 and a company registered in England with number 2742969, >>>>>> whose registered office is 215 Euston Road, London, NW1 2BE. >>>> >>>> >>>> >>>> -- >>>> The Wellcome Trust Sanger Institute is operated by Genome Research >>>> Limited, a charity registered in England with number 1021457 and a >>>> company registered in England with number 2742969, whose registered >>>> office is 215 Euston Road, London, NW1 2BE. >> >> >> -- >> The Wellcome Trust Sanger Institute is operated by Genome Research >> Limited, a charity registered in England with number 1021457 and a >> company registered in England with number 2742969, whose registered >> office is 215 Euston Road, London, NW1 2BE. 
From jordi.durban at gmail.com  Thu Jan 26 05:52:31 2012
From: jordi.durban at gmail.com (Jordi Durban)
Date: Thu, 26 Jan 2012 11:52:31 +0100
Subject: [Bioperl-l] outfile question mark
Message-ID:

Hi all!
I'm trying to parse a BLAST XML file using a homemade script. When I run the script, the output filename contains a question mark. I don't know how to avoid it, and it's really painful to deal with on a Unix command line.
Does anyone have an idea what I'm doing wrong?
This is what I tried:

print "Waiting for a query name...\n";
my $line = ;
open OUTFILE , ">".$line."-UTR.txt" or die "can't open outfile\n";
my $gb = Bio::DB::GenBank->new();
print "Waiting for reference id...\n";
my $gi = ;
chomp($gi);

And I get a $file?-UTR.txt file.
Thanks a lot!
--
Jordi

From fs5 at sanger.ac.uk  Thu Jan 26 06:24:32 2012
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Thu, 26 Jan 2012 11:24:32 +0000
Subject: [Bioperl-l] outfile question mark
In-Reply-To:
References:
Message-ID: <4F2137F0.2010807@sanger.ac.uk>

Try

chomp $line;

after my $line =;
At the moment you are creating a filename with a newline in it.

Hope that helps

Frank

On 26/01/12 10:52, Jordi Durban wrote:
> Hi all!
> I'm trying to parse a BLAST XML file using a homemade script. When I run
> the script, the output filename contains a question mark. I don't know how to
> avoid it, and it's really painful to deal with on a Unix command line.
> Does anyone have an idea what I'm doing wrong?
> This is what I tried:
>
> print "Waiting for a query name...\n";
> my $line =;
> open OUTFILE , ">".$line."-UTR.txt" or die "can't open outfile\n";
> my $gb = Bio::DB::GenBank->new();
> print "Waiting for reference id...\n";
> my $gi =;
> chomp($gi);
>
> And I get a $file?-UTR.txt file.
> Thanks a lot!

From jordi.durban at gmail.com  Thu Jan 26 06:34:58 2012
From: jordi.durban at gmail.com (Jordi Durban)
Date: Thu, 26 Jan 2012 12:34:58 +0100
Subject: [Bioperl-l] outfile question mark
In-Reply-To: <4F2137F0.2010807@sanger.ac.uk>
References: <4F2137F0.2010807@sanger.ac.uk>
Message-ID:

Thanks Frank! I knew what chomp does, but I didn't think it was the reason for the question mark in the output filename. The BioPerl mailing list has been useful for a not-so-wise Perl user, at least!
Oh! I want to thank Liam too for his so helpful advice: "I think you need to do your own homework..."

2012/1/26 Frank Schwach
> Try
>
> chomp $line;
>
> after my $line =;
> At the moment you are creating a filename with a newline in it.
>
> Hope that helps
>
> Frank
>
> On 26/01/12 10:52, Jordi Durban wrote:
>
>> Hi all!
>> I'm trying to parse a BLAST XML file using a homemade script. When I run
>> the script, the output filename contains a question mark. I don't know how to
>> avoid it, and it's really painful to deal with on a Unix command line.
>> Does anyone have an idea what I'm doing wrong?
>> This is what I tried:
>>
>> print "Waiting for a query name...\n";
>> my $line =;
>> open OUTFILE , ">".$line."-UTR.txt" or die "can't open outfile\n";
>> my $gb = Bio::DB::GenBank->new();
>> print "Waiting for reference id...\n";
>> my $gi =;
>> chomp($gi);
>>
>> And I get a $file?-UTR.txt file.
>> Thanks a lot!
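Here is Frank's diagnosis in runnable form. A line read from a filehandle keeps its trailing newline, so concatenating it into a filename embeds a newline character, which ls typically shows as "?" in a terminal. This is a minimal sketch. utr_filename is a hypothetical helper, and the typed line is simulated with a string instead of being read from STDIN, so the example runs unattended.

```perl
use strict;
use warnings;

# Build the "-UTR.txt" filename from one line of user input.
# chomp removes the trailing newline that a line read from a filehandle keeps.
sub utr_filename {
    my ($line) = @_;    # copy, so the caller's variable is left untouched
    chomp $line;
    return $line . '-UTR.txt';
}

my $raw       = "query1\n";         # what a line typed at the prompt looks like
my $unchomped = $raw . '-UTR.txt';  # what the original script builds

print 'without chomp: ', ( $unchomped =~ /\n/ ? "contains a newline\n" : "clean\n" );
print 'with chomp:    ', utr_filename($raw), "\n";    # prints query1-UTR.txt

# When opening the file, a lexical filehandle and the three-argument form
# of open are the usual idiom, e.g.:
#   open my $out, '>', utr_filename($line) or die "can't open outfile: $!\n";
```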
--
Jordi

From numfar85 at gmail.com  Wed Jan 25 13:07:46 2012
From: numfar85 at gmail.com (Susanna Trollvad)
Date: Wed, 25 Jan 2012 19:07:46 +0100
Subject: [Bioperl-l] Method for finding position of first in frame stop codon
Message-ID:

Given some starting position within a genome, is there a method in BioPerl to find the first stop codon in the same reading frame after that position?

//Susanna

From florian.lajus at inria.fr  Thu Jan 26 07:47:07 2012
From: florian.lajus at inria.fr (lajus)
Date: Thu, 26 Jan 2012 13:47:07 +0100
Subject: [Bioperl-l] Question on SeqFeature_RelationShip
In-Reply-To: <6BCB94E6-E5FA-4704-BE49-CA99312E4516@drycafe.net>
References: <4F0C7017.90803@inria.fr> <4F0D83D8.90402@inria.fr> <4F0D8A18.1080606@inria.fr> <4F0EA69E.8040203@inria.fr> <49CC4A91-5E47-4866-AD63-62DC7F649CE6@drycafe.net> <4F0FF883.80109@inria.fr> <4F0FF8FF.6070706@inria.fr> <4F199A27.8030408@inria.fr> <054FEA92-8C5E-47D6-86AF-F71DEAFE2B63@illinois.edu> <6BCB94E6-E5FA-4704-BE49-CA99312E4516@drycafe.net>
Message-ID: <4F214B4B.3040202@inria.fr>

Hi all,

I have a question about how to add a seqfeature to an existing seq. To do this, you have to store the immediate parent of the seqfeature, which (in your current version) is the seq. Then you store the new seqfeature and, if a rank is set for it, you set the proper rank for every other feature of this seq.

In my version, I want to handle seqfeature relationships.
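On Susanna's question about the first in-frame stop codon: the scan itself is short enough to write in plain Perl. This is a minimal sketch. It assumes a forward-strand DNA string, a 1-based start position and the standard genetic code (TAA, TAG, TGA). first_inframe_stop is a hypothetical name, not a BioPerl method. For a feature on the minus strand, one would first reverse-complement the sequence (for example with BioPerl's revcom) and convert the coordinate accordingly.

```perl
use strict;
use warnings;

# Return the 1-based position of the first base of the first stop codon
# in the same reading frame as $start (1-based), scanning forward.
# Returns undef if the sequence ends before a stop codon is found.
sub first_inframe_stop {
    my ( $seq, $start ) = @_;
    my %is_stop = ( TAA => 1, TAG => 1, TGA => 1 );    # standard genetic code
    $seq = uc $seq;
    for ( my $i = $start - 1 ; $i + 3 <= length($seq) ; $i += 3 ) {
        return $i + 1 if $is_stop{ substr( $seq, $i, 3 ) };
    }
    return;
}

my $dna = 'atgaaacccgggtagtaa';
print first_inframe_stop( $dna, 1 ), "\n";    # prints 13 (the TAG codon)
```

Only whole codons are examined. A trailing partial codon at the end of the sequence is ignored, and the function returns undef in that case.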
So I have a ranking problem. There are two possible ranking schemes:

Solution 1:
1 pseudogene
2 mRNA 'A'
3 exon 'a'
4 exon 'b'
5 mRNA 'B'
6 exon 'e'
7 exon 'f'
8 exon 'g'
...

Solution 2:
1 pseudogene
2 mRNA 'A'
  1 exon 'a'
  2 exon 'b'
3 mRNA 'B'
  1 exon 'e'
  2 exon 'f'
  3 exon 'g'
...

The problem with the first solution is that if you add an exon to the first mRNA, you have to change the ranks of all the following features. Another problem is that if you add an exon to a retrieved seqfeature (an mRNA here), you have to call the store method on the seq itself, not on the immediate parent (the mRNA), or you get a bad ranking. For example, if you add 2 exons to the first mRNA and call the store method on the mRNA:

1 Seq 'CR0001'
2 mRNA 'A'
3 exon 'a'
4 exon 'b'
5 exon 'c'
6 exon 'd'
5 mRNA 'B'
6 exon 'e'
7 exon 'f'
8 exon 'g'

Exon 'd' won't be added because of the BioSQL schema rules (see the problem with the second solution).

The problem with the second solution is that, from the point of view of the seqfeature table, seqfeatures (here the exons, for example) end up with the same rank, the same source and the same type. So you can't insert them, because the BioSQL schema rules ignore the name attribute. My question is: why isn't the name taken into account?

I hope I am clear...

Florian

On 20/01/2012 19:06, Hilmar Lapp wrote:
> Florian -
>
> I'll add that aside from being better aligned with our procedures for integrating code contributions, this would also make it easier for us and more recognizable for everyone to attribute these changes to you, because the git commit logs will do this rather than it being buried in a commit message string. So there's an actual benefit for you as the contributor.
>
> -hilmar
>
> On Jan 20, 2012, at 11:50 AM, Fields, Christopher J wrote:
>
>> Florian,
>>
>> Re: patches, we can accept these, but I would like to point out the code is publicly available on github:
>>
>> https://github.com/bioperl/bioperl-db
>>
>> The fastest way to contribute is to create a github account, fork the code from that repository, check out a local copy using git, then push the changes back to your fork so they are not lost. You can then submit a pull request that should appear on the bioperl developers mailing list, where one of us can simply (via the github interface) merge your changes in.
>>
>> chris
>>
>> On Jan 20, 2012, at 10:45 AM, lajus wrote:
>>
>>> Hi all,
>>>
>>> As I said, I've worked on SeqFeatureAdaptor so that it also persists and retrieves sub-features.
>>> My code is attached.
>>>
>>> I have modified the last stable version of:
>>> - Bio/DB/BioSQL/SeqFeatureAdaptor.pm
>>> - Bio/DB/BioSQL/SeqAdaptor.pm
>>> I have created:
>>> - Bio/SeqFeature/SeqFeatureRealtionship.pm
>>> - Bio/DB/BioSQL/SeqFeatureRealtionshipAdaptor.pm
>>> and some tests:
>>> - SeqFeatureRealtionship.t (a very simple test for the equally simple SeqFeatureRealtionship class)
>>> - SeqFeatureRealtionshipAdaptor.t (tests persistence of sub-feature objects in the database; it needs a BioSQL database to connect to, even though nothing is committed) => I have only tested it with a PostgreSQL database...
>>>
>>> If you have advice or questions about my implementation or my tests, don't hesitate to tell me.
>>>
>>> Is there a way to include my modifications in a future release of BioPerl?
>>>
>>> Florian
>>>
>>> On 14/01/2012 17:12, Hilmar Lapp wrote:
>>>> Hi Florian,
>>>>
>>>> You could do that (and it might have advantages in terms of code separation), but you don't have to. In general, adaptor classes get instantiated by the Bioperl-DB framework when a Bioperl class that is mapped to it needs to get serialized or populated.
Since there is no class in Bioperl that would correspond to a seqfeature relationship, those situations won't occur.
>>>>
>>>> So you could just keep it simple and expand store_children() and, correspondingly, the retrieval in the adaptor class for seqfeatures. But as hinted above, you may still prefer a separate adaptor class just to keep the nitty-gritty of storing/loading the relationships out of the main adaptor class. It's really up to you, whichever you feel more comfortable with.
>>>>
>>>> -hilmar
>>>>
>>>> Sent with a tap.
>>>>
>>>> On Jan 13, 2012, at 4:27 AM, lajus wrote:
>>>>
>>>>> I meant to write
>>>>> - a new adaptor called SeqFeatureRelationshipAdaptor in Bio/DB/BioSQL
>>>>> of course
>>>>>
>>>>> On 13/01/2012 10:25, lajus wrote:
>>>>>> Hi Hilmar,
>>>>>>
>>>>>> Thanks for your hint, but I'm quite lost in the BioPerl architecture (and quite new to Perl programming). I'd like to use the handling of term-to-term relationships as a template, but I can't find which files are involved.
>>>>>>
>>>>>> As far as I understand, I should:
>>>>>> - create a new adaptor called SeqFeatureRelationshipAdaptor in Bio/DB
>>>>>> - create a new object SeqFeatureRelationship (and its interface) in Bio/SeqFeature
>>>>>> - modify SeqFeatureAdaptor to store children (just a call to subSeqFeature in the store_children sub, with my SeqFeatureRelationshipAdaptor creating the new relationships)
>>>>>> - modify SeqFeatureAdaptor to retrieve children (again with my SeqFeatureRelationshipAdaptor creating the new relationships)
>>>>>>
>>>>>> Is this the right approach?
>>>>>>
>>>>>> Florian
>>>>>>
>>>>>> On 12/01/2012 18:49, Hilmar Lapp wrote:
>>>>>>> Hi Florian,
>>>>>>>
>>>>>>> Thanks for digging this up - this is what I had in memory, but I ran out of time last night in ascertaining that it is indeed still true.
>>>>>>>
>>>>>>> It'd be awesome if you can add the code to SeqFeatureAdaptor to also persist and retrieve sub-features.
I think the object-relational mappings are all there already (in BaseDriver.pm). You could use the handling of bioentry-to-bioentry relationships (or term-to-term relationships) as a template for how to implement this.
>>>>>>>
>>>>>>> -hilmar
>>>>>>>
>>>>>>> On Jan 12, 2012, at 4:23 AM, lajus wrote:
>>>>>>>
>>>>>>>> OK, I have looked at the BioPerl code and it appears that sub-SeqFeatures are not handled yet. See the comment in SeqFeatureAdaptor.pm for the store children function (and attach children too):
>>>>>>>> "Bio::SeqFeatureI has a location, annotation, and possibly sub-seqfeatures as children. The latter is not implemented yet."
>>>>>>>>
>>>>>>>> So it's expected that it doesn't work.
>>>>>>>> Have you started to implement this, or should I write another SeqFeatureAdaptor that handles it?
>>>>>>>>
>>>>>>>> Florian
>>>>>>>>
>>>>>>>> On 11/01/2012 16:44, Fields, Christopher J wrote:
>>>>>>>>> Seems like a possible bug with bioperl-db. I believe hierarchical seqfeatures are stored, but it's worth looking into. Do you have some example data (genbank file you are using, for instance)?
>>>>>>>>>
>>>>>>>>> chris
>>>>>>>>>
>>>>>>>>> On Jan 11, 2012, at 7:09 AM, lajus wrote:
>>>>>>>>>
>>>>>>>>>> However, in verbose mode I can see many of these in the stack:
>>>>>>>>>>
>>>>>>>>>> no adaptor found for class Bio::Annotation::TypeManager
>>>>>>>>>> no adaptor found for class Bio::DB::Persistent::PersistentObjectFactory
>>>>>>>>>>
>>>>>>>>>> Just warnings, no errors, but...
>>>>>>>>>> Any clues?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance,
>>>>>>>>>>
>>>>>>>>>> Florian
>>>>>>>>>>
>>>>>>>>>> On 11/01/2012 13:43, lajus wrote:
>>>>>>>>>>> I have looked at the Unflattener and the magic works quite well.
>>>>>>>>>>> The $seq that is modified (by side effect) by
>>>>>>>>>>> $unflattener->unflatten_seq(-seq=>$seq, -use_magic=>1);
>>>>>>>>>>> has a good hierarchy for us.
>>>>>>>>>>> So I'm wondering why I can't store this Bio::Seq in my database.
Now there are explicit parent/child links between the gene and CDS.
>>>>>>>>>>> But when I create a persistent object for $seq and then call create on it:
>>>>>>>>>>> my $pseq = $adaptor->create_persistent($seq);
>>>>>>>>>>> $pseq->create();
>>>>>>>>>>> the bioentry and sub-SeqFeatures are written to my database, but there is still no relation in the seqfeature_relationship table.
>>>>>>>>>>>
>>>>>>>>>>> Do you have an explanation?
>>>>>>>>>>>
>>>>>>>>>>> Florian
>>>>>>>>>>>
>>>>>>>>>>> On 10/01/2012 19:45, Fields, Christopher J wrote:
>>>>>>>>>>>> On Jan 10, 2012, at 12:18 PM, Peter Cock wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 10, 2012 at 5:06 PM, lajus wrote:
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>> I am currently working on a refactoring of the Genolevures project
>>>>>>>>>>>>>> (http://www.genolevures.org/).
>>>>>>>>>>>>>> We are trying to make better use of BioPerl and the BioSQL schema on a PostgreSQL
>>>>>>>>>>>>>> database.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have loaded an EMBL file into my BioSQL database (PostgreSQL). If I look in
>>>>>>>>>>>>>> my database, my bioentry has been added and its seqFeatures too.
>>>>>>>>>>>>>> But it seems that my seqfeature_relationship table is empty.
>>>>>>>>>>>>>> I find that strange, since there is a relationship between a gene and its
>>>>>>>>>>>>>> CDS, right?
>>>>>>>>>>>>> No, not explicitly. Unlike GFF3 where there can be (and should be)
>>>>>>>>>>>>> explicit parent/child links between the gene and CDS, in GenBank
>>>>>>>>>>>>> and EMBL feature tables this is implicit only. I don't know if BioPerl
>>>>>>>>>>>>> attempts to infer this kind of relationship, and if it did, whether that would
>>>>>>>>>>>>> get recorded in the BioSQL tables.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Peter
>>>>>>>>>>>> BioPerl does not attempt to infer these by default (too much magic, and too many potential issues), but one can use something like the Unflattener, which does have some magic built-in:
>>>>>>>>>>>>
>>>>>>>>>>>> https://metacpan.org/module/Bio::SeqFeature::Tools::Unflattener
>>>>>>>>>>>>
>>>>>>>>>>>> chris

From adlai at refenestration.com  Fri Jan 27 06:27:15 2012
From: adlai at refenestration.com (Adlai Burman)
Date: Fri, 27 Jan 2012 12:27:15 +0100
Subject: [Bioperl-l] Lineage from GB files
Message-ID: <6CBCE402-E3C8-4FE0-BA57-4A7B82C3CF5F@refenestration.com>

Does anyone know of a way in Perl to batch-extract taxa such as class and order from, e.g., GenBank or EMBL records? I know that genus/species and some of the higher taxa are easy to parse from GenBank records, but the intermediate ranks are inconsistent strings (e.g., element x is sometimes a subclass and sometimes a family).
Any help would really be appreciated.
Thanks.
Adlai

From sb752000 at gmail.com  Fri Jan 27 15:58:33 2012
From: sb752000 at gmail.com (Steve Barron)
Date: Fri, 27 Jan 2012 15:58:33 -0500
Subject: [Bioperl-l] Build install
Message-ID:

Dear Bioperl,

I made it to the step ./Build install, but then I received this error:

ERROR: Can't create '/Library/Perl/5.12/Bio'
mkdir /Library/Perl/5.12/Bio: Permission denied at /System/Library/Perl/5.12/ExtUtils/Install.pm line 494

Do you know what the problem is?
Thank you
Steve

From Kevin.M.Brown at asu.edu  Fri Jan 27 16:17:57 2012
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Fri, 27 Jan 2012 14:17:57 -0700
Subject: [Bioperl-l] Build install
In-Reply-To:
References:
Message-ID: <1A4207F8295607498283FE9E93B775B4081A1958@EX02.asurite.ad.asu.edu>

As the error says, it can't create the directory because you don't have permission to make changes to /Library/Perl/5.12/. Judging by that path, I'm guessing you're on a Mac, in which case you want to do something like 'sudo ./Build install', or read the directions on installing BioPerl in a directory of your own choosing.

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Steve Barron
Sent: Friday, January 27, 2012 1:59 PM
To: bioperl-l at bioperl.org
Subject: [Bioperl-l] Build install

Dear Bioperl,

I made it to the step ./Build install, but then I received this error:

ERROR: Can't create '/Library/Perl/5.12/Bio'
mkdir /Library/Perl/5.12/Bio: Permission denied at /System/Library/Perl/5.12/ExtUtils/Install.pm line 494

Do you know what the problem is?
Thank you
Steve

From joel.klein at wur.nl  Mon Jan 30 06:21:09 2012
From: joel.klein at wur.nl (Bradyjoel)
Date: Mon, 30 Jan 2012 03:21:09 -0800 (PST)
Subject: [Bioperl-l] Running into problems
Message-ID: <33228400.post@talk.nabble.com>

Hi all,

I'm quite new to BioPerl. I tried to write a script that creates a database from a newly sequenced genome, performs a tblastn against a multiple-protein FASTA file, and then creates a BLAST report that keeps only the results with identity scores above 98%. However, my script keeps returning numerous errors, and since I have only a little experience I cannot determine where I went wrong.
I have attached the code I have so far. I hope someone can help.
Regards
Joel

http://old.nabble.com/file/p33228400/blast1.pl blast1.pl
--
View this message in context: http://old.nabble.com/Running-into-problems-tp33228400p33228400.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From scott at scottcain.net  Mon Jan 30 14:54:03 2012
From: scott at scottcain.net (Scott Cain)
Date: Mon, 30 Jan 2012 14:54:03 -0500
Subject: [Bioperl-l] Running into problems
In-Reply-To: <33228400.post@talk.nabble.com>
References: <33228400.post@talk.nabble.com>
Message-ID:

Hi Joel,

I don't have blast+ installed, so I can't run your script. You will probably get more help if you copy the errors you are getting into your email.

Scott

On Mon, Jan 30, 2012 at 6:21 AM, Bradyjoel wrote:
>
> Hi all,
>
> I'm quite new to BioPerl. I tried to write a script that creates a database
> from a newly sequenced genome, performs a tblastn against a multiple-protein
> FASTA file, and then creates a BLAST report that keeps only the results
> with identity scores above 98%. However, my script keeps returning numerous
> errors, and since I have only a little experience I cannot determine where
> I went wrong. I have attached the code I have so far. I hope someone can help.
> Regards Joel
>
> http://old.nabble.com/file/p33228400/blast1.pl blast1.pl

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)
216-392-3087
Ontario Institute for Cancer Research

From vivekkrishnakumar at gmail.com  Mon Jan 30 15:34:45 2012
From: vivekkrishnakumar at gmail.com (Vivek Krishnakumar)
Date: Mon, 30 Jan 2012 12:34:45 -0800 (PST)
Subject: [Bioperl-l] `get_feature_by_name` not working after migrating to Bio::DB::SeqFeature::Store from a Bio::DB::GFF backend
Message-ID: <14987092.1710.1327955685430.JavaMail.geo-discussion-forums@vbuf18>

Hello,

I have the following code snippet. It is supposed to retrieve a certain gene "locus" feature from the backend database and use it to get the 'start' and 'end' coordinates of the feature, based on the 'strand' on which it is present.

my ($locus_obj, $gene_models) = get_annotation_db_features($locus, $gff_dbh);

sub get_annotation_db_features {
    my ($locus, $gff_dbh) = @_;

    my ($locus_obj) = $gff_dbh->get_feature_by_name('Gene' => '$locus');
    my ($end5, $end3) = $locus_obj->strand == 1
        ? ($locus_obj->start, $locus_obj->end)
        : ($locus_obj->end, $locus_obj->start);

    my $segment = $gff_dbh->segment($locus_obj->refseq, $end5, $end3);
    my @gene_models = $segment->features('processed_transcript:working_models',
        -attributes => { 'Gene' => $locus });
    #will have to sort the gene models

    return ($locus_obj, \@gene_models);
}

Also, here is a snippet of the GFF3 file that is used to populate the backend database:

##gff-version 3
chr2	working_models	gene	30427563	30429139	.	-	.	ID=gene_35804;Note=Zinc transporter;Name=Medtr2g097580
chr2	working_models	mRNA	30427563	30429139	.	-	.	ID=mrna_36255;Parent=gene_35804;Name=Medtr2g097580.1;conf_class=F
chr2	working_models	exon	30428491	30429139	.	-	.	ID=exon_120028;Parent=mrna_36255
chr2	working_models	exon	30427563	30428147	.	-	.	ID=exon_120029;Parent=mrna_36255
chr2	working_models	CDS	30428491	30429109	.	-	0	ID=cds_120028;Parent=mrna_36255
chr2	working_models	CDS	30427756	30428147	.	-	2	ID=cds_120029;Parent=mrna_36255

This gene locus is unique in the entire GFF file. So if the GFF above is loaded into the SeqFeature::Store database, you would expect the following query to yield a count of 1:

SELECT count(f.id)
FROM feature AS f, name AS n
WHERE (n.id = f.id AND n.name = 'Medtr2g097580' AND n.display_name > 0);

In my case it does yield 1. But if I use get_feature_by_name to retrieve the locus object and then try to get the strand of the locus, I get the following error:

[error] Can't call method "strand" on an undefined value at get_db_features.pl line 910

As mentioned earlier, I do not have any problems if the backend is Bio::DB::GFF. Since get_feature_by_name is only there for backward compatibility, I also tried calling get_features_by_name instead and shifting the first locus object off the list of matching features. No matter what I try, I do not get any locus object back from this subroutine!

Could someone please point me in the right direction and let me know if I am making any mistakes in migrating from GFF2 to GFF3?

Thanks in advance.
~ Vivek

From Russell.Smithies at agresearch.co.nz  Mon Jan 30 15:38:43 2012
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 31 Jan 2012 09:38:43 +1300
Subject: [Bioperl-l] Running into problems
In-Reply-To: <33228400.post@talk.nabble.com>
References: <33228400.post@talk.nabble.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF34BD33ADBAB@exchsth.agresearch.co.nz>

I'd probably cheat a bit and optimise my blast parameters so there's less output to process.
Also, are you sure an e-value of 100 is what you're after? I'd be aiming much lower, probably 1e-6.
It also pays to mask repeats if you're blasting against a whole genome, to cut down on the number of rubbish hits.
--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bradyjoel
> Sent: Tuesday, 31 January 2012 12:21 a.m.
> To: Bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Running into problems
>
> Hi all,
>
> I'm quite new to BioPerl. I tried to write a script that creates a database
> from a newly sequenced genome, performs a tblastn against a multiple-protein
> FASTA file, and then creates a BLAST report that keeps only the results with
> identity scores above 98%. However, my script keeps returning numerous errors,
> and since I have only a little experience I cannot determine where I went wrong.
> I have attached the code I have so far. I hope someone can help.
> Regards Joel
>
> http://old.nabble.com/file/p33228400/blast1.pl blast1.pl

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================

From vivekkrishnakumar at gmail.com Mon Jan 30 15:57:05 2012
From: vivekkrishnakumar at gmail.com (Vivek Krishnakumar)
Date: Mon, 30 Jan 2012 12:57:05 -0800 (PST)
Subject: [Bioperl-l] `get_feature_by_name` not working after migrating to Bio::DB::SeqFeature::Store from a Bio::DB::GFF backend
In-Reply-To: <14987092.1710.1327955685430.JavaMail.geo-discussion-forums@vbuf18>
References: <14987092.1710.1327955685430.JavaMail.geo-discussion-forums@vbuf18>
Message-ID: <7498961.1574.1327957025117.JavaMail.geo-discussion-forums@vbiq27>

Just wanted to rewrite the SQL query (for some reason it shows up with embedded URLs, since I copy-pasted it from another email). Here it is:

SELECT count(f.id) FROM feature as f, name as n WHERE (n.id = f.id AND n.name = 'Medtr2g097580' AND n.display_name > 0);

From vivekkrishnakumar at gmail.com Mon Jan 30 16:05:00 2012
From: vivekkrishnakumar at gmail.com (Vivek Krishnakumar)
Date: Mon, 30 Jan 2012 13:05:00 -0800 (PST)
Subject: [Bioperl-l] `get_feature_by_name` not working after migrating to Bio::DB::SeqFeature::Store from a Bio::DB::GFF backend
In-Reply-To: <14987092.1710.1327955685430.JavaMail.geo-discussion-forums@vbuf18>
References: <14987092.1710.1327955685430.JavaMail.geo-discussion-forums@vbuf18>
Message-ID: <93062.1739.1327957500740.JavaMail.geo-discussion-forums@vbcw9>

Just wanted to rewrite the SQL query (for some reason it shows up with embedded URLs, since I copy-pasted it from another email). Here it is:

SELECT count(f.id) FROM feature as f, name as n WHERE (n.id = f.id AND n.name = 'Medtr2g097580' AND n.display_name > 0);

Thank you!
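The error message only tells us that the lookup returned nothing and that ->strand was then called on undef. A minimal defensive sketch, assuming $store is an already opened Bio::DB::SeqFeature::Store handle: it calls get_features_by_name, which returns a possibly empty list, and reports an empty result instead of dying. It does not explain why the lookup finds nothing; strand_for_name is a hypothetical helper name, and the adaptor and DSN in the usage comment are placeholders.

```perl
use strict;
use warnings;

# Look a feature up by name and return its strand, reporting clearly when
# the lookup comes back empty instead of dying with
#   Can't call method "strand" on an undefined value
sub strand_for_name {
    my ($store, $name) = @_;

    # get_features_by_name returns a (possibly empty) list of features
    my @features = $store->get_features_by_name($name);

    unless (@features) {
        warn "No feature named '$name' in the store\n";
        return;    # undef in scalar context
    }
    if (@features > 1) {
        warn scalar(@features), " features named '$name'; using the first\n";
    }
    return $features[0]->strand;
}

# Example use with BioPerl (adaptor and DSN are placeholders):
#   use Bio::DB::SeqFeature::Store;
#   my $store = Bio::DB::SeqFeature::Store->new(
#       -adaptor => 'DBI::mysql',
#       -dsn     => 'dbi:mysql:my_database',
#   );
#   my $strand = strand_for_name($store, 'Medtr2g097580');
```

Checking the list before touching the first element turns a fatal error deep in the script into a clear message about which name was not found.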
Vivek

From joel.klein at wur.nl Tue Jan 31 09:50:29 2012
From: joel.klein at wur.nl (Bradyjoel)
Date: Tue, 31 Jan 2012 06:50:29 -0800 (PST)
Subject: [Bioperl-l] Running into problems
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF34BD33ADBAB@exchsth.agresearch.co.nz>
References: <33228400.post@talk.nabble.com> <18DF7D20DFEC044098A1062202F5FFF34BD33ADBAB@exchsth.agresearch.co.nz>
Message-ID: <33236700.post@talk.nabble.com>

I updated my script with your recommendations; however, it still gives me the following errors when I run it with this command:

$ perl blast1.pl
"my" variable $writer masks earlier declaration in same scope at blast1.pl line 50.
Global symbol "$fac" requires explicit package name at blast1.pl line 15.
Global symbol "$seqio_obj" requires explicit package name at blast1.pl line 24.
Global symbol "$seq_obj" requires explicit package name at blast1.pl line 25.
Global symbol "$seqio_obj" requires explicit package name at blast1.pl line 25.
Global symbol "$result" requires explicit package name at blast1.pl line 29.
Global symbol "$fac" requires explicit package name at blast1.pl line 29.
Global symbol "$file" requires explicit package name at blast1.pl line 33.
Global symbol "$fac" requires explicit package name at blast1.pl line 33.
Global symbol "$fac" requires explicit package name at blast1.pl line 35.
Execution of blast1.pl aborted due to compilation errors.

Apparently I did something wrong, but I don't know how to resolve this. Any suggestions?
Thanks for your help so far.

Joel

--
View this message in context: http://old.nabble.com/Running-into-problems-tp33228400p33236700.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From scott at scottcain.net Tue Jan 31 10:24:20 2012
From: scott at scottcain.net (Scott Cain)
Date: Tue, 31 Jan 2012 10:24:20 -0500
Subject: [Bioperl-l] Running into problems
In-Reply-To: <33236700.post@talk.nabble.com>
References: <33228400.post@talk.nabble.com> <18DF7D20DFEC044098A1062202F5FFF34BD33ADBAB@exchsth.agresearch.co.nz> <33236700.post@talk.nabble.com>
Message-ID:

Hi Joel,

This is a fundamental Perl thing: when you "use strict" (which is a very good thing to do, since it helps you catch other, more difficult-to-detect problems), every variable you use must be identified either as coming from a package you used, or declared with "my" in front of it when it first appears in your script. That is what all of the "Global symbol" errors are about. I'd suggest getting the "Learning Perl" book, which is quite good for beginners.

Scott
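Scott's point in a few lines: under use strict, each variable has to be declared with my (or come from a package) before it is used, and declaring the same name with my twice in one scope triggers the "masks earlier declaration" warning reported for $writer. A small core-Perl sketch; the variable names and values are made up for illustration and are not taken from blast1.pl.

```perl
#!/usr/bin/perl
use strict;     # every variable must now be declared before use
use warnings;

# Declare each variable once, with 'my', the first time it appears.
# Using $min_identity without the 'my' would make strict report
# something like:
#   Global symbol "$min_identity" requires explicit package name
my $min_identity = 98;                       # threshold from the question
my @identities   = (99.1, 97.5, 100, 98.0);  # made-up example values

my @kept;                                    # declared once, before the loop
for my $identity (@identities) {             # loop variables need 'my' too
    push @kept, $identity if $identity > $min_identity;
}

# Later assignments reuse the existing variable without 'my'. Writing
# 'my @kept = ...' again in this same scope would produce the warning
#   "my" variable @kept masks earlier declaration in same scope
@kept = sort { $a <=> $b } @kept;

print join(',', @kept), "\n";                # prints 99.1,100
```

The same fix applies to $fac, $seqio_obj, $seq_obj, $result and $file: add my where each is first assigned, and drop the second my in front of $writer.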
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                  scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research

From rutgeraldo at gmail.com Tue Jan 31 10:27:53 2012
From: rutgeraldo at gmail.com (Rutger Vos)
Date: Tue, 31 Jan 2012 16:27:53 +0100
Subject: [Bioperl-l] Running into problems
In-Reply-To:
References: <33228400.post@talk.nabble.com> <18DF7D20DFEC044098A1062202F5FFF34BD33ADBAB@exchsth.agresearch.co.nz> <33236700.post@talk.nabble.com>
Message-ID:

Here's a pretty good page that describes the topic of scoping: http://perl.plover.com/FAQs/Namespaces.html

--
Dr. Rutger A. Vos
Bioinformaticist
NCB Naturalis
Visiting address: Einsteinweg 2, 2333 CC, Leiden, the Netherlands
Mailing address: Postbus 9517, 2300 RA, Leiden, the Netherlands
http://rutgervos.blogspot.com

From jovel_juan at hotmail.com Tue Jan 31 10:54:33 2012
From: jovel_juan at hotmail.com (Juan Jovel)
Date: Tue, 31 Jan 2012 15:54:33 +0000
Subject: [Bioperl-l] Running into problems
In-Reply-To: <33236700.post@talk.nabble.com>
References: <33228400.post@talk.nabble.com>, <18DF7D20DFEC044098A1062202F5FFF34BD33ADBAB@exchsth.agresearch.co.nz>, <33236700.post@talk.nabble.com>
Message-ID:

Those are not BioPerl problems, but basic Perl errors. I suggest you consolidate your Perl background before moving on to BioPerl; it will make things much easier. (Try to understand how modules work as well, so that you can get the most out of the BioPerl modules.)

From ss2489 at cornell.edu Tue Jan 31 16:49:37 2012
From: ss2489 at cornell.edu (Surya Saha)
Date: Tue, 31 Jan 2012 16:49:37 -0500
Subject: [Bioperl-l] Lineage from GB files
In-Reply-To: <6CBCE402-E3C8-4FE0-BA57-4A7B82C3CF5F@refenestration.com>
References: <6CBCE402-E3C8-4FE0-BA57-4A7B82C3CF5F@refenestration.com>
Message-ID:

Hi Adlai,

It really depends on what items are present in the GenBank/EMBL records. You can use the NCBI Taxonomy database and the Taxonomy modules on CPAN to identify the taxonomic hierarchy of an accession; for example, you can map the GI to a Taxonomy ID and extract the taxonomy using Bio::LITE::Taxonomy::NCBI.

Here's a script (not authored by me) on GitHub that might get you started.

-Surya

On Fri, Jan 27, 2012 at 6:27 AM, Adlai Burman wrote:
> Does anyone know if there is a way to batch extract taxa such as class and
> order in Perl from, e.g., GenBank or EMBL records? I know that genus/species
> and some of the higher taxa are easy to parse from gb records, but the
> interior ones are inconsistent strings (e.g. element x is sometimes a subclass
> and sometimes a family).
> Any help would really be appreciated.
>
> Thanks.
> Adlai

From cjfields at illinois.edu Tue Jan 31 17:02:18 2012
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 31 Jan 2012 16:02:18 -0600
Subject: [Bioperl-l] Lineage from GB files
In-Reply-To:
References: <6CBCE402-E3C8-4FE0-BA57-4A7B82C3CF5F@refenestration.com>
Message-ID: <4F2864EA.9090607@illinois.edu>

Funny, the script you link to is hyphaltip's, a.k.a. Jason Stajich :)

chris

From ss2489 at cornell.edu Tue Jan 31 17:07:47 2012
From: ss2489 at cornell.edu (Surya Saha)
Date: Tue, 31 Jan 2012 17:07:47 -0500
Subject: [Bioperl-l] Lineage from GB files
In-Reply-To: <4F2864EA.9090607@illinois.edu>
References: <6CBCE402-E3C8-4FE0-BA57-4A7B82C3CF5F@refenestration.com> <4F2864EA.9090607@illinois.edu>
Message-ID:

Small world :-)
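Following Surya's suggestion to take ranks from the NCBI Taxonomy rather than from positions in the lineage string, here is a minimal sketch that walks from a taxon up to the root and records the name found at each rank. It assumes taxon objects that provide rank, scientific_name and ancestor, as Bio::Taxon objects obtained from Bio::DB::Taxonomy do; the helper rank_map is hypothetical. For a GenBank record, the taxon ID can usually be read from the source feature's db_xref="taxon:..." qualifier.

```perl
use strict;
use warnings;

# Walk from a taxon up to the root of the taxonomy and record the
# scientific name found at each named rank (species, genus, family,
# order, class, ...). Works with any object that has rank,
# scientific_name and ancestor methods, e.g. a Bio::Taxon fetched
# through Bio::DB::Taxonomy.
sub rank_map {
    my ($taxon) = @_;
    my %by_rank;
    while (defined $taxon) {
        my $rank = $taxon->rank;
        # Skip unranked nodes: NCBI labels them 'no rank'
        # (newer releases also use 'clade')
        if (defined $rank && $rank ne 'no rank' && $rank ne 'clade') {
            # keep the lowest (first seen) node for each rank
            $by_rank{$rank} = $taxon->scientific_name
                unless exists $by_rank{$rank};
        }
        $taxon = $taxon->ancestor;
    }
    return %by_rank;
}

# Example use with BioPerl (queries NCBI over the network; 10090 is
# the NCBI taxon ID of Mus musculus):
#   use Bio::DB::Taxonomy;
#   my $db   = Bio::DB::Taxonomy->new(-source => 'entrez');
#   my %rank = rank_map($db->get_taxon(-taxonid => 10090));
#   print "class: $rank{class}\torder: $rank{order}\n";
```

Looking ranks up by name avoids the problem Adlai describes, where the same position in the lineage string is a subclass for one record and a family for another.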