[Bioperl-l] How to obtain Up- and Downstream target-Sequences of Blast Matches

Smithies, Russell Russell.Smithies at agresearch.co.nz
Sun Jul 15 17:19:14 EDT 2012


Hi Jochen,
I don't think BioPerl can directly manipulate BLAST databases, so I'd probably use fastacmd (from the legacy NCBI BLAST package) to extract the sequence from the original BLAST database, e.g.:
fastacmd -s X51494.1 -d /dataset/blastdata/active/nt -L 100,200
>gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4)
ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG
CATGCAGCTACAAGGCATGAT

Or, if you're using BLAST+, use the blastdbcmd command, e.g.:
blastdbcmd -entry X51494.1 -db /dataset/blastdata/active/nt -range 100-200
>gi|20090|emb|X51494.1|:100-200 Rice prolamin gene (strain NE4)
ATGATGCAAACGTTGGGCATGGGTAGCTCCACAGCCATGTTCATGTCGCAGCCAATGGCGCTCCTGCAGCAGCAATGTTG
CATGCAGCTACAAGGCATGAT

So, to put it all together: use BioPerl to parse your existing BLAST results and pull out each hit's coordinates on the target sequence. Then use a system call to run fastacmd or blastdbcmd, asking for the ranges just outside the hit's start and end, to extract the flanking sequence from the BLAST database, and write the sequences to a file.
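
As a rough sketch of the coordinate step only (this is not a BioPerl feature): given a hit's start and end on the target, the flanks are the ranges just outside the hit, clamped to the ends of the sequence. The function name flank_ranges and the target length of 5000 are made up for illustration, and the loop only prints the blastdbcmd commands rather than running them.

```shell
#!/bin/sh
# flank_ranges SSTART SEND FLANK SLEN   (hypothetical helper, not part of BLAST)
# Prints the left and right flanking ranges of a hit in forward-strand
# target coordinates, clamped to 1..SLEN. An empty flank prints as "none".
# The parenthesised body runs in a subshell, so the variables stay local.
flank_ranges() (
    sstart=$1; send=$2; flank=$3; slen=$4

    # Minus-strand hits are reported with start > end; put them in order.
    # For such hits the left flank is downstream of the hit as aligned.
    if [ "$sstart" -gt "$send" ]; then
        tmp=$sstart; sstart=$send; send=$tmp
    fi

    left_from=$(( sstart - flank )); left_to=$(( sstart - 1 ))
    right_from=$(( send + 1 ));      right_to=$(( send + flank ))

    # Clamp to the ends of the target sequence.
    [ "$left_from" -lt 1 ] && left_from=1
    [ "$right_to" -gt "$slen" ] && right_to=$slen

    left="none";  [ "$left_to" -ge "$left_from" ] && left="$left_from-$left_to"
    right="none"; [ "$right_to" -ge "$right_from" ] && right="$right_from-$right_to"

    echo "$left $right"
)

# Example: a hit at 100-200 on X51494.1, 100 nt flanks, assumed length 5000.
for range in $(flank_ranges 100 200 100 5000); do
    [ "$range" = "none" ] || echo blastdbcmd -entry X51494.1 \
        -db /dataset/blastdata/active/nt -range "$range"
done
```

In a real script the start, end and target length would come from the parsed BLAST report instead of being typed in, and the printed commands would be executed with their output appended to a FASTA file.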

These might be useful:
http://www.bioperl.org/wiki/HOWTO:SearchIO
http://www.bioperl.org/wiki/HOWTO:SearchIO#Speed_improvements_with_lightweight_objects 
http://www.bioperl.org/wiki/HOWTO:BlastPlus
http://www.bioperl.org/wiki/HOWTO:StandAloneBlast


--Russell

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of jobu
Sent: Monday, 16 July 2012 7:47 a.m.
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] How to obtain Up- and Downstream target-Sequences of Blast Matches

Dear All.

I am still a beginner in Perl and have only just started to look into BioPerl, so I hope this is the right place to ask my question.

I locally ran a standalone blastn search of many short query-sequences against a set of target-fasta-sequences consisting of whole chromosomal sequence data.

What I need to do now is to extract, let's say, 100 nt each upstream and downstream of every BLAST match from my target sequences.

At this point I can only assume that BioPerl might be helpful for this task, though I haven't yet found a module that can do this locally on my hard drive.

Thus I would be thankful for the slightest hint where to begin.

Sincerely
Jochen