[Bioperl-l] use blast to extract similar sequences
Fields, Christopher J
cjfields at illinois.edu
Fri Sep 16 09:22:07 EDT 2011
That seems like a pretty straightforward thing to do; there isn't an all-in-one way of doing this, but that's a good thing (it's a separation of concerns).
1) Run and parse BLAST results and grab seqID and coordinates for each hit (or each HSP for each hit) (Bio::SearchIO)
2) Pull the right subsequence +/- 20bp using above from the indexed flat file of your reference (Bio::DB::Fasta)
You can get revcomped sequence from Bio::DB::Fasta directly by flipping coordinates:
# raw sequence
my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
On Sep 16, 2011, at 3:51 AM, Ross KK Leung wrote:
> I wonder whether bioperl has any built-in modules that extracts sequences based on blast results. For example, a short query sequence of length 1000 is to blast against a reference genome of 3M. The homologous sequence of 1000 +/- 20 is extracted. Why is +/- 20 needed? Because we can't guarantee there must have a good match. Frequent blast users may be well aware that then there can be coverage, split-up due to local alignments, etc and that's why I would like to know if anybody has already developed a module to handle this kind of problem. Thanks in advance!
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
More information about the Bioperl-l