[Bioperl-l] BLAST and parsing question

Torsten Seemann torsten.seemann at infotech.monash.edu.au
Tue Apr 25 01:55:52 EDT 2006


>> So you want all length 20 subsequences (derived using a sliding window
>> from some set of sequences) which do not appear in some other set of
>> sequences (virus-db) ?

> Yes, that's basically it.  Find out which 20 unit long subsequences of 
> my sequence are not found in my database.

Well, BLAST is probably not the most appropriate tool for this 
problem, as it finds 'high scoring' matches, not exact matches.

Perhaps simply using Perl's "index()" function, which tests if one 
string is in another string, would be simpler?
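
As a minimal sketch of what index() returns (the strings below are made-up examples, not data from this thread), it gives the 0-based position of the first match, or -1 when the substring is absent:

```perl
use strict;
use warnings;

# Hypothetical example strings, not taken from the original thread.
my $database = "ACGTACGTTTGACCA";
my $present  = "TTGACC";
my $absent   = "GGGGGG";

# index() returns the 0-based position of the first match, or -1 if none.
my $pos_present = index($database, $present);
my $pos_absent  = index($database, $absent);

print "present: $pos_present\n";
print "absent:  $pos_absent\n";
```

Note that index() is case-sensitive, so this assumes the query and database use the same letter case.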

You could even concatenate all your database sequences into one big 
sequence, inserting 20 "N" (if DNA) or "X" (if protein) between each 
(or any other char you don't have in your sequences), so that no 
20-mer can match across the join between two sequences. Then you could 
simply loop through your 20-length subsequences using the sliding window 
as before, and do an "index()" for each against the one big database 
string. If index() returns -1, it wasn't found.
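
A rough sketch of that approach follows. The sub names build_database_string and unmatched_windows are hypothetical, the sequences are toy data, and a window of 4 stands in for 20 to keep the example short. It assumes the sequences are already in memory as plain strings in the same letter case, and it only checks the strand as given.

```perl
use strict;
use warnings;

# Join database sequences with a run of separator characters so that no
# window can match across the boundary between two sequences.
sub build_database_string {
    my ($seqs, $sep_char, $window) = @_;
    return join($sep_char x $window, @$seqs);
}

# Return every window-length subsequence of $query absent from $db_string.
sub unmatched_windows {
    my ($query, $db_string, $window) = @_;
    my @missing;
    for my $start (0 .. length($query) - $window) {
        my $sub = substr($query, $start, $window);
        push @missing, $sub if index($db_string, $sub) < 0;
    }
    return @missing;
}

# Hypothetical toy data; the real case would use a window of 20.
my @db      = ("ACGTACGT", "TTGACCAA");
my $db_str  = build_database_string(\@db, "N", 4);
my @missing = unmatched_windows("ACGTTTGA", $db_str, 4);
print "$_\n" for @missing;
```

With these toy sequences, the windows that straddle the join would be wrongly reported as found if the separator were left out. A subsequence that occurs more than once in the query is listed once per occurrence; use a hash if you want each reported only once.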

Hope this helps,

Torsten Seemann.
