[Bioperl-l] what's the optimal way to search a fasta file for matching ID's?
joseph.fass at gmail.com
Thu Oct 25 17:50:02 EDT 2007
I would appreciate any advice, big or small, on this.
I've got a decent-sized database: 90,000 sequences or so in a single
FASTA-format file. I've also got sequence IDs from that database that
show up in BLAST reports. I want to collect those IDs and their sequences
(for the purposes of exploring possible contigs). Since the BLAST report
only includes sub-sequences (from alignments) of my sequences, I want to
parse the report, then match each hit ID against an ID in the database so I
can pull out its full sequence. Is there a faster way to do this than
reopening the database file for each new hit ID and searching it
from beginning to end? If I push every sequence onto a list or into a hash,
it's liable to chew up a lot of RAM, I'm guessing. Any suggestions?
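
To make the question concrete, here is a minimal sketch of one alternative to rescanning the file per hit: collect all hit IDs first, then stream the FASTA file once and keep only the matching records. It is written in Python rather than BioPerl purely for illustration. The function name extract_hits, the sample IDs, and the rule that a record's ID is the first whitespace-delimited token after ">" are all assumptions, not anything taken from the actual database or BLAST reports.

```python
import io

def extract_hits(fasta_handle, hit_ids):
    """Yield (seq_id, sequence) for records whose ID is in hit_ids.

    Reads the FASTA data once, line by line, so memory use grows with
    the number of matching records, not with the size of the database.
    """
    wanted = set(hit_ids)           # set membership tests are O(1) on average
    seq_id, chunks = None, []
    for line in fasta_handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record; emit it if wanted.
            if seq_id in wanted:
                yield seq_id, "".join(chunks)
            # Assumption: the ID is the first token after '>'.
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        elif seq_id in wanted:
            # Only buffer sequence lines for records we intend to keep.
            chunks.append(line.strip())
    # Emit the final record, which has no following header to trigger it.
    if seq_id in wanted:
        yield seq_id, "".join(chunks)

# Hypothetical in-memory stand-in for the real FASTA file.
demo = io.StringIO(">seq1 first\nACGT\nAC\n>seq2\nGGGG\n>seq3 third\nTTTT\nAA\n")
hits = dict(extract_hits(demo, ["seq1", "seq3"]))
```

With a real file, the handle would come from open() on the database path, and hit_ids from whatever the BLAST report parser produces. Only the hit IDs and the matching sequences are held in memory, so the cost is one pass over the file regardless of how many hits there are.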
Thanks in advance,
joseph.fass at gmail.com || joefass at hotmail.com
970.227.5928 (c) || 530.754.7978 (w)