[Bioperl-l] indexing conservation scores
deeepersound at googlemail.com
Wed Dec 22 19:00:25 EST 2010
Bio::DB::Fasta is a beautiful tool for fast access to sequences present in
large flat text (fasta) files and I really love it. Now I'd like to speed up
the retrieval of data from large files that store conservation scores. The
files that I was able to find at UCSC are in fixedStep wiggle format, with a
declaration line like
fixedStep chrom=chrYHet start=1 step=1
Does anyone see a way to use the indexing mechanism of Bio::DB::Fasta to
retrieve floating-point numbers instead of sequence? I could reformat the
wiggle file into a simple space-, tab-, or comma-separated list of scores
per chromosome.
Are there any suggestions? Or is there already a module that takes care of my
problem and I have just overlooked it?
Or would such an approach not be considerably faster than a plain Unix command like
sed -n '2,5001p' chrYHet.pp
to retrieve the scores?
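
To make the idea concrete, here is a minimal sketch (in Python, not BioPerl, and not the Bio::DB::Fasta API) of the fixed-width approach: if every score is stored as a record of the same size, the byte offset of any position can be computed directly, which is the same kind of offset arithmetic a fixed-line-length FASTA index relies on. The function names pack_fixedstep and fetch are hypothetical, and the sketch assumes a single fixedStep block with step=1 per file.

```python
import struct

# Assumption: one 4-byte little-endian float per position is precise enough.
RECORD = struct.Struct("<f")


def pack_fixedstep(wig_path, bin_path):
    """Convert a single-block fixedStep wiggle file (step=1) to packed floats.

    Returns the start coordinate read from the declaration line.
    (Hypothetical helper, written only for this illustration.)
    """
    start = None
    with open(wig_path) as wig, open(bin_path, "wb") as out:
        for line in wig:
            line = line.strip()
            # Skip blank lines, comments, and optional track lines.
            if not line or line.startswith("#") or line.startswith("track"):
                continue
            if line.startswith("fixedStep"):
                if start is not None:
                    raise ValueError("sketch handles one fixedStep block only")
                fields = dict(f.split("=", 1) for f in line.split()[1:])
                if int(fields.get("step", 1)) != 1:
                    raise ValueError("sketch assumes step=1")
                start = int(fields["start"])
                continue
            out.write(RECORD.pack(float(line)))
    if start is None:
        raise ValueError("no fixedStep declaration line found")
    return start


def fetch(bin_path, block_start, first, last):
    """Return the scores for 1-based, inclusive positions first..last."""
    if first < block_start or last < first:
        raise ValueError("requested range lies outside the block")
    count = last - first + 1
    with open(bin_path, "rb") as fh:
        # Fixed record size means the offset is pure arithmetic, no scanning.
        fh.seek((first - block_start) * RECORD.size)
        data = fh.read(count * RECORD.size)
    return [value for (value,) in RECORD.iter_unpack(data)]
```

With such a file, fetch("chrYHet.bin", 1, 1, 5000) would cover the same scores as the sed command above (the file name chrYHet.bin is made up here). The read cost depends only on the number of scores requested, whereas sed has to scan from the top of the file, and without an explicit quit command (for example '2,5001p;5001q') it keeps reading to the end.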