[Bioperl-l] SearchIO speed up
cjfields at uiuc.edu
Thu Aug 10 14:54:18 EDT 2006
> > ...Except I need to know if the community considers the speed problem
> > solved or not. More radical changes will make SearchIO even faster, e.g.
> > Chris Fields and Jason (if I interpret the Project priority list item
> > correctly) have suggested an end to individual Hit and HSP objects,
> > which become just data members of a Result-like object. Ideally I don't
> > want to go down that route because we lose quite a bit of OO power;
> As already mentioned, a lazy-evaluation approach would also work.
> Jason and I did once talk about an entirely new parsing/object-building
> framework, based on nested grammars; in essence, the "top-level" parser
> simply "chunks" the input into blobs of (minimally parsed) text that
> correspond to the top-level result object. Each chunk/blob is the input
> to the next-level parser for Hits, which in turn produces chunks for HSPs.
> Note that the Result/Hit/HSP "chunks" are "fat", i.e. they *are* the same
> Generic*I-implementing objects we're already using. Thus, if HSPs are
> never interrogated, they're never parsed; as soon as one is interrogated,
> it gets parsed, and so on. In such an environment, you can imagine
> flyweight objects that are built very quickly/easily (recall that many
> previous analyses of BioPerl speed problems pointed not so much to parsing
> as to heavy-weight object creation).
> I happen to have such a nested parser lying around for
> Bio::SearchIO::fasta.pm, but it also uses an Inline::C, yacc-generated C
> parser backend (yet another experiment in trying to get SearchIO to run
> faster), so it really isn't ready for prime time (being entirely untested,
> and probably not even finished).
The 'nested parsers' idea sounds like a good approach as well, though, as
you indicate, it would be outside of SearchIO. How well does it scale, i.e.
with very large reports?
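
For anyone following the thread, the lazy nested-parser idea quoted above
can be sketched roughly as follows. This is only a minimal illustration,
written in Python rather than as BioPerl code. The toy RESULT/HIT/HSP line
format and the names chunk, Result, Hit, HSP and parse_report are all made
up for the example and are not part of Bio::SearchIO. Each level only
splits its text into blocks for the next level down, and an HSP's fields
are parsed the first time one of them is asked for.

```python
from functools import cached_property  # Python 3.8+


def chunk(lines, keyword):
    """Group lines into blocks, each starting at a line that begins with keyword."""
    blocks = []
    for line in lines:
        if line.startswith(keyword + " "):
            blocks.append([line])
        elif blocks:
            blocks[-1].append(line)
    return blocks


class HSP:
    parse_count = 0  # demo instrumentation: how many HSPs were really parsed

    def __init__(self, raw):
        self._raw = raw  # unparsed text, e.g. "HSP score=50 evalue=1e-5"

    @cached_property
    def _fields(self):
        # Runs once, on the first access to score or evalue.
        HSP.parse_count += 1
        return dict(pair.split("=", 1) for pair in self._raw.split()[1:])

    @property
    def score(self):
        return float(self._fields["score"])

    @property
    def evalue(self):
        return float(self._fields["evalue"])


class Hit:
    def __init__(self, block):
        self.name = block[0].split(None, 1)[1]  # cheap header parse only
        self._body = block[1:]                  # HSP lines stay as raw text

    @cached_property
    def hsps(self):
        # HSP blobs are only chunked when someone asks for them.
        return [HSP(b[0]) for b in chunk(self._body, "HSP")]


class Result:
    def __init__(self, block):
        self.query = block[0].split(None, 1)[1]
        self._body = block[1:]

    @cached_property
    def hits(self):
        return [Hit(b) for b in chunk(self._body, "HIT")]


def parse_report(text):
    """Top-level parser: only splits the report into Result blobs."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return [Result(b) for b in chunk(lines, "RESULT")]
```

With this layout, listing every query name or hit name never touches an
HSP, so HSP.parse_count stays at zero until a score or evalue is read.
Note that the sketch holds the whole report in memory; for very large
reports the top-level chunker would presumably need to read from a
filehandle and yield one Result blob at a time.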