[Bioperl-l] Error reporting/Validation implemented
skirov at utk.edu
Tue Mar 15 16:46:13 EST 2005
I used your parser to produce Bioperl objects based on some of the
high-level features and compared it to what I have. Your parser is
considerably faster (about twice as fast), but it is still hard to tell,
as I am descending further into the hierarchy with mine. At the same time,
I don't think the difference will vanish, so I will start building on top
of your parser to produce Bioperl objects. I am not sure exactly how I am
going to deal with the relationships that are necessary, but I'll deal
with that when I finish everything else.
By the way, it took 9 minutes on a 64-bit Xeon 3.4 GHz, even with Bioperl
object construction, on the whole Homo_sapiens ASN file. The data that
went inside the objects was: general description of the genes (symbol,
name, summary, etc.) and organism description, but none of the truly big
parts. Unfortunately, I am leaving tomorrow for a conference, so I will
have more next week at the earliest. Thanks for sharing the code!
Mingyi Liu wrote:
> Hi, there,
> I just implemented basic error reporting and validation
> functionalities in my Entrez Gene parser in Perl (the regex version).
> The validation will catch all non-conforming data, while error
> reporting reports line number, error type, and the first 20
> (customizable) characters of the offending data (but the line number
> could be incorrect if the format resulted in an exception, which is
> hard to deal with for ASN.1-formatted data, although easy for XML).
> The speed for the parser of course slowed down, but I'd say it'd still
> beat most parsers hands down. The full human genome now takes a bit
> over 12 minutes instead of 11 minutes to process on one Intel Xeon 2.4
> GHz CPU. So I don't think my parser's speed has much to do with
> performing validation or not.
> I had also communicated with Stefan Kirov, and it turns out the dead
> entries and 0-sized (should be 1-sized) arrays were simply related to
> data trimming options. So far, so good.
> If anyone is interested, check it out at
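
For readers following the thread, here is a minimal sketch of the kind of error reporting described in the quoted message: each error carries a line number, an error type, and the first 20 (customizable) characters of the offending data. This is not the Entrez Gene parser itself. It is written in Python rather than Perl purely for illustration, and the toy one-field-per-line format, the FIELD_RE pattern, and the names ParseError and validate are all assumptions made up for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical toy format (NOT real Entrez Gene ASN.1):
# one `name "value"` or `name 123` field per line, optional trailing comma.
FIELD_RE = re.compile(r'^\s*([A-Za-z][\w-]*)\s+("[^"]*"|\d+)\s*,?\s*$')


@dataclass
class ParseError:
    line_no: int      # 1-based line number where the problem was seen
    error_type: str   # short label for the kind of problem
    snippet: str      # first `snippet_len` characters of the offending data


def validate(text, snippet_len=20):
    """Return (fields, errors) for the toy format described above."""
    fields, errors = [], []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines are allowed
        match = FIELD_RE.match(line)
        if match is None:
            # Record the problem and keep going instead of aborting the run.
            errors.append(
                ParseError(line_no, "malformed field", line.strip()[:snippet_len])
            )
            continue
        fields.append((match.group(1), match.group(2).strip('"')))
    return fields, errors


sample = 'symbol "BRCA1",\ngeneid 672,\nsummary "unterminated string that runs on\n'
fields, errors = validate(sample)
for err in errors:
    print(f"line {err.line_no}: {err.error_type}: {err.snippet!r}")
```

Collecting errors in a list instead of raising on the first one lets a single pass report every non-conforming line. In a line-oriented toy like this the line number is always exact; as the quoted message notes, that is harder to guarantee for real ASN.1-formatted data.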