[Bioperl-l] Long /labels are wrapped, but can't be read
cjfields at illinois.edu
Wed Oct 7 10:09:11 EDT 2009
On Sep 30, 2009, at 4:50 AM, Adam Sjøgren wrote:
> On Tue, 29 Sep 2009 22:54:04 -0500, Chris wrote:
>> Not sure, but this could be a case of 'both'. Qualifiers that are quoted
>> and those that aren't are currently distinguished via a global hash
>> lookup (%FTQUAL_NO_QUOTE) because of the way the parser works; there is
>> some logic behind this, I just can't quite recall at the moment why it
>> is this way.
> Yes, I saw that there are a number of qualifiers that aren't quoted.
> The very easy "fix" for me would be to simply remove "label" from
> %FTQUAL_NO_QUOTE, but I'm not really sure what the reason for not
> quoting all values is, so I was hesitant to just propose that.
It's basically for more control over the output format, IIRC. It appears
to play a role only in output (via write_seq).
>> You could set a hash key for the label in cases where it isn't
>> that should work. You can also test out the Bio::SeqIO::embldriver
>> version (-format => 'embldriver').
> Ah, embldriver reads the wrapped qualifier without a problem when it
> isn't quoted. Nice! I hadn't noticed embldriver.
> I wonder which one is correct in this case.
> And should I switch to using embldriver to read, or does it make sense
> to try and concoct a patch that changes embl?
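On the reading side, a minimal Python sketch, under the same assumed 21-character prefix, of one way a parser could reassemble wrapped values whether or not they are quoted: any qualifier-column line that does not begin with `/` is appended to the previous qualifier. The function `parse_qualifiers` and its joining rules are hypothetical and are not taken from either Bio::SeqIO::embl or Bio::SeqIO::embldriver.

```python
PREFIX_LEN = 21  # assumed: qualifier text starts at column 22


def parse_qualifiers(lines):
    """Reassemble (name, value) pairs from EMBL-style FT qualifier lines."""
    quals = []  # [name, raw_value] pairs in file order
    for line in lines:
        text = line[PREFIX_LEN:].rstrip()
        if text.startswith("/"):
            # New qualifier: split /name=value on the first '='.
            name, _, value = text[1:].partition("=")
            quals.append([name, value])
        elif quals:
            # Continuation line. Assumption: quoted free text was wrapped at
            # spaces (rejoin with a space); unquoted values were split
            # mid-token (glue directly).
            sep = " " if quals[-1][1].startswith('"') else ""
            quals[-1][1] += sep + text
    # Strip the surrounding quotes once each value is complete.
    result = []
    for name, value in quals:
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        result.append((name, value))
    return result
```

The weakness of unquoted values shows up here too: if a wrapped unquoted value happened to continue with a `/`, this reader would mistake the continuation for a new qualifier. That is one argument for quoting.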
Bio::SeqIO::embldriver is an attempt to coalesce the parsers into a
generic driver/parser-handler framework. The main job of each parser
(the driver) is simply to parse the incoming data stream into chunks of
naturally related data, basically hash refs, and pass them on to the
handler object, which has methods designed to handle the chunks passed
in. It's like a souped-up XML parser, except that the data is grouped
into larger, meaningful units (an entire seqfeature, for instance).
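The driver/handler split can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the embldriver code: chunks are plain dicts labeled by a hypothetical `type` key, the hypothetical `embl_driver` groups each EMBL feature (key, location and raw qualifier lines) into one chunk, and the hypothetical `RecordHandler` dispatches on `type` to a `handle_<type>` method, setting aside chunk types it has no method for.

```python
def embl_driver(lines):
    """Driver: group raw EMBL-style lines into labeled dict chunks (hypothetical format)."""
    feature = None
    for line in lines:
        tag = line[:2]
        if tag == "FT" and len(line) > 5 and line[5] != " ":
            # A feature key in column 6 starts a new feature chunk.
            if feature is not None:
                yield feature
            feature = {"type": "feature", "key": line[5:21].strip(),
                       "location": line[21:].strip(), "qualifiers": []}
        elif tag == "FT" and feature is not None:
            # Indented FT lines belong to the current feature; kept raw here.
            feature["qualifiers"].append(line[21:].strip())
        else:
            # Any other line type closes the current feature.
            if feature is not None:
                yield feature
                feature = None
            if tag == "ID":
                yield {"type": "id", "value": line[5:].strip()}
    if feature is not None:
        yield feature


class RecordHandler:
    """Handler: build a record by dispatching each chunk on its 'type' label."""

    def __init__(self):
        self.record = {"features": []}
        self.unhandled = []

    def handle(self, chunk):
        method = getattr(self, "handle_" + chunk["type"], None)
        if method is None:
            self.unhandled.append(chunk)  # no method for this chunk type
        else:
            method(chunk)

    def handle_id(self, chunk):
        self.record["id"] = chunk["value"]

    def handle_feature(self, chunk):
        self.record["features"].append(
            (chunk["key"], chunk["location"], chunk["qualifiers"]))


if __name__ == "__main__":
    demo = ["ID   X1; SV 1",
            "FT   " + "CDS".ljust(16) + "14..1495",
            "FT" + " " * 19 + "/label=demo",
            "XX"]
    handler = RecordHandler()
    for chunk in embl_driver(demo):
        handler.handle(chunk)
    print(handler.record)
```

The point of the split is that the driver knows only the file layout and the handler knows only the chunk labels, so a different format needs a new driver but can reuse the same handler.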
For the moment they're still experimental, but I put them out with the
release so they can be tested. The main problem at the moment is that
there is no specification for how a data chunk is defined and labeled;
I am thinking of using something like JSON for that.
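Since the chunk format is still unspecified, the following is only a hypothetical Python illustration of what a JSON-labeled chunk and a minimal check against a spec might look like. The field names and the `REQUIRED` table are invented for this sketch and are not a proposed BioPerl format.

```python
import json

# Hypothetical minimal spec: every chunk carries a "type" label, and each
# type lists the fields it must contain.
REQUIRED = {
    "feature": {"key", "location", "qualifiers"},
    "id": {"value"},
}


def validate_chunk(chunk):
    """Return a list of problems; an empty list means the chunk fits the sketch spec."""
    ctype = chunk.get("type")
    if ctype not in REQUIRED:
        return [f"unknown chunk type: {ctype!r}"]
    missing = REQUIRED[ctype] - set(chunk)
    return [f"missing field: {name}" for name in sorted(missing)]


chunk = json.loads('{"type": "feature", "key": "CDS", "location": "14..1495", '
                   '"qualifiers": {"label": "demo"}}')
```

Keeping the type label inside the chunk is what would let any driver hand its output to any handler without the two sharing code.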
> Thanks for the feedback!
> Adam Sjøgren
> adsj at novozymes.com