[Bioperl-l] Genbank file : bad features (tag) order with /translation
cjfields at illinois.edu
Wed Aug 3 10:08:33 EDT 2011
On Aug 3, 2011, at 8:46 AM, Peter Cock wrote:
> 2011/8/3 Maxime Déraspe <maximilien1er at gmail.com>:
>> when I parse a GenBank file, no matter what I do, the
>> /translation="MKAV.." tag value of a CDS never appears in the last place,
>> as it should. Other tags like /note and /product come after
>> /translation, which is not the usual practice in GenBank files. Does
>> anyone have an idea how to deal with this, i.e. put the /translation tag
>> value in the last place when I write the GenBank file?
>> Thank you !
> Hi Max,
> I'm not aware of anything in the feature table specification
> about the order of the feature qualifiers (the "tags" like /note
> and /product). See http://www.ncbi.nlm.nih.gov/collab/FT/
> I suspect BioPerl is using a hash (Biopython uses a dictionary)
> for the feature qualifiers, which would discard the order.
> Why do you care about the order?
Yes, it uses a hash based on the feature tags. I'm not sure how Biopython handles it, but my guess is something similar (Peter?).
The output order was never a chief concern of ours. To tell the truth, our main focus has never been simple format conversion, but rather transforming data into a form that is more manageable and normalized.
For those interested in making this change, all the code for printing features is in one method in Bio::SeqIO::genbank, _print_GenBank_FTHelper(). The best way to handle this would be to allow an optional coderef/callback that takes the feature (or its tags) and allows custom sorting and printing. I don't want to get into messy semantics on how specifically to sort tags; best to let the user decide.
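A minimal sketch of the user-supplied sorting callback idea, written in Python for brevity rather than as a patch to Bio::SeqIO::genbank. The names `format_qualifiers` and `translation_last` are hypothetical and not part of BioPerl or Biopython. The output is simplified: every value is quoted and long lines are not wrapped at 79 columns, both of which a real GenBank writer handles.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical callback type: receives qualifier names, returns them in print order.
TagSorter = Callable[[List[str]], List[str]]


def translation_last(tags: List[str]) -> List[str]:
    """Keep the incoming tag order but move 'translation' to the end."""
    others = [t for t in tags if t != "translation"]
    trans = [t for t in tags if t == "translation"]
    return others + trans


def format_qualifiers(qualifiers: Dict[str, Sequence[str]],
                      sorter: Optional[TagSorter] = None,
                      indent: int = 21) -> List[str]:
    """Render qualifiers as /tag="value" lines starting at column 22.

    Simplification: all values are quoted and no line wrapping is done.
    """
    tags = list(qualifiers)
    if sorter is not None:
        # Let the caller decide the order instead of hard-coding a policy.
        tags = sorter(tags)
    pad = " " * indent
    lines = []
    for tag in tags:
        for value in qualifiers[tag]:
            lines.append(f'{pad}/{tag}="{value}"')
    return lines


if __name__ == "__main__":
    quals = {
        "translation": ["MKAV"],
        "note": ["hypothetical protein"],
        "product": ["ABC transporter"],
    }
    for line in format_qualifiers(quals, translation_last):
        print(line)
```

Run as a script, this prints the /note line, then /product, then /translation. Passing no sorter keeps whatever order the mapping supplies, so the sort policy stays entirely with the caller.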