[Bioperl-l] Genbank file : bad features (tag) order with /translation
cjfields at illinois.edu
Wed Aug 3 13:10:31 EDT 2011
IMHO the GenBank format is too unwieldy, but it's nice to know the output works for NCBI submission.
On Aug 3, 2011, at 12:06 PM, Brian Osborne wrote:
> I currently use BioPerl and SeqIO::genbank to create the *gbf files for NCBI submission, and they've always accepted them. In fact, I don't think they even use them; I believe they use the *tbl, *fsa, and *agp files and the ASN file as data sources.
> Brian O
> On Aug 3, 2011, at 12:52 PM, Chris Fields wrote:
>> On Aug 3, 2011, at 11:00 AM, Peter Cock wrote:
>>> 2011/8/3 Maxime Déraspe <maximilien1er at gmail.com>:
>>>>> Why do you care about the order?
>>>> Hi Peter,
>>>> I care about the order for the submission to ncbi.
>>> Do the NCBI have some guidelines which ask for a particular order?
>> No, beyond the feature table specification there is nothing I am aware of that requires a particular order. Submitted data is tabular; Sequin is a GUI for getting data into a format suitable for submission to NCBI, where I believe it is converted to ASN.1.
>>>> But I guess they
>>>> will reformat the file before getting it in their database.
>>> They seem to generate the official GenBank files from their
>>> database - so I doubt the input order matters.
>> Yep, that's correct. If NCBI ruled the world everyone would be using ASN.1 (b/c that's what they use internally).
>>>> It's also
>>>> visually better when the translation of the protein comes in the end
>>>> of the annotation for the CDS and not before /product, /note ....
>>> I do see your point, but if that were the only motivation I wouldn't
>>> want to make generating GenBank output any more complicated
>>> than it already is.
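For readers who want the /translation qualifier last purely for readability, one option that leaves the writer untouched is to post-process the flat file text. The following is a minimal sketch in Python, not part of BioPerl; the function name move_translation_last is hypothetical, and it assumes the usual flat file layout in which qualifier lines are indented by 21 spaces and a new qualifier starts with "/" once the previous quoted value is closed.

```python
QUAL_INDENT = " " * 21  # assumed qualifier indentation in GenBank flat files


def move_translation_last(text):
    """Return GenBank text with /translation moved to the end of each feature."""
    out = []
    qualifiers = []  # each entry: the lines of one qualifier of the current feature
    in_features = False

    def quote_open(lines):
        # An odd number of double quotes means the quoted value is still open.
        return sum(l.count('"') for l in lines) % 2 == 1

    def flush():
        # Stable partition: everything else first, /translation last.
        others = [q for q in qualifiers
                  if not q[0].strip().startswith("/translation=")]
        trans = [q for q in qualifiers
                 if q[0].strip().startswith("/translation=")]
        for q in others + trans:
            out.extend(q)
        qualifiers.clear()

    for line in text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            out.append(line)
            continue
        if in_features and line.startswith(QUAL_INDENT):
            body = line[len(QUAL_INDENT):]
            starts_new = body.startswith("/") and not (
                qualifiers and quote_open(qualifiers[-1]))
            if starts_new:
                qualifiers.append([line])
            elif qualifiers:
                qualifiers[-1].append(line)  # wrapped qualifier value
            else:
                out.append(line)  # wrapped feature location
            continue
        # A feature key line or a new section ends the current feature.
        flush()
        if in_features and not line.startswith(" "):
            in_features = False
        out.append(line)
    flush()
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

Because the records are only rearranged as text, the result should be checked by reading it back with a parser before it is used for anything. As noted in this thread, NCBI does not appear to require any particular qualifier order, so this is cosmetic.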
>>>> Anyway maybe I'll reformat the file in sequin table for a direct
>>>> submission to ncbi with sequin.
>>>> Thank you.
>> Maxime, I find most users try to avoid using GenBank format except when absolutely needed. There is a very good reason Sequin and tbl2asn are used by NCBI for submissions; they end up generating simple tabular data that is easier to feed into their internal ASN.1 format. GenBank is a nice human-readable format, but structure-wise I find it a pain to deal with, not to mention the variant third-party 'genbank' data that users want us to handle.
>> We try to support generation of output within reason, but that's never been our primary goal. As long as the output generated is capable of being re-read by our parsers with the data intact and generates sane data we're pretty happy.
>> That said, any additions to deal with this are perfectly welcome (I pointed out one mechanism that could be used), but they would have to address the concerns Peter and I alluded to previously, and it would be nice to evaluate how any changes affect performance. You are more than welcome to submit this as a feature request using our redmine server (including patches if you do this yourself):