[Bioperl-l] NCBI adoption of AGP v2.0 and new qualifiers in GenBank/EMBL
p.j.a.cock at googlemail.com
Fri Jan 20 05:46:18 EST 2012
I just spotted this via the @NCBI twitter feed,
In addition to the NCBI switch from AGP v1.1 to v2.0, the INSDC have
recently added a new feature type called "assembly_gap", and the
associated qualifiers "gap_type" and "linkage_evidence" to the INSDC
Feature Table Definitions.
Quoting from version 10.0, dated Dec 2011:
> Feature Key           assembly_gap
> Definition            gap between two components of a CON record that is
>                       part of a genome assembly;
> Mandatory qualifiers  /estimated_length=unknown or <integer>
>                       /linkage_evidence="TYPE" (Note: Mandatory only if the
>                       /gap_type is "within scaffold" or "repeat within
>                       scaffold". If there are multiple types of linkage_evidence
>                       they will appear as multiple /linkage_evidence="TYPE"
>                       qualifiers. For all other types of assembly_gap
>                       features, use of the /linkage_evidence qualifier is
>                       invalid.)
> Comment               the location span of the assembly_gap feature for an
>                       unknown gap is 100 bp, with the 100 bp indicated as
>                       100 "n"'s in sequence.
i.e. DDBJ, ENA & GenBank flat-files will start to use the "assembly_gap"
features to display information derived from version 2.0 AGP files from
10th Feb 2012. Probably this will affect the XML variants as well.
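For anyone wanting to sanity-check their own output, here is a minimal Python sketch of the qualifier rules quoted above. The function name and the dict-of-lists layout are my own assumptions for illustration, not part of BioPerl or any INSDC tool, and it only covers the rules stated in the quoted excerpt.

```python
# Hypothetical helper for illustration only; not an existing library API.
# Gap types for which the quoted definition makes /linkage_evidence mandatory.
LINKED_GAP_TYPES = {"within scaffold", "repeat within scaffold"}


def check_assembly_gap(qualifiers, start, end):
    """Return a list of problems found in one assembly_gap feature.

    qualifiers maps a qualifier name to a list of values, because
    /linkage_evidence may appear more than once. start and end are
    1-based inclusive coordinates, as in a flat-file location.
    """
    problems = []

    lengths = qualifiers.get("estimated_length", [])
    if not lengths:
        problems.append("missing /estimated_length")
    elif lengths[0] == "unknown":
        # The quoted comment says an unknown gap is shown as 100 bp of "n".
        if end - start + 1 != 100:
            problems.append("unknown gap should span 100 bp")
    elif not lengths[0].isdigit():
        problems.append("/estimated_length must be 'unknown' or an integer")

    gap_types = qualifiers.get("gap_type", [])
    needs_evidence = any(g in LINKED_GAP_TYPES for g in gap_types)
    if needs_evidence and not qualifiers.get("linkage_evidence"):
        problems.append("/linkage_evidence is mandatory for this /gap_type")

    return problems


# Illustrative values only; "paired-ends" stands in for a linkage evidence type.
example = {
    "estimated_length": ["unknown"],
    "gap_type": ["within scaffold"],
    "linkage_evidence": ["paired-ends"],
}
print(check_assembly_gap(example, 1001, 1100))  # prints []
```

The function returns a list of messages rather than raising, so a writer can report every problem in a record at once.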
Unless any of the parsers/writers for GenBank or EMBL flat files use a
whitelist approach (i.e. reject feature keys or qualifiers they don't already
know), the new feature key and qualifiers shouldn't cause a problem.
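To make the whitelist concern concrete, here is a rough Python sketch contrasting a permissive reader with a strict one. Everything in it (the function name, the `KNOWN_KEYS` set, the sample record) is hypothetical, and it assumes the usual GenBank layout with the feature key starting in column 6 and qualifier lines indented to column 22.

```python
# Feature keys a hypothetical strict parser knew about before this change.
KNOWN_KEYS = {"source", "gene", "CDS", "gap"}


def read_feature_keys(lines, strict=False):
    """Collect feature keys from GenBank-style feature table lines.

    With strict=True, an unknown key raises ValueError (whitelist
    behaviour); otherwise unknown keys are accepted as-is.
    """
    keys = []
    for line in lines:
        key = line[5:20].strip()
        if not key or line[:5].strip():
            continue  # qualifier/continuation line, or not a feature line
        if strict and key not in KNOWN_KEYS:
            raise ValueError("unknown feature key: " + key)
        keys.append(key)
    return keys


indent = " " * 21  # qualifier lines start in column 22
sample = [
    "     source          1..2100",
    "     assembly_gap    1001..1100",
    indent + "/estimated_length=unknown",
    indent + '/gap_type="within scaffold"',
    indent + '/linkage_evidence="paired-ends"',
]

print(read_feature_keys(sample))  # prints ['source', 'assembly_gap']

try:
    read_feature_keys(sample, strict=True)
except ValueError as err:
    print(err)  # prints unknown feature key: assembly_gap
```

A permissive reader passes the new key through untouched, while the strict one fails on the first assembly_gap feature until its list is updated.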