[Bioperl-l] Indexing CDS file
heikki.lehvaslaiho at gmail.com
Wed Feb 11 07:44:08 EST 2009
Looks good. Are you going to make the changes to the EMBL parser?
2009/2/11 Dave Messina <David.Messina at sbc.su.se>:
> Thanks, Heikki.
> I took a closer look at the EBI FTP site where Sviya and I got the file, and
> their README (ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/README.txt) says:
> PA line - contains the accession.version of the "parent" EMBL entry
> (entry where the CDS is annotated)
> So, unfortunately, they've decided that a CDS record, which has no accession
> of its own, doesn't inherit its parent's accession number but instead refers
> to it via the PA line.
> Furthermore, there's an
> OX line - contains the NCBI taxid for the organism; taxonomic data are taken
> from the parent EMBL entries
> which is also not part of the formal spec (although this one is a more
> worthwhile addition, IMO).
> Sooooo, I think we'll need to add support for these.
> 'PA' seems easy enough -- the EMBL parser can look for it if there isn't an
> 'AC' line.
> As for 'OX', is there a standard slot for a taxonID in a RichSeq SeqFeature
> table? Coming from a GenBank record or a vanilla EMBL record, this is
> normally encoded as
> primary tag: source
> tag: db_xref
> value: taxon:9606
> Should we do the same when coming from an EMBL CDS entry, even though the
> taxon isn't actually in the feature table?
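A minimal sketch of the two proposed parser changes, written in Python for illustration rather than as BioPerl code. It assumes the fixed EMBL line layout (two-letter line code, value starting at column 6). It also assumes that OX values look like `NCBI_TaxID=9606;`, which is not stated in the README excerpt above. The function name `parse_cds_header` and the accession `AB000000.1` are hypothetical.

```python
import re


def parse_cds_header(record_text):
    """Hypothetical sketch: return (accession, source_feature) for one record.

    Uses the AC line when present, otherwise falls back to the PA line
    (the parent entry's accession.version, as in EBI CDS files).
    """
    # Group line values by their two-letter line code (ID, AC, PA, OX, ...).
    fields = {}
    for line in record_text.splitlines():
        code, value = line[:2], line[5:].strip()
        fields.setdefault(code, []).append(value)

    accession = None
    if "AC" in fields:
        # AC lists accessions separated by ';'; the first one is primary.
        accession = fields["AC"][0].split(";")[0].strip()
    elif "PA" in fields:
        # CDS records carry no AC line, so use the parent's accession.version.
        accession = fields["PA"][0]

    source_feature = None
    if "OX" in fields:
        # Assumed OX format: "NCBI_TaxID=<digits>;".
        match = re.search(r"NCBI_TaxID=(\d+)", fields["OX"][0])
        if match:
            # Mirror the GenBank convention: a 'source' feature with a
            # db_xref tag whose value is "taxon:<id>".
            source_feature = {
                "primary_tag": "source",
                "tags": {"db_xref": ["taxon:" + match.group(1)]},
            }
    return accession, source_feature


record = "ID   CDS00001\nPA   AB000000.1\nOX   NCBI_TaxID=9606;\n//\n"
print(parse_cds_header(record))
```

Falling back to PA only when AC is absent leaves ordinary EMBL entries parsing exactly as before. Emitting the taxon as a `source` feature keeps CDS records consistent with what GenBank and vanilla EMBL input already produce.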
Heikki Lehvaslaiho - heikki lehvaslaiho gmail com
Sent from: Johannesburg Gauteng South Africa.