[Bioperl-l] Indexing CDS file
cjfields at illinois.edu
Wed Feb 11 08:24:30 EST 2009
I'm guessing that line would be similar to DBSOURCE in GenPept files.
Could probably use Bio::Annotation::DBLink or Bio::Annotation::Target
for it (if it corresponds to a particular subset of the sequence).
On Feb 11, 2009, at 6:44 AM, Heikki Lehvaslaiho wrote:
> Looks good. Are you going to make the changes to the EMBL parser?
> 2009/2/11 Dave Messina <David.Messina at sbc.su.se>:
>> Thanks, Heikki.
>> I took a closer look at the EBI ftp site where Sviya and I got the
>> file, and their README
>> (ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/README.txt) says:
>>   PA line - contains the accession.version of the "parent" EMBL entry
>>   (entry where the CDS is annotated)
>> So, unfortunately, they've decided that a CDS record, which has no
>> accession number of its own, doesn't get its parent's accession number
>> but instead refers to it via the PA line.
>> Furthermore, there's an
>>   OX line - contains the NCBI taxid for the organism; taxonomic data
>>   are taken from the parent EMBL entries
>> which is also not part of the formal spec (although this one is a more
>> worthwhile addition, IMO).
>> Sooooo, I think we'll need to add support for these.
>> 'PA' seems easy enough -- the EMBL parser can look for it if there
>> isn't an 'AC' line.
>> As for 'OX', is there a standard slot for a taxonID in a RichSeq
>> table? Coming from a GenBank record or a vanilla EMBL record, this is
>> normally encoded as
>>   primary tag: source
>>   tag: db_xref
>>   value: taxon:9606
>> Should we do the same when coming from an EMBL entry, even though the
>> taxid isn't actually in the feature table?
> Heikki Lehvaslaiho - heikki lehvaslaiho gmail com
> Sent from: Johannesburg Gauteng South Africa.
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
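A minimal sketch of the two changes proposed in the thread: take the accession from AC, fall back to PA when there is no AC line, and turn the OX taxid into a 'source' feature carrying db_xref "taxon:<id>". This is written in Python for illustration only and is not BioPerl's EMBL parser; the function name `parse_cds_header`, the returned dict layout, and the exact line formats (two-letter code followed by the value, OX written as `NCBI_TaxID=9606;`) are all assumptions.

```python
import re

# Hypothetical sketch (not BioPerl code). Assumes EMBL-style lines:
# a two-letter line code followed by the line's value, and an OX line
# of the form "OX   NCBI_TaxID=9606;".
TAXID_RE = re.compile(r"NCBI_TaxID=(\d+)")


def parse_cds_header(lines):
    """Return accession info and a source-feature dict from EMBL-style lines."""
    fields = {}
    for line in lines:
        code, value = line[:2], line[2:].strip()
        # Keep only the first line seen for each code (enough for this sketch).
        fields.setdefault(code, value)

    record = {"accession": None, "accession_from": None, "features": []}

    if "AC" in fields:
        # Normal EMBL entry: primary accession is the first item on the AC line.
        record["accession"] = fields["AC"].split(";")[0].strip()
        record["accession_from"] = "AC"
    elif "PA" in fields:
        # CDS record with no AC line: fall back to the parent's accession.version.
        record["accession"] = fields["PA"].rstrip(";").strip()
        record["accession_from"] = "PA"

    match = TAXID_RE.search(fields.get("OX", ""))
    if match:
        # Encode the taxid the way GenBank/EMBL feature tables usually do:
        # a 'source' feature with db_xref "taxon:<id>".
        record["features"].append(
            {"primary_tag": "source",
             "tags": {"db_xref": ["taxon:" + match.group(1)]}}
        )
    return record


if __name__ == "__main__":
    # Made-up example lines; "PARENT01.1" is a placeholder accession.
    example = [
        "ID   CDS00001; ...",
        "PA   PARENT01.1",
        "OX   NCBI_TaxID=9606;",
        "//",
    ]
    print(parse_cds_header(example))
```

Recording which line the accession came from (`accession_from`) keeps the PA fallback visible to callers, since a CDS record's "accession" is really its parent's.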