[Bioperl-l] Uniprot/Swiss accessions?
Russell.Smithies at agresearch.co.nz
Mon May 18 17:52:31 EDT 2009
Thanks for your suggestions.
With the magic of awk and comm, I split the amalgamated accessions and created lists of Swiss-Prot IDs for both the file from NCBI and the file from Uniprot.
sp_ncbi_accessions.txt 458,377 ids
sp_uniprot_accessions.txt 466,739 ids
* The NCBI file has 95 ids that don't appear in the Uniprot list
* The Uniprot file has 8,457 ids that don't appear in the NCBI list
* There are 458,282 ids that appear on both lists.
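For anyone who would rather not use comm, the same three-way comparison can be sketched in Python with sets. This is only an illustration, not what I actually ran: the function names are made up, and it assumes each file holds one accession per line.

```python
def read_ids(path):
    """Read one accession per line into a set, skipping blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def compare_id_sets(ncbi_ids, uniprot_ids):
    """Return the ids unique to each source and the ids shared by both."""
    return {
        "ncbi_only": ncbi_ids - uniprot_ids,      # like comm -23
        "uniprot_only": uniprot_ids - ncbi_ids,   # like comm -13
        "both": ncbi_ids & uniprot_ids,           # like comm -12
    }


if __name__ == "__main__":
    # Tiny made-up example; real use would call read_ids() on
    # sp_ncbi_accessions.txt and sp_uniprot_accessions.txt.
    ncbi = {"P00001", "P00002", "P00003"}
    uniprot = {"P00002", "P00003", "P00004", "P00005"}
    for name, ids in compare_id_sets(ncbi, uniprot).items():
        print(name, len(ids))
```

As a sanity check on the counts above: 95 + 458,282 = 458,377 (the NCBI total) and 8,457 + 458,282 = 466,739 (the Uniprot total), so the three groups add up.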
I did a quick random sample of the 8,457 ids unique to Uniprot and none could be found in the "protein" database at NCBI but all were in the "gene" database as "reference sequences that belong to a specific genome build" and all belonged to recently sequenced bacterial genomes. As none are in the "protein" database, they don't have GI numbers.
The 95 ids that were at NCBI but not in Uniprot were usually (random sample again) described as "putative protein" (or "very putative protein" in one case) and are the result of gene predictions, e.g. http://www.ncbi.nlm.nih.gov/protein/48429254
So what I'll do is use the NCBI database, add in the extra 8,457 ids unique to Uniprot, and assign them fake GI numbers so I can formatdb them with the "-o T" option.
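A rough sketch of the fake-GI step, in case it is useful to anyone else. It is untested against the real files and rests on several assumptions: the Uniprot FASTA headers look like ">sp|ACCESSION|ENTRY_NAME description", the function name and the starting number are invented for illustration, and the starting number has to be chosen so it cannot collide with any real GI in the NCBI file.

```python
def add_fake_gis(fasta_lines, wanted_accessions, first_fake_gi):
    """Yield FASTA lines for the wanted records only, with a made-up
    'gi|N|' identifier prepended to each kept header."""
    next_gi = first_fake_gi
    keep = False
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Assumed header layout: >sp|ACCESSION|ENTRY_NAME description
            fields = line[1:].split("|")
            accession = fields[1] if len(fields) > 2 else ""
            keep = accession in wanted_accessions
            if keep:
                yield f">gi|{next_gi}|{line[1:]}"
                next_gi += 1
        elif keep:
            # Sequence line belonging to a record we are keeping.
            yield line


if __name__ == "__main__":
    # Made-up records; the starting GI is an arbitrary placeholder.
    records = [
        ">sp|P11111|A_ECOLI first protein",
        "MKT",
        ">sp|P22222|B_ECOLI second protein",
        "MAA",
    ]
    for out_line in add_fake_gis(records, {"P11111"}, 900000001):
        print(out_line)
```

The output would then be appended to the NCBI FASTA file before running formatdb. Keeping a table of fake GI to accession alongside it seems wise, so the invented numbers can be recognised later.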
Thanks again for your help,
Bioinformatics Applications Developer