[Bioperl-l] Uniprot/Swiss accessions?

Smithies, Russell Russell.Smithies at agresearch.co.nz
Mon May 18 17:52:31 EDT 2009

Hi guys,
Thanks for your suggestions.

Using awk and comm, I split the amalgamated accessions and built a list of Swiss-Prot IDs for each file, one from NCBI and one from UniProt.
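For anyone who would rather not do the splitting in awk, here is a minimal Python sketch of that step. It rests on assumptions the thread does not confirm: that merged entries share one FASTA defline separated by Ctrl-A, and that Swiss-Prot entries look like "sp|ACCESSION|ENTRY_NAME". The function name and the example defline are made up for illustration.

```python
import re

# Assumed layout (not confirmed in this thread): entries for identical
# sequences are merged onto one defline, separated by Ctrl-A ("\x01"),
# and Swiss-Prot entries look like "sp|ACCESSION|ENTRY_NAME".
SP_PATTERN = re.compile(r"(?:^|\|)sp\|([A-Z0-9]+)(?:\.\d+)?\|")


def swissprot_accessions(defline):
    """Return the Swiss-Prot accessions found in one FASTA defline."""
    accessions = []
    # Drop the leading ">" and look at each merged entry separately.
    for entry in defline.lstrip(">").split("\x01"):
        match = SP_PATTERN.search(entry)
        if match:
            # Keep the bare accession, without any ".version" suffix.
            accessions.append(match.group(1))
    return accessions


# Illustrative defline only, not taken from either file.
example = (
    ">gi|1|sp|P12345|AATM_RABIT first"
    "\x01gi|2|ref|NP_000001.1| second"
    "\x01gi|3|sp|Q00001.2|X_Y third"
)
print(swissprot_accessions(example))
```

Writing one accession per line and sorting the result gives a file that comm can compare directly.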

sp_ncbi_accessions.txt          458,377 ids
sp_uniprot_accessions.txt       466,739 ids

* The NCBI file has 95 IDs that don't appear in the UniProt list.
* The UniProt file has 8,457 IDs that don't appear in the NCBI list.
* There are 458,282 IDs that appear on both lists.
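The three-way comparison above is what comm reports for two sorted files. A rough Python equivalent using sets is sketched below; the function names are hypothetical, and the toy IDs are for illustration only. The final assertions just check that the counts quoted above are consistent with each other.

```python
def read_ids(path):
    """Read one ID per line, skipping blank lines (hypothetical helper)."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def compare_id_lists(ncbi_ids, uniprot_ids):
    """Split two ID lists into NCBI-only, UniProt-only and shared IDs."""
    ncbi, uniprot = set(ncbi_ids), set(uniprot_ids)
    return {
        "ncbi_only": sorted(ncbi - uniprot),
        "uniprot_only": sorted(uniprot - ncbi),
        "both": sorted(ncbi & uniprot),
    }


# Toy data, not real accessions from either file.
result = compare_id_lists(["A1", "B2", "C3"], ["B2", "C3", "D4", "E5"])
for name, ids in result.items():
    print(name, len(ids), ids)

# Consistency check on the counts reported in this message.
assert 95 + 458282 == 458377      # NCBI-only + shared = NCBI total
assert 8457 + 458282 == 466739    # UniProt-only + shared = UniProt total
```

On the real files this would be called as compare_id_lists(read_ids("sp_ncbi_accessions.txt"), read_ids("sp_uniprot_accessions.txt")).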

I checked a quick random sample of the 8,457 IDs unique to UniProt. None could be found in the "protein" database at NCBI, but all were in the "gene" database as "reference sequences that belong to a specific genome build", and all belonged to recently sequenced bacterial genomes. Because none are in the "protein" database, they don't have GI numbers.

The 95 IDs that were at NCBI but not in UniProt were usually (random sample again) described as "putative protein" (or "very putative protein" in one case) and are the result of gene predictions, e.g. http://www.ncbi.nlm.nih.gov/protein/48429254

So what I'll do is use the NCBI database, add the extra 8,457 IDs unique to UniProt, and assign them fake GI numbers so I can run formatdb on the combined file with the "-o T" option.
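A minimal sketch of the fake-GI step, under assumptions not stated in this thread: the UniProt-only records have deflines such as ">sp|ACCESSION|ENTRY_NAME ...", and the starting number is an arbitrary placeholder that would have to be checked against the largest real GI in the NCBI file so nothing collides. The function name add_fake_gi is hypothetical.

```python
import itertools


def add_fake_gi(fasta_lines, start=900000000):
    """Prefix deflines that lack a GI with a made-up "gi|N|" identifier.

    `start` is an arbitrary placeholder; it must be higher than every
    real GI number already present in the database being built.
    """
    counter = itertools.count(start)
    for line in fasta_lines:
        if line.startswith(">") and not line.startswith(">gi|"):
            # e.g. ">sp|ACC|NAME desc" becomes ">gi|900000000|sp|ACC|NAME desc"
            yield ">gi|%d|%s" % (next(counter), line[1:])
        else:
            # Sequence lines and deflines that already carry a GI pass through.
            yield line


# Toy records for illustration only.
records = [
    ">sp|A00001|FAKE1_BACT example protein\n",
    "MKVLA\n",
    ">gi|5|sp|B00002|FAKE2_BACT already has a GI\n",
    "MSTNP\n",
]
for out_line in add_fake_gi(records, start=100):
    print(out_line, end="")
```

Each fake GI is used once, so the identifiers stay unique within the combined file.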

Thanks again for your help,

Russell Smithies
Bioinformatics Applications Developer
T +64 3 489 9085
E  russell.smithies at agresearch.co.nz
Invermay  Research Centre
Puddle Alley,
New Zealand
T  +64 3 489 3809
F  +64 3 489 9174

Toitu te whenua, Toitu te tangata
Sustain the land, Sustain the people

