[Bioperl-l] get_sequence() gets some sequences but not others

Wollenberg, Kurt (NIH/NIAID) wollenbergk at mail.nih.gov
Wed Jun 20 14:11:04 EDT 2007


I am working on a script that takes a list of sequence IDs, extracts the
sequences from GenPept, and then runs a BLAST search for each retrieved
sequence. I am having a problem with the sequence retrieval: some sequences
are found and others are not, and it is not obvious to me why.

For example, using a text file containing the following two IDs as input:

SKG3_YEAST
NEM1_YEAST

My script

while( <IN> ) {
  my $seqid = $_;
  my $seq_obj = get_sequence( 'genpept', $seqid );

will create a sequence object for the first ID (print "Accession of
", $seqid, " is ", $seq_obj->accession, "\n"; gives me the correct accession
number), but for the second I am told

-------------------- WARNING ---------------------
MSG: id (NEM1_YEAST) does not exist

When I pull up these records using the Entrez cross-database search in my web
browser, I find GenPept records for both SKG3_YEAST and NEM1_YEAST (using
these IDs as the search terms). In both records the ID resides in the same
field ("DBSOURCE    swissprot: locus"), so I'm mystified as to why
get_sequence finds one but not the other. Any advice would be greatly
appreciated.
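
For reference, here is a minimal sketch of the retrieval loop described
above, assuming one ID per line in the input. It strips line endings and
surrounding whitespace from each ID before the lookup; the excerpt above
passes $_ unmodified, so each ID still carries its newline, though whether
that explains the failure is not established. The helper names clean_id and
fetch_all, and the $fetch code reference, are hypothetical and exist only for
illustration.

```perl
use strict;
use warnings;

# Turn one line of input into a bare ID: drop the line ending and any
# surrounding whitespace. Returns undef for blank lines.
sub clean_id {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;    # also removes "\n" and "\r\n"
    return length($line) ? $line : undef;
}

# Look up every ID read from $fh using the caller-supplied $fetch code ref.
# Returns (\%found, \@missing) so failed lookups are reported, not lost.
# $fetch is hypothetical glue; with Bio::Perl loaded it could be
#   sub { get_sequence( 'genpept', $_[0] ) }
sub fetch_all {
    my ( $fh, $fetch ) = @_;
    my ( %found, @missing );
    while ( my $line = <$fh> ) {
        my $id = clean_id($line);
        next unless defined $id;
        my $seq = eval { $fetch->($id) };    # a die in $fetch counts as a miss
        if   ( defined $seq ) { $found{$id} = $seq }
        else                  { push @missing, $id }
    }
    return ( \%found, \@missing );
}
```

Printing each ID from @missing between delimiters, for example
print "[$id]\n", makes any stray whitespace or hidden characters visible.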

Kurt Wollenberg, Ph.D.
Phylogenetics and Sequence Analysis Consultant
Biocomputing Research Consulting Section
Bioinformatics and Scientific IT Program (BSIP)
Contractor, Lockheed Martin
