[Bioperl-l] Problem retrieving CDS by Acession #

Sean Davis sdavis2 at mail.nih.gov
Thu Sep 7 15:11:51 EDT 2006


On Thursday 07 September 2006 13:16, Ryan Golhar wrote:
> > -----Original Message-----
> > From: Sean Davis [mailto:sdavis2 at mail.nih.gov]
> > Sent: Thursday, September 07, 2006 11:49 AM
> > To: golharam at umdnj.edu
> > Cc: bioperl-l at lists.open-bio.org; 'bioperl-l'
> > Subject: Re: [Bioperl-l] Problem retrieving CDS by Acession #
> >
> > On Thursday 07 September 2006 10:32, Ryan Golhar wrote:
> > > > On Thursday 07 September 2006 01:09, Ryan Golhar wrote:
> > > > > Hi,
> > > > >
> > > > > I'm using Bio::DB::GenBank::get_Seq_by_acc() passing in a valid
> > > > > accession #, XM_547879.2, for instance.
> > > > >
> > > > > I get the message in return:
> > > > >
> > > > > -------------------- WARNING ---------------------
> > > > > MSG: acc (gb|XM_547879.2) does not exist
> > > > > ---------------------------------------------------
> > > > >
> > > > > If I go to NCBI, and enter the accession, the GenBank entry
> > > > > comes up.  At first I suspected it was the version number, but
> > > > > removing the version number still causes the same error.
> > > > >
> > > > > Am I doing something wrong?
> > > >
> > > > From the docs for Bio::DB::GenBank:
> > > >
> > > >     $seq = $gb->get_Seq_by_acc('J00522'); # Accession Number
> > > >     $seq = $gb->get_Seq_by_version('J00522.1'); # Accession.version
> > > >     $seq = $gb->get_Seq_by_gi('405830'); # GI Number
> > > >
> > > > So, you might try using get_Seq_by_version(....).  I didn't test it,
> > > > but give that a shot.
> > >
> > > get_Seq_by_version() worked.
> > >
> > > That does not explain why get_Seq_by_acc does not work with the
> > > primary part of the accession #.
> >
> > As an example of why this shouldn't work, doing a search in entrez
> > (online version) will bring up the newest version of an accession if
> > the version is not included.  If one specifies the version, though,
> > one gets that version, even if it is not the newest.  So, asking
> > get_Seq_by_acc() with a version and ignoring the version would
> > potentially get you the wrong version for the accession.
> >
> > If you know that you want the most recent version, just strip the
> > version information and use get_Seq_by_acc().
> >
> > Sean
>
> Sorry, maybe I'm not being clear.  Suppose I only had the accession #,
> XM_547879.  If I call get_Seq_by_acc('XM_547879'), it gives the warning
> above.  That shouldn't be because I'm giving a valid accession number.
> I suspect something is wrong in the parsing of whatever NCBI is
> returning.

I'm not sure if it makes a difference, but an XM_..... accession is a RefSeq
accession, not a GenBank one.  Does using Bio::DB::RefSeq do the trick?
Perhaps someone else can verify one way or the other whether the RefSeq (ref)
division is treated differently from the GenBank (gb) division.  Note that the
identifier in the warning you got back, (gb|XM_547879.2), carries a 'gb'
prefix, not 'ref'.
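
If it helps, here is a rough sketch of that idea in Perl.  The helpers
db_class_for() and strip_version() are hypothetical names of my own, not part
of BioPerl, and the prefix test (two capital letters plus an underscore, as in
XM_ or NM_) is only my assumption about how to spot a RefSeq accession.  The
only BioPerl calls are new() and get_Seq_by_acc(), and they run only when an
accession is given on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): guess which Bio::DB class to
# query. Assumption: RefSeq accessions start with two capitals and "_".
sub db_class_for {
    my ($id) = @_;
    return $id =~ /^[A-Z]{2}_\d/ ? 'Bio::DB::RefSeq' : 'Bio::DB::GenBank';
}

# Hypothetical helper: drop a trailing ".N" version suffix, if present.
sub strip_version {
    my ($id) = @_;
    (my $acc = $id) =~ s/\.\d+$//;
    return $acc;
}

# Show what the helpers decide for the two accessions from this thread.
for my $id ('XM_547879.2', 'J00522') {
    printf "%-12s -> %s, accession %s\n",
        $id, db_class_for($id), strip_version($id);
}

# Optional network lookup: only attempted when an accession is passed as an
# argument, and only if the chosen BioPerl module can be loaded.
if (@ARGV) {
    my $id    = shift @ARGV;
    my $class = db_class_for($id);
    eval "require $class; 1" or die "Could not load $class: $@";
    my $seq   = $class->new->get_Seq_by_acc(strip_version($id));
    my $label = $seq ? $seq->display_id : "no record for $id";
    print "$label\n";
}
```

Stripping the version before get_Seq_by_acc() follows the earlier point in
this thread: without a version you should get the most recent record.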

Again, I didn't test this, so take it with a grain of salt.

Sean

