[Bioperl-l] GenBank Feature: variation

Chris Fields cjfields at uiuc.edu
Wed Jun 7 14:08:19 EDT 2006


Nicolaus,

Bio::DB::GenBank mainly uses NCBI's efetch; I implemented epost, but it's a
hack at best and only works in certain circumstances.  So you can get the
sequence data directly, but the links aren't included; they are only available
through NCBI's elink.  There is no way I know of to get this information via
bioperl, as there isn't an interface to NCBI's elink AFAIK (Brian?).  I'm
working on a rewrite that provides a general NCBI eutils interface for each tool
(efetch, epost, elink, etc.), but it isn't working yet and probably won't be
ready until the end of summer or the beginning of fall.

Just so you know how complex the situation is when using accessions: you
can't use a sequence accession directly when querying elink (or most other
eutils); it has to be the GI number.  I believe efetch is the only one that
accepts accessions.  So you would have to run esearch first using the
accessions as the query, grab the GI from the XML, run elink with the GI, grab
the SNP cluster IDs, efetch the SNP data, and parse that data into
Bio::ClusterIO.  Fun, huh?  You would think NCBI would try to make this a
little easier...
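To make the order of calls concrete, here is a minimal sketch of the request chain in Python (not BioPerl, and not the eutils rewrite mentioned above). It assumes NCBI's documented E-utilities base URL and parameter names (db, term, dbfrom, id, retmode); the current https base URL may differ from what was in use in 2006, and the helper names are hypothetical.

```python
# Hypothetical helpers sketching the esearch -> elink -> efetch chain.
# Assumption: NCBI E-utilities base URL and parameter names as documented by NCBI.
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esearch_url(accession):
    # Step 1: esearch accepts the accession as a query term and returns GI numbers.
    return EUTILS + "esearch.fcgi?" + urlencode({"db": "nucleotide", "term": accession})

def gis_from_esearch(xml_text):
    # The matching GI numbers sit under eSearchResult/IdList/Id.
    root = ET.fromstring(xml_text)
    return [e.text.strip() for e in root.findall("./IdList/Id")]

def elink_url(gi):
    # Step 2: elink needs the GI number, not the accession.
    return EUTILS + "elink.fcgi?" + urlencode({"dbfrom": "nucleotide", "db": "snp", "id": gi})

def efetch_url(snp_ids):
    # Step 3: efetch the SNP records; several IDs go in one comma-separated list
    # (urlencode escapes the commas as %2C).
    return EUTILS + "efetch.fcgi?" + urlencode(
        {"db": "snp", "id": ",".join(snp_ids), "retmode": "xml"})

if __name__ == "__main__":
    print(esearch_url("BC000007"))
```

Each URL can be fetched with urllib.request.urlopen; the XML returned by elink still has to be parsed to get the SNP cluster IDs before the efetch step.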

There used to be a way to parse dbSNP data using Bio::ClusterIO, but the XML
schema has changed, so the parser is likely broken (the tests pass, but the test
file uses the old schema).  I think Allen Day was in charge of it.

I used the eutils test interface to grab the SNP cluster accessions for
your sequence using elink (note that the format is XML, which one would
have to parse to extract the cluster IDs):

<eLinkResult>
<LinkSet>
	<DbFrom>nucleotide</DbFrom>
	<IdList>
		<Id>33875090</Id>
	</IdList>
	<LinkSetDb>
		<DbTo>snp</DbTo>
		<LinkName>nucleotide_snp</LinkName>
		<Link>
			<Id>4631</Id>
		</Link>
	</LinkSetDb>
	<LinkSetDb>
		<DbTo>snp</DbTo>
		<LinkName>nucleotide_snp_genegenotype</LinkName>
		<Link>
			<Id>28362589</Id>
		</Link>
		<Link>
			<Id>4635949</Id>
		</Link>
		<Link>
			<Id>28362591</Id>
		</Link>
		<Link>
			<Id>11545838</Id>
		</Link>
		<Link>
			<Id>4246814</Id>
		</Link>
		<Link>
			<Id>28670911</Id>
		</Link>
		<Link>
			<Id>4073746</Id>
		</Link>
		<Link>
			<Id>9313754</Id>
		</Link>
		<Link>
			<Id>11545840</Id>
		</Link>
		<Link>
			<Id>17077806</Id>
		</Link>
		<Link>
			<Id>28362590</Id>
		</Link>
		<Link>
			<Id>4076327</Id>
		</Link>
		<Link>
			<Id>9834</Id>
		</Link>
		<Link>
			<Id>4073745</Id>
		</Link>
		<Link>
			<Id>6879874</Id>
		</Link>
	</LinkSetDb>
</LinkSet>
</eLinkResult>
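Parsing the cluster IDs out of a response like the one above only needs a standard XML parser. A minimal sketch in Python, assuming the element names shown in that response (LinkSetDb, LinkName, Link, Id); the function name is hypothetical and the sample is a trimmed copy of the response above.

```python
# Hypothetical helper: extract SNP IDs from an eLinkResult document.
# Assumption: element names match the elink response shown above.
import xml.etree.ElementTree as ET

def snp_ids_from_elink(xml_text, link_name="nucleotide_snp"):
    """Return the Id values under every LinkSetDb whose LinkName equals link_name."""
    root = ET.fromstring(xml_text)
    ids = []
    for linkset_db in root.iter("LinkSetDb"):
        # Each LinkSetDb carries one LinkName; skip the link sets we don't want.
        if linkset_db.findtext("LinkName") == link_name:
            # Only Link/Id children count; the LinkSet-level IdList holds the query GI.
            ids.extend(e.text.strip() for e in linkset_db.findall("./Link/Id"))
    return ids

# Trimmed copy of the response above.
SAMPLE = """<eLinkResult><LinkSet>
<DbFrom>nucleotide</DbFrom>
<IdList><Id>33875090</Id></IdList>
<LinkSetDb><DbTo>snp</DbTo><LinkName>nucleotide_snp</LinkName>
<Link><Id>4631</Id></Link></LinkSetDb>
<LinkSetDb><DbTo>snp</DbTo><LinkName>nucleotide_snp_genegenotype</LinkName>
<Link><Id>28362589</Id></Link><Link><Id>4635949</Id></Link></LinkSetDb>
</LinkSet></eLinkResult>"""

if __name__ == "__main__":
    print(snp_ids_from_elink(SAMPLE))  # ['4631']
    print(snp_ids_from_elink(SAMPLE, "nucleotide_snp_genegenotype"))  # ['28362589', '4635949']
```

Filtering on LinkName matters here because the same response holds two different SNP link sets (nucleotide_snp and nucleotide_snp_genegenotype) pointing at the same target database.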


Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Nicolaus Hepler
> Sent: Wednesday, June 07, 2006 11:26 AM
> To: Brian Osborne; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] GenBank Feature: variation
> 
> Brian,
> 
> A sample accession is BC000007.  I figured out a way around it, though.
> Rather than automate the whole process, I just downloaded from Batch
> Entrez a flat .gb file of all my accessions.  It's not flexible, and
> will be inconvenient when we expand the dataset, but it will provide
> me with data to work with for now.
> 
> Nicolaus
> 
> > Nicolaus,
> >
> > The short answer is no, there's no option that will omit or add a
> > particular feature or annotation to the Sequence object returned by
> > Bio::DB::GenBank.
> > Can you give some example accessions?
> >
> > Brian O.
> >
> >
> > On 6/7/06 9:46 AM, "Nicolaus Hepler" <nlhepler at umd.edu> wrote:
> >
> >> Hello,
> >>
> >> I am having some difficulty here.  I have a list of accessions, which
> >> are the parameters for a get_Stream_by_acc() function on a
> >> Bio::DB::GenBank object.  None of the returned GenBank information
> >> for any of my accessions seems to contain variation data, no matter
> >> how I try to coax it out with unflattener and typemapper.  This data
> >> is, however, available via the web interface of NCBI Nucleotide, as
> >> an optional feature (SNP).  I was wondering if there was some option
> >> I'm missing in the initialization of the Bio::DB::GenBank object (no
> >> options currently) that will coax the database into giving me this
> >> data?  Or something else that I'm missing altogether.  The organism
> >> of interest is human, taxon:9606.
> >>
> >> Nicolaus Lance Hepler
> >> nlhepler at mail dot umd dot edu
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
