[Bioperl-l] Get variation included in genbank file
David.Messina at sbc.su.se
Wed Jun 9 13:20:12 EDT 2010
Hi again Jessica,
Forgive my slowness here, but is this what you want to do?
1) Start with an NM_ mRNA record
in your example, NM_001110556.1
2) Obtain the corresponding NG_ genomic locus record in GenBank format
which would correspond to the example file you attached (accession number NG_011506)
Is that right?
There are probably more clever ways to do this, but here's how I would approach it:
1) Extract the GeneID dbxref from the NM_ mRNA record using Bio::SeqIO.
2) Use that GeneID to query the Gene database and get the related NG_ record.
I don't know exactly what the field name is for the NG_ record, but you can list them all using this example:
and figure it out via trial and error.
3) Once you have the NG_ id, you can retrieve the GenBank record.
Here's the relevant example:
So, by now it should be obvious that I'm presenting a general strategy. You'll have to do some legwork to get exactly what you want.
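To make the three steps concrete, here is a minimal sketch. It is written in Python with only the standard library rather than BioPerl, and it works on NCBI's E-utilities URLs directly, so treat it as an illustration of the strategy and not as the BioPerl way to do it. The sample record fragment and its GeneID are made up, and the function names are hypothetical. The script only builds the request URLs; it does not contact NCBI.

```python
import re
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities web service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def gene_ids(genbank_text):
    """Step 1: collect unique GeneID values from /db_xref qualifiers."""
    ids = []
    for gene_id in re.findall(r'/db_xref="GeneID:(\d+)"', genbank_text):
        if gene_id not in ids:
            ids.append(gene_id)
    return ids


def elink_url(gene_id):
    """Step 2: URL asking elink for nuccore records linked to a Gene ID."""
    query = urlencode({"dbfrom": "gene", "db": "nuccore", "id": gene_id})
    return f"{EUTILS}/elink.fcgi?{query}"


def efetch_url(accession):
    """Step 3: URL fetching a nucleotide record as GenBank flat-file text."""
    query = urlencode({"db": "nuccore", "id": accession,
                       "rettype": "gb", "retmode": "text"})
    return f"{EUTILS}/efetch.fcgi?{query}"


# Made-up fragment of an NM_ record; the GeneID is a placeholder.
SAMPLE = '''     gene            1..100
                     /gene="EXAMPLE"
                     /db_xref="GeneID:12345"
     CDS             10..90
                     /db_xref="GeneID:12345"
'''

for gene_id in gene_ids(SAMPLE):
    print(elink_url(gene_id))
print(efetch_url("NG_011506"))
```

The elink response lists every nucleotide record linked to the gene, grouped by link name, so picking out the NG_ entry still takes the trial and error described in step 2.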
Good luck, and if you come up with a nice solution, please add it to the wiki!
> I need to automatically get a gbk file like this, with Variation (dbSNP) features included and correct mRNA/CDS regions. Can this be done automatically using EUtilities? I am not sure.
> On Mon, Jun 7, 2010 at 5:18 PM, Dave Messina <David.Messina at sbc.su.se> wrote:
> Hi Jessica,
> > Does anyone know how to include variation (dbSNP) in the GenBank file format
> > automatically, using an NM_ accession number, with BioPerl?
> I'm not sure I understand the question.
> As far as I know, GenBank records don't include SNP information. See for example the record for human p53 (which has SNPs):
> I think, though, that you should be able to get to a dbSNP record from an NM_ accession number using the BioPerl interface to NCBI's EUtilities.
> More information here:
> If that's not what you're after, could you clarify what you want to do?
> Jessica Jingping Sun