[Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs?

Chris Fields cjfields at illinois.edu
Mon Nov 9 23:58:32 EST 2009

On Nov 9, 2009, at 3:15 PM, Robert Bradbury wrote:

> On Mon, Nov 9, 2009 at 1:08 PM, Guangchun Song <gc11song at gmail.com> wrote:
>> I'm a new BioPerl user. I'm working on a project to determine the
>> status of all putative SNPs (non-synonymous vs. synonymous) and to
>> predict the translational effect of non-synonymous mutations as benign
>> or damaging. I'm trying to use BioPerl to get the DNA sequence and
>> translate it to protein sequence for the SNPs that are in a gene's
>> coding region. Could someone tell me how to do it?
> I too would like to know if this information is available. I've
> recently been working with the dbSNP results from NCBI, but they
> display the results in a graphical format rather than as data that one
> can play with and ask questions of, like "What is the most
> disease-causing gene in the human genome?" or "What are the critical
> proteins damaged by gene defects in the human genome?" ... "In terms
> of premature deaths, extended health care requirements, loss of
> quality of life, etc.?"
> The same types of questions can be applied to the dog and cat genomes,
> where there is emotional value, or to the cow, horse, pig, etc.
> genomes, where there is economic value.
> The value of BioPerl would increase significantly if there were
> functionality that allowed easy access to "these mutations may have
> negative/positive impact" (which means you need a function that
> qualifies mutations by degree) and allowed impact to be subjectively
> determined (implying there must be some callback function to provide a
> user quality/impact rating).
> For example:
>   $/@differences = protein_compare($mygene, $refseq_gene,
>                        @critical_aa, @critical_domain, $callback)
> where $callback could "rate" differences based on the protein, the
> position, and the "type of interest" (e.g. metal-binding amino acids,
> structure-changing amino acids, critical catalytic amino acids, etc.).
> A default callback would be based on some evolving definition of
> "critical" changes, for example those that result in human disease.
> This is a "required" capability for determining things like the
> "adaptability" of a species: those with the fewest critical mutation
> points may adapt better to circumstances that increase mutation rates.
> Please pardon any errors in Perl syntax/usage; it's been a while since
> I've written Perl, and I'd really rather be coding in C.
> Robert
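Note that protein_compare() in the quoted message is a proposed interface, not an existing BioPerl function. The idea of a comparison whose impact ratings come from a user-supplied callback can be sketched in plain Perl, assuming the two protein sequences are already aligned and of equal length (no indels). The function name, the hash keys, and the example rating rules (a list of critical positions, changes to or from cysteine) are hypothetical illustrations; a real callback would draw on annotations such as metal-binding or catalytic residues.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical protein_compare(): walks two aligned, equal-length protein
# strings and asks a caller-supplied callback to rate each difference.
sub protein_compare {
    my ($query, $reference, $callback) = @_;
    die "sequences must be the same length\n"
        unless length($query) == length($reference);
    my @differences;
    for my $i (0 .. length($reference) - 1) {
        my $ref_aa = substr($reference, $i, 1);
        my $alt_aa = substr($query, $i, 1);
        next if $ref_aa eq $alt_aa;
        my $pos = $i + 1;    # report 1-based residue positions
        push @differences, {
            position => $pos,
            ref      => $ref_aa,
            alt      => $alt_aa,
            score    => $callback->($pos, $ref_aa, $alt_aa),
        };
    }
    return @differences;
}

# Example callback (hypothetical rules): changes at user-listed critical
# positions, or to/from cysteine, are rated 'high'; everything else 'low'.
my %critical = map { $_ => 1 } (7);
my $rate = sub {
    my ($pos, $ref, $alt) = @_;
    return 'high' if $critical{$pos} || $ref eq 'C' || $alt eq 'C';
    return 'low';
};

# prints T3C: high, S6A: low, G7H: high (one per line)
for my $d (protein_compare('MKCLVAHE', 'MKTLVSGE', $rate)) {
    printf "%s%d%s: %s\n", $d->{ref}, $d->{position}, $d->{alt}, $d->{score};
}
```

Passing the rating logic as a code reference (a closure over %critical) also sidesteps a problem with the quoted signature: Perl flattens @critical_aa and @critical_domain into one argument list, so the callee could not tell where one array ends and the next begins.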

I will say that most of the information from the SNP database is
available in various formats (see the following link under 'Retrieval


You can access this information, as well as the full XML, using  
something like the following script.



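#!/usr/bin/perl

use 5.010;    # enables say()
use strict;
use warnings;
use Bio::DB::EUtilities;

# dbSNP search term (e.g. a gene symbol or an rs ID) from the command line
my $term = shift or die "Usage: $0 <dbSNP search term>\n";

# Step 1: esearch dbSNP and keep the hits on NCBI's history server
my $eutil = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                     -db         => 'snp',
                                     -term       => $term,
                                     -usehistory => 'y',
                                     -retmax     => 100);

my $hist = $eutil->next_History || die "No history returned\n";

# Step 2: efetch the stored records as flat-file ('flt') text;
# for SNP XML, change retmode to 'xml'
$eutil->set_parameters(-eutil   => 'efetch',
                       -history => $hist,
                       -retmode => 'text',
                       -rettype => 'flt');

# dumps the raw response to STDOUT
say $eutil->get_Response->content;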
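Once the relevant coding sequence is in hand, classifying a SNP as synonymous or non-synonymous comes down to translating the reference and variant codons and comparing them. What follows is a minimal, dependency-free sketch, not BioPerl's own method. It assumes the CDS string starts at the first base of the start codon, that it is on the same strand as the reported allele (minus-strand SNPs would need reverse-complementing first), and that the standard genetic code (NCBI table 1) applies. The helper names translate_codon() and classify_snp() and the toy sequence are hypothetical. Within BioPerl, Bio::Tools::CodonTable provides an equivalent codon lookup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code (NCBI table 1). Codons are indexed with the bases
# ordered T, C, A, G at each of the three positions; '*' marks a stop.
my $BASES = 'TCAG';
my $AAS   = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

# Translate one DNA codon to a one-letter amino acid ('X' if ambiguous).
sub translate_codon {
    my ($codon) = @_;
    return 'X' unless length($codon) == 3;
    my $idx = 0;
    for my $base (split //, uc $codon) {
        my $pos = index($BASES, $base);
        return 'X' if $pos < 0;    # N or any other non-ACGT base
        $idx = $idx * 4 + $pos;
    }
    return substr($AAS, $idx, 1);
}

# Classify a single-base substitution in a CDS.
# $cds: coding sequence starting at the start codon (assumed input)
# $pos: 1-based SNP position within $cds; $alt: the variant base
sub classify_snp {
    my ($cds, $pos, $alt) = @_;
    my $offset = int(($pos - 1) / 3) * 3;    # 0-based start of the codon
    die "SNP position $pos is outside a complete codon\n"
        if $pos < 1 || $offset + 3 > length($cds);
    my $ref_codon = uc substr($cds, $offset, 3);
    my $alt_codon = $ref_codon;
    substr($alt_codon, ($pos - 1) % 3, 1) = uc $alt;
    my ($ref_aa, $alt_aa) = map { translate_codon($_) } $ref_codon, $alt_codon;
    # 'nonsense' is the non-synonymous subclass that creates a stop codon
    my $type = $ref_aa eq $alt_aa ? 'synonymous'
             : $alt_aa eq '*'     ? 'nonsense'
             :                      'non-synonymous';
    return ($type, $ref_aa, $alt_aa, $offset / 3 + 1);
}

# Usage: a toy CDS encoding M-A-W-stop and three hypothetical SNPs
my $cds = 'ATGGCTTGGTAA';
for my $snp ([6, 'C'], [4, 'A'], [8, 'A']) {
    my ($type, $ref_aa, $alt_aa, $codon) = classify_snp($cds, @$snp);
    printf "%d%s: codon %d %s>%s (%s)\n",
        $snp->[0], $snp->[1], $codon, $ref_aa, $alt_aa, $type;
}
```

Only the affected codon is translated, so the cost per SNP is constant regardless of gene length; predicting whether a non-synonymous change is benign or damaging would still need a separate scoring step on top of this.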
