[Bioperl-l] [How to add features in genbank flat file]
jason.stajich at duke.edu
Thu Mar 24 20:51:28 EST 2005
You seem annoyed that no one solved the problem for you - I hope that
you realize that if you want a specific feature you can also modify the
module yourself and provide a patch to the project.
As for the specifics of your problem, if you can point out which
Entrez key-value pairs need to be set in order to get the SNP data, we
can add that to Bio::DB::Query::GenBank as an option.
Removing the blank lines is part of the SeqIO parsing but I suppose a
state variable could be added in genbank.pm to not skip them when in
the 'COMMENT' state if this is a critical feature for you.
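To illustrate the state-variable idea, here is a minimal standalone sketch. It is not the actual genbank.pm code: the function name filter_record_lines and the sample record are made up for illustration, and the rule that a capitalized keyword in column 1 starts a new section is a simplifying assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::SeqIO::genbank: returns the lines of
# a flat-file record, dropping blank lines except inside the COMMENT block.
sub filter_record_lines {
    my (@lines) = @_;
    my $in_comment = 0;    # state variable: true while inside COMMENT
    my @kept;
    for my $line (@lines) {
        # Assumption: an uppercase keyword in column 1 opens a new section.
        if ( $line =~ /^([A-Z]+)/ ) {
            $in_comment = ( $1 eq 'COMMENT' ) ? 1 : 0;
        }
        # Skip blank lines everywhere except inside COMMENT.
        next if $line =~ /^\s*$/ && !$in_comment;
        push @kept, $line;
    }
    return @kept;
}

# Made-up sample record with blank lines in several sections.
my @record = (
    "LOCUS       NM_178432",
    "",
    "COMMENT     First paragraph.",
    "",
    "            Second paragraph.",
    "FEATURES             Location/Qualifiers",
    "",
    "ORIGIN",
);
print "$_\n" for filter_record_lines(@record);
```

Only the blank line between the two COMMENT paragraphs survives; the ones after LOCUS and FEATURES are still dropped. A real patch would hook the same flag into the existing parsing loop rather than filtering lines up front.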
If you are just downloading genbank files it looks like you have a good
solution so I'm glad you were able to figure it out.
> No one seems to have a solution to this problem I posted a month ago.
> So, I changed my mind and use 'wget' to get the GenBank sequences.
> I get the full GenBank entry, with most of features.
> And I can avoid another bug: with the BioPerl script I used, COMMENT
> lines are not formatted as they are on NCBI, and blank lines are
> removed.
> #!/usr/bin/perl -w
> use strict;
> use diagnostics;
> use File::Cat;
> my $acc=$ARGV[0] or die "\n\tThe accession number you seek is
> missing.\n\tTry something like: $0 NM_178432\n\n";
> `wget -O output_file.tmp
> _HPRD=32" 2>/dev/null`;
> cat ("output_file.tmp", \*STDOUT);
> # wget -O output_file
> Sorry, I don't use BioPerl to query GenBank (only for other
> applications), because BioPerl 1.5 has not fixed the COMMENT bug or
> the missing features.
>> I saw that the GenBank web site has changed:
>> features like 'SNPs' are no longer included in the EST flat files.
>> On the NCBI web site, we must click on 'features: SNP' to add them to
>> our flat file.
>> With BioPerl 1.4 or 1.5 it is the same: the variation features are
>> no longer included in the EST flat files that I download.
>> Here is the script I use:
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::DB::GenBank;
>> use Bio::DB::Query::GenBank;
>> use Bio::SeqIO;
>> my $acc=$ARGV[0] or die "\n\tThe accession number you seek is
>> missing.\n\tTry something like: $0 NM_178432\n\n";
>> my $query_string = "$acc";
>> my $query = Bio::DB::Query::GenBank->new(-db=>'nucleotide',
>>                                          -query=>$query_string);
>> my $gb = new Bio::DB::GenBank;
>> my $stream = $gb->get_Stream_by_query($query);
>> my $out=Bio::SeqIO->new(-format=>'genbank');
>> my $seq = $stream->next_seq();
>> # write_seq() prints the record to STDOUT itself and returns a true
>> # value, not the record text, so there is nothing left to print.
>> $out->write_seq($seq);
>> How can I get most of the features into my nucleotide flat files?
> Sébastien Moretti
> CNRS - IGS
> 31 chemin Joseph Aiguier
> 13402 Marseille cedex