[Bioperl-l] Recommended way to download qual files from Genbank?

Cui, Wenwu (NIH/NLM/NCBI) [C] cuiw at ncbi.nlm.nih.gov
Tue Jan 15 11:50:53 EST 2008


An alternative, if you can download and compile the NCBI C++ Toolkit (ftp://ftp.ncbi.nlm.nih.gov/toolbox/ncbi_tools++/2007/Aug_27_2007/), is to call the id1_fetch binary directly:
 
id1_fetch -fmt quality -gi 13508865
 
Wenwu Cui

________________________________

From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Fri 1/11/2008 2:40 PM
To: Phillip San Miguel
Cc: Chris Fields; bioperl-l
Subject: Re: [Bioperl-l] Recommended way to download qual files from Genbank?



Phillip:

Of course - mea culpa - here's the full monty....

Indeed NCBI's eutils can do this:

    ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual

which uses my script (attached) to wrap NCBI's eutils.
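
For readers without the attached script, here is a minimal Python sketch of the two E-utilities requests that such a search-then-fetch command presumably corresponds to. It only builds the URLs and does not contact NCBI. The base URL, the use of the history server (usehistory, WebEnv, query_key) and the function names are assumptions for illustration, not taken from the script. rettype=qual is the value used in the command above, and whether the current service still honors it has not been verified here.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities base URL; it is not stated in this thread.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term):
    """Build an ESearch URL that stores the hits on NCBI's history server."""
    params = {"db": db, "term": term, "usehistory": "y"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(db, web_env, query_key, rettype):
    """Build an EFetch URL that retrieves the stored hits in the given rettype."""
    params = {
        "db": db,
        "WebEnv": web_env,       # taken from the ESearch response
        "query_key": query_key,  # taken from the ESearch response
        "rettype": rettype,      # "qual" in the command above
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"
```

Calling esearch_url("nucleotide", "AC207960") gives the search request. The WebEnv and query_key values parsed from its XML response would then be passed to efetch_url(..., rettype="qual").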

It depends upon NCBI_PowerScripting.pm, distributed in PowerFiles_0707.zip
by NCBI in their "Jul 24-27, 2007" course found at
http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/scripts.html

I made a single edit to NCBI_PowerScripting.pm to 'select STDERR' at the
very beginning so that trace messages are not printed on STDOUT, such as
this echoed header:
         Retrieving 1 records from nucleotide...
... and footer:
        Received records 1 - 1.
        Wrote data to -.

(otherwise they are interspersed with downloaded qual files)
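
The same separation of trace messages from data can be sketched in Python. This illustrates the general idea only and is not the edit made to NCBI_PowerScripting.pm. The function names and the placeholder record are made up for the example.

```python
import io
import sys


def write_records(records, out, log):
    """Write payload records to `out` and progress messages to `log`."""
    log.write(f"Retrieving {len(records)} records...\n")
    for record in records:
        out.write(record)
    log.write(f"Received records 1 - {len(records)}.\n")


def demo():
    """Capture both streams in memory to show that they stay separate."""
    out, log = io.StringIO(), io.StringIO()
    # Placeholder record in qual-like layout; these are not real quality values.
    write_records([">placeholder_record\n40 40 40\n"], out, log)
    return out.getvalue(), log.getvalue()


def main():
    # In a real script the data goes to stdout and the trace to stderr,
    # so redirecting stdout to a file captures only the records.
    write_records([">placeholder_record\n40 40 40\n"], sys.stdout, sys.stderr)
```

With this layout, redirecting standard output to a file leaves the progress messages on the terminal rather than in the file.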

It also depends on a recent version of Getopt::Long.

Hope it helps.

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
 

> -----Original Message-----
> From: Phillip San Miguel [mailto:pmiguel at purdue.edu]
> Sent: Friday, January 11, 2008 1:33 PM
> To: Cook, Malcolm
> Cc: Chris Fields; bioperl-l
> Subject: Re: [Bioperl-l] Recommended way to download qual
> files from Genbank?
>
> Hi Malcolm,
>     Looks like your email was (inadvertently?) redacted in
> some way. (No attachment and last sentence truncated.) Would
> it be possible to get a complete version so I can be sure I'm
> following you?
> Thanks,
> Phillip
>
> Cook, Malcolm wrote:
> > Indeed eutil is capable of this
> >
> > The following use of my ncbi_eutil (attached) script yields what you
> > want:
> >
> > ncbi_eutil -search db=nucleotide term=AC207960 -fetch rettype=qual > AC207960.qual
> >
> > It depends on the version of NCBI_PowerScripting.pm , such as is
> > included in
> >
> > Malcolm Cook
> > Database Applications Manager - Bioinformatics
> > Stowers Institute for Medical Research - Kansas City, Missouri
> >  
> >
> >  
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org
> >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris
> >> Fields
> >> Sent: Friday, January 11, 2008 11:10 AM
> >> To: Phillip San Miguel
> >> Cc: bioperl-l
> >> Subject: Re: [Bioperl-l] Recommended way to download qual files from
> >> Genbank?
> >>
> >> I don't think this is possible with the current setup for
> >> Bio::DB::GenBank (which the script uses).  We'll have to investigate
> >> whether it is possible to retrieve this data via NCBI's eutils; if so
> >> we can try adding it in.  If you want you can submit this as an
> >> enhancement request via bugzilla for tracking:
> >>
> >> http://bugzilla.open-bio.org/
> >>
> >> chris
> >>
> >> On Jan 11, 2008, at 10:22 AM, Phillip San Miguel wrote:
> >>
> >>    
> >>> No problem getting sequence from genbank via a myriad of methods.
> >>> But as the volume of non-finished sequence in genbank increases the
> >>> importance of also obtaining quality values for a given sequence
> >>> increases. Some records include quality values.
> >>>
> >>> I typically use bp_fetch.pl to grab a sequence from genbank:
> >>>
> >>> bp_fetch.pl -fmt fasta net::genbank:AC207960
> >>>
> >>> sends the fasta sequence to STDOUT. But that bp_fetch.pl wasn't
> >>> designed to pull down quals evidently:
> >>>
> >>> bp_fetch.pl -fmt qual net::genbank:AC207960
> >>>
> >>> gives:
> >>>
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: You must pass a Bio::Seq::Quality or a Bio::Seq::PrimaryQual
> >>> object to write_seq() as a parameter named "source"
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/Root/Root.pm:359
> >>> STACK: Bio::SeqIO::qual::write_seq /usr/local/perl_5.8/lib/site_perl/5.8.8/Bio/SeqIO/qual.pm:205
> >>> STACK: /usr/local/perl/bin/bp_fetch.pl:313
> >>> -----------------------------------------------------------
> >>>
> >>> (running under bioperl 1.5.2)
> >>>
> >>> The quality values for this accession are in genbank as these URLs
> >>> demonstrate:
> >>>
> >>>
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=154937460
> >>>
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=fasta
> >>>
> >>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&list_uids=154937460&dopt=qual
> >>>
> >>> What is the best way to pull down these qual values? They aren't
> >>> present in "GenBank(Full)" format. They are present in an ASN.1
> >>> format.
> >>>
> >>> Advice would be appreciated.
> >>>
> >>> --
> >>> Phillip
> >>> Purdue Genomics Core Facility
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>      
> >> Christopher Fields
> >> Postdoctoral Researcher
> >> Lab of Dr. Robert Switzer
> >> Dept of Biochemistry
> >> University of Illinois Urbana-Champaign