[Bioperl-l] Downloading multiple contigs using bioperl
cjfields at uiuc.edu
Mon Sep 18 12:13:37 EDT 2006
> I think this might be a simple question - but I'm yet a novice...
> Is there any way I can download, automatically and at once, all contigs of
> a given genome in GenBank, and ideally merge them all into one file? Or do I
> have to download every contig separately in order to receive the full
> In the latter case, is there some sort of list that provides the
> of all contigs of the genome I'm interested in?
> Thank you very much,
It depends on the type of sequence record. WGS records carry a WGS line
that gives the range of sequence records that can be retrieved:
LOCUS       AAFC03000000          131728 rc    DNA     linear   MAM
DEFINITION  Bos taurus whole genome shotgun sequencing project.
VERSION     AAFC00000000.3  GI:112180191
                     /isolate="L1 Dominette 01449"
The WGS line gives the range of individual sequences, and the scaffolds
represent different scaffold or supercontig builds. The contig files list
the subsequences that make up each build (which can be pretty complex), but
you don't need these if you just want the sequence itself. That can be
retrieved directly from GenBank using Bio::DB::GenBank with the default
settings; if you use the web Entrez interface you can get the full sequences
by selecting the format 'GenBank(full)'.
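
As a rough sketch of the Bio::DB::GenBank route, something along these lines
should expand a WGS-style accession range and write the records to one file.
The sub names (expand_wgs_range, fetch_to_file) are made up for illustration,
the range in the usage comment is a small hypothetical one and not the actual
WGS line of the record above, and the script assumes both ends of the range
share the same prefix and digit count. It has not been run against a full WGS
project.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Expand a WGS-style range such as "AAFC03000001-AAFC03000003" into the
# individual accessions. Assumption: an accession is 4-6 letters, a
# two-digit version, then a zero-padded serial number.
sub expand_wgs_range {
    my ($range) = @_;
    my ($first, $last) = split /-/, $range, 2;
    $last = $first unless defined $last;

    my ($prefix, $start) = $first =~ /^([A-Z]{4,6}\d{2})(\d+)$/
        or die "Unrecognised WGS accession: $first\n";
    my ($prefix2, $end) = $last =~ /^([A-Z]{4,6}\d{2})(\d+)$/
        or die "Unrecognised WGS accession: $last\n";
    die "Range ends have different prefixes\n" unless $prefix eq $prefix2;

    my $width = length $start;
    # 0+ forces a numeric range; strings with leading zeros would
    # otherwise trigger Perl's magical string increment.
    return map { sprintf '%s%0*d', $prefix, $width, $_ }
           0 + $start .. 0 + $end;
}

# Fetch the accessions and write them all to one GenBank-format file.
# Needs BioPerl and network access; the modules are loaded lazily so that
# the range expansion above can be used without them.
sub fetch_to_file {
    my ($outfile, @accs) = @_;
    require Bio::DB::GenBank;
    require Bio::SeqIO;

    my $gb     = Bio::DB::GenBank->new;
    my $out    = Bio::SeqIO->new(-file => ">$outfile", -format => 'genbank');
    my $stream = $gb->get_Stream_by_acc(\@accs);
    while (my $seq = $stream->next_seq) {
        $out->write_seq($seq);
    }
    return;
}

# Usage: perl fetch_wgs.pl AAFC03000001-AAFC03000003 contigs.gb
if (@ARGV == 2) {
    my ($range, $outfile) = @ARGV;
    fetch_to_file($outfile, expand_wgs_range($range));
}
```

For a range covering a whole project you would want to pass the accessions
in smaller batches rather than in one request.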
Depending on what you are after, you may be better off downloading the
sequences via FTP, though. Some of these files are very large (~100 MB or
more). Retrieval via Bio::DB::GenBank converts everything into BioPerl
objects before saving, so these files may take a long time, if they work at
all.
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign