[Bioperl-l] Best Practices for Downloading/Mirroring Genbank
Joseph.Karalius at seminis.com
Mon Jun 14 13:31:21 EDT 2004
I'm working on setting up a local mirror of Genbank here at work and am
unsure of the best way to go about it.
I started off simply with a wget -m ftp://genbank.sdsc.edu/pub (yes, I
wanted the BLAST-formatted databases and executables as well), and the
transfer is going just fine, albeit excruciatingly slowly at times.
But what happens:
1) between now and the next build?;
2) if I choose to mirror from an alternate source?;
3) after the next build?
For the first part, I planned on doing daily wgets for the updates, but it
occurred to me that if I miss the last couple of days' worth of updates
before the new build, those updates get folded into the main build files.
Would I then have to download the whole thing again?
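The decision behind this first question can be written down as a small planning function. What follows is only a sketch in Python, under assumptions that nothing above confirms: that the server exposes some release identifier that can be read, and that daily update files keep stable names. `plan_sync`, its parameters, the release numbers and the file names are all hypothetical.

```python
from typing import Dict, List, Set


def plan_sync(local_release: int, remote_release: int,
              local_updates: Set[str], remote_updates: Set[str]) -> Dict[str, object]:
    """Decide what to fetch, given release numbers and daily-update file names.

    Assumption (not confirmed by any mirror's documentation): once a new
    release appears, earlier daily updates are folded into the release files,
    so the release files must be re-checked and old dailies can be dropped.
    """
    if remote_release != local_release:
        # New build: re-check every release file, start daily updates afresh.
        return {
            "recheck_release_files": True,
            "fetch_updates": sorted(remote_updates),
            "drop_updates": sorted(local_updates - remote_updates),
        }
    # Same build: only fetch the daily updates that are not on disk yet.
    return {
        "recheck_release_files": False,
        "fetch_updates": sorted(remote_updates - local_updates),
        "drop_updates": [],
    }


# Example with made-up release numbers and file names.
plan = plan_sync(141, 141, {"nc0612.flat.gz"}, {"nc0612.flat.gz", "nc0613.flat.gz"})
print(plan["fetch_updates"])  # ['nc0613.flat.gz']
```

If the folding assumption holds, missing a few dailies before a build costs a re-check of the release files, not necessarily a full re-download, provided unchanged files can be recognised by something other than their timestamp.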
For the second, if I choose to mirror from Biomirror or NCBI instead of San
Diego, the timestamps seem to differ for what I am assuming to be
the same build. For example,
File            Size (bytes)   Timestamp         Source
gbest1.seq.gz   19,454,020     5/22/04 5:04am    SDSC Mirror
gbest1.seq.gz   19,454,020     4/25/04 2:01am    NCBI Mirror
gbest1.seq.gz   19,454,020     4/25/04 2:01am    BioMirror
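Since the sizes agree while the timestamps do not, one option is to compare listings by name and byte size only. The Python sketch below assumes the remote and local listings have already been parsed into dictionaries; `files_to_fetch` and the second file name are hypothetical, and equal size is only a hint that the content is the same, not proof.

```python
from typing import Dict, List


def files_to_fetch(remote_sizes: Dict[str, int], local_sizes: Dict[str, int]) -> List[str]:
    """Return names that are missing locally or whose byte size differs.

    Timestamps are deliberately ignored, since mirrors of the same build
    can carry different modification times.
    """
    return sorted(
        name for name, size in remote_sizes.items()
        if local_sizes.get(name) != size
    )


# The gbest1.seq.gz size is from the listing above; gbest2.seq.gz is made up.
remote = {"gbest1.seq.gz": 19454020, "gbest2.seq.gz": 20000000}
local = {"gbest1.seq.gz": 19454020}
print(files_to_fetch(remote, local))  # ['gbest2.seq.gz']
```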
For the third part, do the build files really change or are new entries and
revisions just added on as extra build files? I read that the files are
non-cumulative, so that would seem to confirm it, but the timestamps are
updated in sync with the latest build date.
How do I keep an updated mirror without losing daily builds or having to
download the whole thing every couple of months? How do I verify that I
have the latest data, given that checking timestamps does not seem like it
will work? Should I even bother with creating a true mirror?
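If timestamps cannot be trusted, content checksums are one alternative for verification. The Python sketch below assumes a manifest mapping file names to MD5 digests is available from somewhere; whether any of the mirrors publishes such a manifest is not established here, and `md5_of` and `mismatched` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives never sit in memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched(manifest: Dict[str, str], root: Path) -> List[str]:
    """Return manifest entries that are missing under root or hash differently."""
    bad = []
    for name, expected in sorted(manifest.items()):
        target = root / name
        if not target.is_file() or md5_of(target) != expected.lower():
            bad.append(name)
    return bad
```

Only the files returned by `mismatched` would need to be transferred again, which avoids re-downloading a whole build when most files are unchanged.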
I ran across this recent thesis on some of the issues in maintaining these
types of databases accurately while minimizing file transfers.
I know that Biomirror has some scripts to facilitate efficient transfers, but
do they handle updates? I'm guessing this problem has already been
addressed; I just can't find the solution.
Thanks in advance for any input,
Molecular Markers and Applied Genomics
Seminis Vegetable Seeds, Inc
37437 State Highway 16
Woodland, CA 95695-9353