[Bioperl-l] UCSC database backend

Chris Fields cjfields at uiuc.edu
Wed Aug 9 15:21:57 EDT 2006


> Before we get too far down this line of thought, keep in mind that this
> will be dozens of Gb of sequence and database tables.  See here for
> details: http://genome.ucsc.edu/admin/mirror.html
> The sequences include all of genbank, essentially.  The mysql tables ALONE
> (no sequence) for only ONE human assembly is on the order of 10Gb--not the
> kind of thing you can download in a few minutes (or even hours).  Just to
> keep in mind....

Yes, there was a recent bug related to the packing order for very large
files (>4 GB, I believe).  I'm hoping Lincoln takes a look at it soon for
further suggestions as the proposed changes would require reindexing
everything.  However, the proposed fix did work well for the submitter.

> On another point, the strength of UCSC is not in obtaining sequence, but
> in mapping to the genome.  I think getting actual sequence should be
> secondary here, if for no other reason than there are trivially easy ways
> of getting sequence information from elsewhere given an accession or ID.
> There is simply too much information to be stored locally for most people
> and getting the data remotely from UCSC doesn't seem possible currently.
> Sean

Then we could use this primarily to return location and other information
instead.  Anyone interested in sequence can use the location info to
retrieve sequences remotely (via Bio::DB::GenBank or similar) or locally.
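
As a rough illustration of the "location first, sequence second" idea, here
is a minimal sketch in Python (used only to keep the example self-contained;
it is not BioPerl code).  It assumes the backend hands back a location in the
UCSC table convention of a 0-based start and an exclusive end, and that the
chromosome sequence is already available as a string.  The Location class and
both function names are hypothetical, made up for this example.

```python
from dataclasses import dataclass

# Complement table used for minus-strand features.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


@dataclass(frozen=True)
class Location:
    """Hypothetical record for a mapped feature, UCSC-table style."""
    chrom: str
    chrom_start: int  # 0-based, inclusive
    chrom_end: int    # 0-based, exclusive
    strand: str       # "+" or "-"


def to_one_based(loc):
    """Return (start, end) as 1-based inclusive coordinates,
    the form most sequence-retrieval tools expect."""
    return loc.chrom_start + 1, loc.chrom_end


def extract(loc, chrom_seq):
    """Slice the feature out of a locally held chromosome string,
    reverse-complementing it when it lies on the minus strand."""
    if not 0 <= loc.chrom_start <= loc.chrom_end <= len(chrom_seq):
        raise ValueError("location falls outside the sequence")
    sub = chrom_seq[loc.chrom_start:loc.chrom_end]
    if loc.strand == "-":
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub


if __name__ == "__main__":
    toy_chrom = "AACCGGTTAC"            # stand-in for a real chromosome
    loc = Location("chrToy", 1, 5, "-")
    print(to_one_based(loc))
    print(extract(loc, toy_chrom))
```

The 1-based pair from to_one_based() is what one would pass on to a remote
fetch; extract() covers the local case.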

The key is to get this set up in some basic form so that people can start
using it, make suggestions, etc.  Sendu, any suggestions?

