[Bioperl-l] best method for dealing with hg18 seq-features in a relational database?

Chris Fields cjfields at illinois.edu
Tue Jul 20 14:55:15 EDT 2010

On Jul 20, 2010, at 12:53 PM, Jonathan Epstein wrote:

> Hi,
> I need a reasonably rapid runtime method to extract the gene features associated with a particular genomic region in the hg18 human assembly.  This seems to require a relational database.  Of course I would like to do this using Bioperl.
> So then the question becomes: what's the best approach to use?  BioSQL/BioPerl-db looks like the cleanest solution, but it's not clear to me how up-to-date it is; it sort of looks like an abandoned project.

Not really abandoned, but it won't help in this case unless you intend to convert your data to BioSQL.

> Chado/GMOD is more actively maintained, but I don't know whether or how I can obtain BioPerl bindings.

Look at Bio::Chado::Schema.  It's more of a middleware layer; there are no direct BioPerl bindings yet.

> I am also open to using the UCSC databases as a starting point, since I've already mirrored most of the human-related portions of the UCSC environment.
> Then there's the small matter of loading the hg18 annotations into the appropriate relational database.
> Thanks in advance for your guidance,
> Jonathan

Try Bio::DB::SeqFeature::Store.  Export the data as GTF; I think the database will load it (via Bio::DB::SeqFeature::Store::GFF2Loader).  Alternatively, you can install the Ensembl database locally and use the Ensembl Perl API to extract the information you need.
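The idea behind the GTF route is to load annotation records into an indexed relational table and then ask for everything that overlaps a region. A minimal sketch of that idea follows, using Python's standard-library sqlite3 module. This is not Bio::DB::SeqFeature::Store or its schema. The table layout, the function names (`parse_attributes`, `load_gtf`, `features_in_region`) and the toy GTF records are all hypothetical and exist only for illustration. A feature overlaps the query [start, end] when feature.start <= end and feature.end >= start, using GTF's 1-based inclusive coordinates.

```python
import sqlite3

# Hypothetical toy GTF records (made-up coordinates and IDs, illustration only).
SAMPLE_GTF = """\
chr1\texample\tgene\t1000\t5000\t.\t+\t.\tgene_id "GENE_A"; gene_name "GENE_A";
chr1\texample\tgene\t7000\t9000\t.\t-\t.\tgene_id "GENE_B"; gene_name "GENE_B";
chr2\texample\tgene\t1500\t2500\t.\t+\t.\tgene_id "GENE_C"; gene_name "GENE_C";
"""


def parse_attributes(field):
    """Parse a GTF attribute column like 'gene_id "X"; gene_name "Y";' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def load_gtf(conn, lines):
    """Load GTF records into a hypothetical 'features' table indexed by location."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        " seqid TEXT, feature_type TEXT, start_pos INTEGER, end_pos INTEGER,"
        " strand TEXT, gene_id TEXT)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_loc ON features (seqid, start_pos, end_pos)"
    )
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.rstrip("\n").split("\t")
        attrs = parse_attributes(cols[8])
        rows.append(
            (cols[0], cols[2], int(cols[3]), int(cols[4]), cols[6], attrs.get("gene_id"))
        )
    conn.executemany("INSERT INTO features VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


def features_in_region(conn, seqid, start, end, feature_type="gene"):
    """Return (gene_id, start, end, strand) for features overlapping [start, end]."""
    cur = conn.execute(
        "SELECT gene_id, start_pos, end_pos, strand FROM features"
        " WHERE seqid = ? AND feature_type = ? AND start_pos <= ? AND end_pos >= ?"
        " ORDER BY start_pos",
        (seqid, feature_type, end, start),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_gtf(conn, SAMPLE_GTF.splitlines())
    print(features_in_region(conn, "chr1", 4000, 8000))
```

With real data you would stream the exported hg18 GTF file instead of `SAMPLE_GTF`. The composite index is the simplest choice. However, the `start_pos <= ?` condition still scans every feature on the chromosome that starts before the query end, so dedicated feature stores use smarter range indexing.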

There were modules intended as interfaces to the remote UCSC databases, but (IIRC) the UCSC folks were seriously concerned about their servers being overloaded with queries, so those modules were essentially deprecated (they were only partially developed anyway).

