[Bioperl-l] Re: Bio::EnsemblLite::UpdateableDB

Ewan Birney birney@ebi.ac.uk
Sun, 16 Jul 2000 11:40:36 +0000 (GMT)

On Fri, 14 Jul 2000, Jason Stajich wrote:

> Preliminary version of the module Bio::EnsemblLite::UpdateableDB is
> checked in to bioperl (bioperl-db repository).
> This just does basic stuff by talking to a mysql db server: add a Bio::Seq 
> to the db, remove a seq, and fetch a seq from the db and make it into a
> Bio::Seq object.  This had to be separate code with separate tables 
> (from ensembl) because I am not expecting sequences to be part of contigs
> at this time.
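
To picture the add/remove/fetch cycle described above, here is a minimal sketch. It is written in Python against an in-memory SQLite database rather than Perl and MySQL, and the `Seq` and `SeqStore` classes, the `seq` table, and its columns are all hypothetical. It is not the actual Bio::EnsemblLite::UpdateableDB code or schema.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Seq:
    """Heavily simplified, hypothetical stand-in for a Bio::Seq object."""
    display_id: str
    sequence: str
    description: str = ""


class SeqStore:
    """Hypothetical stand-alone sequence store: add, remove, fetch."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # Assumed table layout; the real tables live in the bioperl-db sql/ files.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS seq ("
            " display_id TEXT PRIMARY KEY,"
            " sequence TEXT NOT NULL,"
            " description TEXT NOT NULL DEFAULT '')"
        )

    def add_seq(self, seq: Seq) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO seq (display_id, sequence, description)"
                " VALUES (?, ?, ?)",
                (seq.display_id, seq.sequence, seq.description),
            )

    def remove_seq(self, display_id: str) -> bool:
        """Delete a sequence; return True if a row was actually removed."""
        with self.conn:
            cur = self.conn.execute(
                "DELETE FROM seq WHERE display_id = ?", (display_id,)
            )
        return cur.rowcount > 0

    def fetch_seq(self, display_id: str) -> Optional[Seq]:
        """Rebuild a Seq object from its row, or return None if absent."""
        row = self.conn.execute(
            "SELECT display_id, sequence, description FROM seq"
            " WHERE display_id = ?",
            (display_id,),
        ).fetchone()
        return Seq(*row) if row else None
```

Because sequences are keyed directly by their own identifier, nothing here depends on a contig table, which is the separation from the Ensembl schema that the quoted text describes.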

Hmmmm. I guess this was always going to go this way, but I feel that
EnsemblLite and Ensembl are forking too early in the process. I guess we
could claim that

	Ensembl - database for fragmentary genomes

	EnsemblLite - database for sequences (stand-alone)

With code reuse occurring principally because they are both based on bioperl
and design reuse because they have to handle roughly speaking the same
things. Are we missing some sort of alignment between Ensembl and

BTW - Jason - have you handled the "how to store a SeqFeature::Generic"
type problem in the SQL?
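
One possible shape for an answer, offered only as a sketch: keep the fixed fields of a generic feature (start, end, strand, primary tag) in one table and its free-form tag/value pairs in a second table keyed by feature id. The example is in Python with SQLite instead of Perl and MySQL, and the `GenericFeature` class, both table names, and all column names are assumptions made for illustration, not the bioperl-db schema.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class GenericFeature:
    """Hypothetical, simplified analogue of a generic sequence feature."""
    start: int
    end: int
    strand: int
    primary_tag: str
    tags: dict = field(default_factory=dict)  # tag name -> list of values


# Assumed schema: fixed columns in one table, open-ended tags in another.
SCHEMA = """
CREATE TABLE feature (
    feature_id  INTEGER PRIMARY KEY,
    seq_id      TEXT    NOT NULL,
    seq_start   INTEGER NOT NULL,
    seq_end     INTEGER NOT NULL,
    strand      INTEGER NOT NULL,
    primary_tag TEXT    NOT NULL
);
CREATE TABLE feature_tag (
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    tag        TEXT    NOT NULL,
    value      TEXT    NOT NULL
);
"""


def store_feature(conn, seq_id, feat):
    """Insert one feature plus all of its tag/value pairs; return its id."""
    with conn:
        cur = conn.execute(
            "INSERT INTO feature (seq_id, seq_start, seq_end, strand, primary_tag)"
            " VALUES (?, ?, ?, ?, ?)",
            (seq_id, feat.start, feat.end, feat.strand, feat.primary_tag),
        )
        feature_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO feature_tag (feature_id, tag, value) VALUES (?, ?, ?)",
            [(feature_id, tag, value)
             for tag, values in feat.tags.items()
             for value in values],
        )
    return feature_id


def fetch_features(conn, seq_id):
    """Rebuild every feature attached to seq_id, tags included."""
    features = []
    rows = conn.execute(
        "SELECT feature_id, seq_start, seq_end, strand, primary_tag"
        " FROM feature WHERE seq_id = ? ORDER BY feature_id",
        (seq_id,),
    ).fetchall()
    for feature_id, start, end, strand, primary_tag in rows:
        tags = {}
        for tag, value in conn.execute(
            "SELECT tag, value FROM feature_tag"
            " WHERE feature_id = ? ORDER BY rowid",
            (feature_id,),
        ):
            tags.setdefault(tag, []).append(value)
        features.append(GenericFeature(start, end, strand, primary_tag, tags))
    return features
```

The trade-off in this layout is that every tag value is stored as text, so anything typed (scores, numeric ranges) would need either its own column or conversion on the way out.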

> So there are a set of add-on tables to provide a seq description 
> and an association of generic features to seqs.
> sql/ensembl-lite-mysql-addon.sql - add ons
> sql/ensembl-lite.sql - ensembl-lite code (only the dna table is really
>                        used from this at present)
> I have been trying to put the EnsemblLite spec that I will
> propose on ensembl.org's wiki, but have been getting web errors.  (It
> just KNOWS everyone is getting ready for summer holidays).  Will try again
> over the weekend.  
> short TODO list (lest you think this is really finished code being
>                  submitted)
>  - implement _update function for updating seqs

What does this function do? 

>  - implement get_PrimarySeq_stream
>  - table schema discussion with interested parties, it is not really
>    CORRECT to refer to a table of sequences as 'dna' if some are protein
>    seqs...  Include a graphical table schema on doc.
>  - Start looking at the analysis pipeline runnable/runnabledb system from
>    Ensembl and how we can hook into it.

I suspect this means you will be able to reuse our "Runnables", but you
will need different "RunnableDBs".

Runnable reuse will be a big win. We might want to specialise some
runnables in something like FeatureProducingRunnable...
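
A rough sketch of why that split allows reuse, assuming a Runnable is pure analysis on in-memory data and a RunnableDB is the storage-specific wrapper that fetches input and writes output. The example is in Python rather than Perl; `GCRichWindows`, `DictBackedRunnableDB`, and every method name are hypothetical, not the Ensembl pipeline API.

```python
from abc import ABC, abstractmethod


class FeatureProducingRunnable(ABC):
    """Analysis step with no database knowledge: sequence in, features out."""

    @abstractmethod
    def run(self, sequence: str) -> list:
        """Return a list of (start, end, label) tuples, 1-based inclusive."""


class GCRichWindows(FeatureProducingRunnable):
    """Toy analysis: flag non-overlapping windows with high GC content."""

    def __init__(self, window: int = 4, threshold: float = 0.75) -> None:
        self.window = window
        self.threshold = threshold

    def run(self, sequence: str) -> list:
        features = []
        last_start = len(sequence) - self.window
        for i in range(0, last_start + 1, self.window):
            chunk = sequence[i:i + self.window]
            gc_fraction = (chunk.count("G") + chunk.count("C")) / self.window
            if gc_fraction >= self.threshold:
                features.append((i + 1, i + self.window, "gc_rich"))
        return features


class DictBackedRunnableDB:
    """Storage-specific wrapper: fetch input, run the Runnable, write output.

    Plain dicts stand in for a database here; a contig-based store and a
    stand-alone sequence store would each supply their own wrapper while
    sharing the same Runnable.
    """

    def __init__(self, runnable, sequences: dict, results: dict) -> None:
        self.runnable = runnable
        self.sequences = sequences
        self.results = results

    def process(self, seq_id: str) -> list:
        sequence = self.sequences[seq_id]        # fetch input
        features = self.runnable.run(sequence)   # storage-independent analysis
        self.results[seq_id] = features          # write output
        return features
```

Only the wrapper touches storage, so swapping one database layout for another means writing a new wrapper and leaving the analysis class alone.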

> -Jason
> Jason Stajich
> jason@chg.mc.duke.edu
> http://galton.mc.duke.edu/~jason/
> (919)684-1806 (office) 
> (919)684-2275 (fax) 
> Center for Human Genetics - Duke University Medical Center
> http://wwwchg.mc.duke.edu/