Bioperl: Accessing sequences via Bio::DB::SeqI

Ewan Birney
Fri, 28 Apr 2000 10:14:24 +0100 (BST)

On Thu, 27 Apr 2000, Mark Dalphin wrote:

> Hi,
> I'm looking at implementing a class to access a GCG SeqStore database (Oracle
> backend) using BioPerl.  I'm trying to integrate this as cleanly as possible,
> perhaps more for my practice and a sense of elegance than for anything else.  In
> other words, I think I could hack this easily, but my questions go more towards:
> "What were the original implementors trying to achieve?".

Great stuff, Mark. I am responsible (as always) for the semantic mire down

> I'm looking at the abstract classes: and
> and the non-abstract (real?) class

Yup. These are the right classes to look at.

> It looks like is merely a subset of, with the "iterate
> through the database" stripped out.  Is this for some reason other than "it got
> left in the development tree when we switched its name from RandomAccessI to
> SeqI" or is it because it does represent a different abstract class with less
> functionality?  If so, why would this be done? Isn't that the purpose of the
> subroutine stubs in the abstract class to begin with?

If you look at interfacing to a number of databases across the web, it is
impossible, or at least inadvisable, to implement database-iterator style
functionality. Hence the split into RandomAccessI and SeqI: SeqI is what you
would want to implement, but RandomAccessI is the subset suited to these web
databases.
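To make the RandomAccessI/SeqI split concrete, here is a minimal Perl sketch. The packages under Sketch:: are hypothetical stand-ins, not the real Bio::DB modules. get_all_ids and get_PrimarySeq_stream follow the method names quoted further down this thread, while get_Seq_by_id and the use of plain strings as "sequences" are assumptions for illustration. A local database inherits the full SeqI interface; a web database stops at RandomAccessI.

```perl
use strict;
use warnings;

# Hypothetical abstract interface: random access only (fetch one entry by id).
package Sketch::DB::RandomAccessI;
sub get_Seq_by_id { die ref(shift) . " must implement get_Seq_by_id\n" }

# Hypothetical SeqI: adds whole-database iteration on top of random access.
package Sketch::DB::SeqI;
our @ISA = ('Sketch::DB::RandomAccessI');
sub get_all_ids           { die ref(shift) . " must implement get_all_ids\n" }
sub get_PrimarySeq_stream { die ref(shift) . " must implement get_PrimarySeq_stream\n" }

# A simple stream: next_seq returns undef once the list is exhausted.
package Sketch::Stream;
sub new      { my ($class, @seqs) = @_; return bless { seqs => [@seqs] }, $class }
sub next_seq { my $self = shift; return shift @{ $self->{seqs} } }

# A local, in-memory database can afford the full SeqI interface.
package Sketch::DB::Local;
our @ISA = ('Sketch::DB::SeqI');
sub new           { my ($class, %seqs) = @_; return bless { seqs => {%seqs} }, $class }
sub get_Seq_by_id { my ($self, $id) = @_; return $self->{seqs}{$id} }
sub get_all_ids   { my $self = shift; return sort keys %{ $self->{seqs} } }
sub get_PrimarySeq_stream {
    my $self = shift;
    return Sketch::Stream->new(map { $self->{seqs}{$_} } $self->get_all_ids);
}

# A web database only promises random access, so it inherits RandomAccessI.
package Sketch::DB::Web;
our @ISA = ('Sketch::DB::RandomAccessI');
sub new           { my ($class, %seqs) = @_; return bless { seqs => {%seqs} }, $class }
# A real implementation would fetch over the network; a hash stands in here.
sub get_Seq_by_id { my ($self, $id) = @_; return $self->{seqs}{$id} }

package main;

my $local  = Sketch::DB::Local->new(a1 => 'ACGT', b2 => 'GGCC');
my $stream = $local->get_PrimarySeq_stream;
while (my $seq = $stream->next_seq) {
    print "$seq\n";
}
```

Calling get_all_ids on a Sketch::DB::Web object fails with Perl's usual "Can't locate object method" error, which is the point: code that needs iteration can check ->isa('Sketch::DB::SeqI') first.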

> Looking at, it comments that its class is and then inherits
>! I assume it was in the process of being switched from one to
> the other; I can't tell which way it was moving, however.  Anyone know?

The documentation is wrong. It should say in the docs that it inherits
from When we gave the /Index classes iterators, I needed to split
SeqI out from RandomAccessI.

The documentation is at fault; the @ISA is correct.

> Looking at, which I am taking as the "master" abstract class, I wonder
> about the iterator function:
>     @ids = $seqdb->get_all_ids();
>     $stream = $seqdb->get_PrimarySeq_stream();
>     while(my $seq= $stream->next_seq()) {
>         # $seq is a PrimarySeqI compliant object
>     }
> Given the increasing sizes of the databases (I know the one I am working with is
> huge!), I wonder if this iterator should permit some kind of a selection
> function. That is, for example for SeqStore, where the sequences are stored in
> an Oracle DB, why not include a set of criteria, or even a SELECT statement?

Grrrr. Then we get into Object Query Language and a lot of mess. I would
prefer a system where the "selection" criteria are part of the concrete
database object, and hence can be specialised for individual databases,
rather than being part of the interface, i.e.:

  my $seqdb = Bio::DB::SeqStore->new('locator' => $oracle_locator_handle,
                                     'subset'  => { length => 1500,
                                                    type   => 'cdna' });

  my @ids = $seqdb->get_ids();

In my view this is much nicer: it means you only have to think about
SeqStore-style query constraints, rather than tackling the hugely complex
problem of "I want a way to query objects efficiently through any
implementation of the database".

Does this make sense?
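A minimal sketch of the "criteria live in the constructor" idea, assuming an in-memory list of records in place of the Oracle back end. Sketch::DB::SeqStore and its records parameter are hypothetical (records stands in for the locator handle above), and reading length => 1500 as a minimum length is also an assumption. In a real SQL-backed class the subset would presumably be turned into a WHERE clause instead of filtered in Perl.

```perl
use strict;
use warnings;

# Hypothetical concrete database: the selection criteria are fixed at
# construction time, so get_ids itself takes no arguments.
package Sketch::DB::SeqStore;

sub new {
    my ($class, %args) = @_;
    return bless {
        records => $args{records} || [],   # stands in for the Oracle back end
        subset  => $args{subset}  || {},   # empty subset means "everything"
    }, $class;
}

# Assumption: 'length' is a minimum length; 'type' matches case-insensitively.
sub _in_subset {
    my ($self, $rec) = @_;
    my $s = $self->{subset};
    return 0 if defined $s->{length} && $rec->{length} < $s->{length};
    return 0 if defined $s->{type}   && lc $rec->{type} ne lc $s->{type};
    return 1;
}

# Return the ids of all records inside this database's subset.
sub get_ids {
    my $self = shift;
    return map { $_->{id} } grep { $self->_in_subset($_) } @{ $self->{records} };
}

package main;

my @records = (
    { id => 'AB001', length => 2100, type => 'cDNA'    },
    { id => 'AB002', length =>  900, type => 'cDNA'    },
    { id => 'AB003', length => 3000, type => 'genomic' },
);

my $seqdb = Sketch::DB::SeqStore->new(
    records => \@records,
    subset  => { length => 1500, type => 'cdna' },
);
my @ids = $seqdb->get_ids();
print "@ids\n";    # AB001
```

Because the subset is private to the concrete class, another database type could accept entirely different keys without the shared interface changing.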

> Then I could say:
>     @ids = $seqdb->get_ids( { length => '> 1500',
>                               type  => 'cDNA',
>                               species => 'Homo sapiens'});
> where the parameters were defined by the specific instance (correct word?) of
> the class derived from the abstract class. A derived class which accessed an SQL
> database might permit direct SQL queries to return the subset of IDs.
> Also, should such an iterator return a full Bio::Seq object rather than a
> Bio::PrimarySeq object (or should it be selectable)? I certainly hope that our
> database will contain a great deal of annotation in addition to merely sequence.
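For comparison, a minimal sketch of Mark's alternative, where the criteria are passed on each call. The matches and get_ids helpers and the record layout are hypothetical, and the convention that a value such as '> 1500' means a numeric comparison (while any other value is a case-insensitive string match) is only one possible reading of the proposal.

```perl
use strict;
use warnings;

# Numeric comparison operators a criterion value may start with.
my %numeric_op = (
    '>'  => sub { $_[0] >  $_[1] },
    '>=' => sub { $_[0] >= $_[1] },
    '<'  => sub { $_[0] <  $_[1] },
    '<=' => sub { $_[0] <= $_[1] },
    '==' => sub { $_[0] == $_[1] },
);

# Hypothetical helper: does one record satisfy every criterion?
sub matches {
    my ($rec, $criteria) = @_;
    for my $field (keys %$criteria) {
        my $want = $criteria->{$field};
        return 0 unless defined $rec->{$field};
        if ($want =~ /^\s*(>=|<=|==|>|<)\s*(\d+(?:\.\d+)?)\s*$/) {
            # e.g. '> 1500': compare the field numerically
            return 0 unless $numeric_op{$1}->($rec->{$field}, $2);
        }
        else {
            # anything else: case-insensitive string equality
            return 0 unless lc $rec->{$field} eq lc $want;
        }
    }
    return 1;
}

# Hypothetical get_ids taking per-call criteria over in-memory records.
sub get_ids {
    my ($records, $criteria) = @_;
    return map { $_->{id} } grep { matches($_, $criteria) } @$records;
}

my @records = (
    { id => 'AB001', length => 2100, type => 'cDNA', species => 'Homo sapiens' },
    { id => 'AB002', length =>  900, type => 'cDNA', species => 'Homo sapiens' },
    { id => 'AB003', length => 1800, type => 'cDNA', species => 'Mus musculus' },
);
my @ids = get_ids(\@records,
    { length => '> 1500', type => 'cDNA', species => 'Homo sapiens' });
print "@ids\n";    # AB001
```

The mini query language lives entirely in one concrete class here; the reply below explains why putting this into the shared interface is the part to avoid.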

My 'view' on this is that for the fully annotated objects you get a list
of ids and then use the RandomAccess stuff to get the annotated objects.
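A minimal sketch of that two-step pattern, assuming a hypothetical in-memory Sketch::DB::Annotated class whose get_Seq_by_id returns a hash ref standing in for a fully annotated Bio::Seq object. The fetch_annotated helper is likewise hypothetical.

```perl
use strict;
use warnings;

# Hypothetical database whose random-access method returns annotated entries.
package Sketch::DB::Annotated;
sub new           { my ($class, %entries) = @_; return bless { entries => {%entries} }, $class }
sub get_all_ids   { my $self = shift; return sort keys %{ $self->{entries} } }
sub get_Seq_by_id { my ($self, $id) = @_; return $self->{entries}{$id} }

package main;

# Step 1: get the list of ids (cheap, no annotation loaded).
# Step 2: fetch each annotated object through random access.
sub fetch_annotated {
    my ($db) = @_;
    return map { $db->get_Seq_by_id($_) } $db->get_all_ids;
}

my $db = Sketch::DB::Annotated->new(
    X1 => { seq => 'ACGT', desc => 'first entry'  },
    X2 => { seq => 'TTGA', desc => 'second entry' },
);
print "$_->{desc}\n" for fetch_annotated($db);
```

With a large database you would more likely loop over the ids and fetch one object at a time rather than build the whole list, so that only one annotated object is in memory at once.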

However, I have a feeling that I am going to be bullied into suggesting that
we make a dual iterator. If the iterators behaved in the SeqIO fashion of


would this be a good idea?

> Comments?

It's great that you took the time over this. Feel free to suggest other

> Mark
> --
> Mark Dalphin                          email:
> Mail Stop: 29-2-A                     phone: +1-805-447-4951 (work)
> One Amgen Center Drive                       +1-805-375-0680 (home)
> Thousand Oaks, CA 91320                 fax: +1-805-499-9955 (work)
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL:
> For info about how to (un)subscribe, where messages are archived, etc:
> ====================================================================

Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
