[Bioperl-l] Bio::Seq::GenEMBLI proposal

Ewan Birney birney@ebi.ac.uk
Sun, 17 Dec 2000 18:50:50 +0000 (GMT)

Ok. I have finally given myself some "fun coding" time (if GenBank/EMBL
format compatibility is considered fun) to factor out the more
esoteric parts of GenBank/EMBL format off Bio::Seq into its own little

The proposal is an interface called

  Bio::Seq::GenEMBLI

and an implementation, Bio::Seq::GenEMBL. (An interface allows other people
to comply with it without using the same implementation. A "good thing" (tm),
in particular for database implementors.)
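
To make the interface/implementation split concrete, here is a minimal
sketch. The package names Sketch::GenEMBLI and Sketch::GenEMBL, the
hash-based storage and the example values are all made up for
illustration; this is not the proposed Bio::Seq::GenEMBLI code, just the
general pattern it follows.

```perl
use strict;
use warnings;

# Hypothetical interface package: declares the methods, stores nothing.
package Sketch::GenEMBLI;

sub each_keyword { _abstract('each_keyword') }
sub division     { _abstract('division') }

sub _abstract {
    my ($method) = @_;
    die "Sketch::GenEMBLI: $method must be provided by the implementation\n";
}

# One possible implementation, backed by a plain hash (an assumption).
package Sketch::GenEMBL;
our @ISA = ('Sketch::GenEMBLI');

sub new {
    my ($class, %args) = @_;
    my $self = {
        keywords => $args{keywords} || [],
        division => $args{division},
    };
    return bless $self, $class;
}

sub each_keyword { my ($self) = @_; return @{ $self->{keywords} }; }
sub division     { my ($self) = @_; return $self->{division}; }

package main;

# Example values only.
my $seq = Sketch::GenEMBL->new(
    keywords => [ 'kinase', 'membrane protein' ],
    division => 'HUM',
);

# Client code tests against the interface, not the implementation.
if ( $seq->isa('Sketch::GenEMBLI') ) {
    print "keyword: $_\n" for $seq->each_keyword();
    print "division: ", $seq->division(), "\n";
}
```

A database-backed class could inherit from the same interface package and
fetch the values on demand, and client code doing the isa() check would
not need to change.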

At the moment I have just taken what is in the Bio::Seq object and moved
it into its own interface, written below.


  (a) should we keep with the each_ syntax, or would people prefer
"something returning an array of things" to have a different naming
convention?

  (b) should dates be formatted strings or something else? (if so, what?)

  (c) should keyword lines be split into individual keywords, with
each_keyword methods, or not?

  (d) should the interface extend to cover swissprot, in which case

      - name change?

      - additional methods?
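
For question (c), here is a rough sketch of what "split on keywords"
could mean in practice. It assumes the parser has already stripped the
KEYWORDS / KW line tag and hands over the raw text, in which keywords are
separated by semicolons and the list ends with a period. The function
name split_keywords is hypothetical and not part of Bio::Seq.

```perl
use strict;
use warnings;

# Hypothetical helper: turn raw keyword text into a list of keywords.
sub split_keywords {
    my ($raw) = @_;

    $raw =~ s/\s+/ /g;    # fold wrapped continuation lines into single spaces
    $raw =~ s/^ //;       # trim leading space
    $raw =~ s/ $//;       # trim trailing space
    $raw =~ s/\.$//;      # drop the terminating period

    my @keywords;
    foreach my $kw ( split /;/, $raw ) {
        $kw =~ s/^\s+//;
        $kw =~ s/\s+$//;
        push @keywords, $kw if length $kw;
    }
    return @keywords;
}

my @keywords = split_keywords("kinase; membrane\n            protein; transferase.");
print "key word is $_\n" for @keywords;
```

An each_keyword method could return this list directly, while the
unsplit alternative would hand back the raw line and leave the parsing
to the caller.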

Here is what I have so far for this interface definition, waiting to be
committed once I get the "ok" 

=head1 NAME

Bio::Seq::GenEMBLI - Interface to a Sequence object supporting
GenBank/EMBL format

=head1 SYNOPSIS

    # Bio::Seq::GenEMBLI is-a Bio::SeqI, hence your usual
    # ->seq, ->subseq, ->id, ->top_SeqFeatures() are going to work

    # additional methods on Bio::Seq::GenEMBLI supporting
    # EMBL/GenBank format

    if( $seq->isa('Bio::Seq::GenEMBLI') ) {

	foreach $date ( $seq->each_date() ) {
	    print "date is $date\n"; # currently formatted string
	}

	foreach $key ( $seq->each_keyword() ) {
	    print "key word is $key\n";
	}

	foreach $sec ( $seq->each_secondary_accession() ) {
	    print "secondary accession number $sec\n";
	}

	print "Entry is in ", $seq->division(), " and has molecular identifier ",



Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420