[Bioperl-l] "Be forgiving in what you accept" and Bio::Tools::GuessSeqFormat

Chervitz, Steve Steve_Chervitz at affymetrix.com
Fri Jul 22 14:22:26 EDT 2005

George Hartzell <hartzell at kestrel.alerce.com>:

> Is there any reason not to extend the regexp a bit and relax that
> constraint (since everything else seems to cope with it)?

Seems reasonable. I've seen fasta files where there was no id at all, just a
'>' by itself on a line followed by a line of sequence. Perhaps the sequence
format guesser should accept as fasta any input with a line beginning with
'>'? But maybe this is too radical...
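
To make the options concrete, here is a rough sketch in Python (not Perl, and not the actual GuessSeqFormat code). The three patterns and the helper name are my own assumptions for illustration: a strict test that wants an id right after the '>', a relaxed test that tolerates whitespace before the id, and the "radical" test that accepts any line beginning with '>'.

```python
import re

# Hypothetical patterns for illustration; these are NOT the regexes
# used by Bio::Tools::GuessSeqFormat.
STRICT_HEADER = re.compile(r"^>\S")      # '>' immediately followed by an id
RELAXED_HEADER = re.compile(r"^>\s*\S")  # whitespace allowed before the id
ANY_HEADER = re.compile(r"^>")           # any line that begins with '>'


def first_line_matches(pattern, text):
    """Return True if the first non-blank line of text matches pattern."""
    for line in text.splitlines():
        if line.strip():
            return pattern.search(line) is not None
    return False


samples = {
    "plain": ">ape\nACGT\n",   # conventional header
    "spaced": "> ape\nACGT\n", # whitespace between '>' and the id
    "bare": ">\nACGT\n",       # no id at all
}

# For each sample, record (strict, relaxed, any) verdicts.
results = {
    name: tuple(
        first_line_matches(pattern, text)
        for pattern in (STRICT_HEADER, RELAXED_HEADER, ANY_HEADER)
    )
    for name, text in samples.items()
}

for name, verdicts in results.items():
    print(name, verdicts)
```

Only the third pattern accepts the bare '>' case, which is the price of going all the way: it would also accept any non-fasta text whose first line happens to start with '>'.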

> But, if you happen to have the sequence in a file with a funny name
> (e.g. /var/tmp/apreq23ZHis [aka a form upload]) then it fails.  It
> can't guess based on the filename and the file content test is strict
> and wants to see the header line without the whitespace (">ape").

Would be good to add this example to the SeqIO test suite.
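
As a sketch of what such a test might check (again in Python rather than against the real SeqIO test suite; the extension table, the header pattern, and all function names below are assumptions of mine), the idea is to write the "> ape" record to a file whose name carries no useful extension and confirm that the content test still identifies it:

```python
import os
import re
import tempfile

# Hypothetical extension table and header pattern, for illustration only.
EXTENSIONS = {".fa": "fasta", ".fasta": "fasta", ".gb": "genbank"}
RELAXED_FASTA_HEADER = re.compile(r"^>\s*\S")


def guess_by_name(path):
    """Guess the format from the file extension, or return None."""
    ext = os.path.splitext(path)[1].lower()
    return EXTENSIONS.get(ext)


def guess_by_content(path):
    """Guess the format from the first non-blank line, or return None."""
    with open(path) as handle:
        for line in handle:
            if line.strip():
                return "fasta" if RELAXED_FASTA_HEADER.match(line) else None
    return None


def guess_format(path):
    """Try the file name first, then fall back to the content."""
    return guess_by_name(path) or guess_by_content(path)


# Simulate a form upload: a temp file with an uninformative name.
with tempfile.NamedTemporaryFile("w", prefix="apreq", delete=False) as upload:
    upload.write("> ape\nACGT\n")
    upload_path = upload.name

name_guess = guess_by_name(upload_path)  # the name gives no hint
full_guess = guess_format(upload_path)   # the content test decides
os.remove(upload_path)

print(name_guess, full_guess)
```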

> There's a great "old" Internet maxim, "Be forgiving in what you accept
> and strict in what you send".

Here's an interesting discussion on this philosophy:

The FCC had this notion long before the Internet. It's part of the specs on
many off-the-shelf electronic devices: CFR part 15 "Devices must not
interfere with licensed services and must accept interference from licensed
services".

I found a recent presentation on the FCC site showing results of a survey
about whether part 15 stifles innovation (10/14 respondents said no, and 9/5
said more stringent regulations might even permit *more* innovation):


Flexibility in input acceptance may be an issue in Bioperl to the extent
that it leads to complicated code that is difficult to maintain or for
others to grok. But in this particular SeqIO case, flexibility seems
warranted. I think it should be up to a specific application to wield
authority over what it accepts and produces for fasta files. Since Bioperl
is a library used by multiple apps, high flexibility in acceptance seems
like a bonus.

