[Bioperl-l] FW: BioPerl SeqIO-like system in BioPython
cjfields at uiuc.edu
Tue Sep 19 13:34:00 EDT 2006
The following is a request from Peter, one of the Biopython developers, for
suggestions from us Bioperlers (Bioperlites?). They are trying to implement
a SeqIO-like system for BioPython. Any suggestions/hints/help would be
appreciated.
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign
Chris Fields wrote:
> I know that BioPython is trying to get a SeqIO-like system set up.
> Let us know if you need any help/advice.
I've thought of a couple of things - if you want to pass this on to the
appropriate BioPerl people, please do so.
Internal names for formats
I want to use simple strings to describe the different file formats for use
as function arguments (e.g. "fasta", "genbank"), and ideally use the same
names as BioPerl:
Is the webpage authoritative? I would guess the format names match the module
names under Bio/SeqIO/*.pm and Bio/AlignIO/*.pm.
The comments from Bio/AlignIO.pm list a few more names (not listed under
For the moment, my intention is to also include multiple alignments as part
of our sequence reading support.
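
To make the idea of string format names concrete, here is a minimal sketch in Python of how a lookup from names such as "fasta" to parser functions might work. The names PARSERS, parse and parse_fasta are hypothetical and chosen only for illustration; this is not the actual Biopython or BioPerl interface, and lower-casing the format name is an assumption rather than agreed behaviour.

```python
from io import StringIO


def parse_fasta(handle):
    """Yield (title, sequence) tuples from a FASTA-formatted handle."""
    title, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:].strip(), []
        elif title is not None:
            chunks.append(line.strip())
    if title is not None:
        yield title, "".join(chunks)


# Hypothetical registry: simple format name -> parser function.
# Further formats ("genbank", "clustal", ...) would be added here.
PARSERS = {"fasta": parse_fasta}


def parse(handle, format_name):
    """Look up a parser by its string name and apply it to the handle."""
    try:
        parser = PARSERS[format_name.lower()]  # assumption: names are case-insensitive
    except KeyError:
        raise ValueError("Unknown format %r" % format_name)
    return parser(handle)


records = list(parse(StringIO(">a\nACGT\nAC\n>b\nGG--T\n"), "fasta"))
```

Keeping the names in one dictionary would make it easy to compare them against the BioPerl module names and spot any mismatch.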
How do you cope with assorted gap characters (typically dot/period and dash,
'.' and '-') and how different file formats treat them?
For example, multiple alignments in Fasta format probably use either,
depending on the source of the file.
Clustal and Phylip seem to use '-' as a gap. MSF uses '.'
Phylip apparently treats '.' as meaning "same character as the previous
sequence" which is asking for trouble.
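
As a rough illustration of the per-format differences listed above, the sketch below normalises gaps to a single internal character on input and maps them back on output. The table contents simply restate the observations in this message and are assumptions, not a checked specification of each format; GAP_CHARS, OUTPUT_GAP, normalise_gaps and gaps_for_output are hypothetical names.

```python
INTERNAL_GAP = "-"

# Assumed gap characters per format, taken from the observations above.
# Phylip's '.' is deliberately NOT listed, since it apparently does not
# mean "gap" in that format.
GAP_CHARS = {
    "fasta": ".-",
    "clustal": "-",
    "phylip": "-",
    "msf": ".",
}

# Assumed preferred gap character when writing each format.
OUTPUT_GAP = {"fasta": "-", "clustal": "-", "phylip": "-", "msf": "."}


def normalise_gaps(sequence, format_name):
    """Replace the format's gap characters with the internal standard."""
    for char in GAP_CHARS[format_name]:
        sequence = sequence.replace(char, INTERNAL_GAP)
    return sequence


def gaps_for_output(sequence, format_name):
    """Convert internal gaps to the character the output format expects."""
    return sequence.replace(INTERNAL_GAP, OUTPUT_GAP[format_name])


from_msf = normalise_gaps("AC..GT", "msf")
from_phylip = normalise_gaps("AC..GT", "phylip")
to_msf = gaps_for_output("AC--GT", "msf")
```

With these tables, an MSF sequence "AC..GT" becomes "AC--GT" internally, while the same string read as Phylip is left alone because its '.' is not treated as a gap.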
Does BioPerl make any effort to convert everything into an internal
standard (say '-') when loading files, and convert as appropriate when
writing them out?
This old thread suggests it is (was) left in the end user's hands: