[Bioperl-l] SimpleAlign Ids

Bernd Web bernd.web at gmail.com
Wed Apr 21 13:08:29 EDT 2010


Would it be an idea to change SimpleAlign/AlignIO so that they do not
use sequence IDs as the hash keys for storing the sequences?
Quite regularly I run into issues with alignments that do not have
unique IDs. This occurs especially with alignments from the CDD at NCBI.
When I know the input format that I (or a user) will be using, I add a
pre-processing step to make all IDs unique.
However, when the input format can change every time, it is really
handy to use SimpleAlign with the format guesser.
If the sequence objects were stored under unique keys instead of
IDs, this issue would not occur.
I can imagine that for others using large alignments this might not be
handy, as I suppose an extra lookup step would be needed.
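
To make the two options above concrete (a pre-processing step that makes
IDs unique, versus storing sequences under unique keys and keeping the ID
only as a label), here is a minimal, language-neutral sketch in Python.
It is not BioPerl code and does not reflect how SimpleAlign is actually
implemented; the names make_unique_ids and AlignmentStore are
hypothetical and exist only for illustration.

```python
from collections import defaultdict


def make_unique_ids(ids):
    """Pre-processing option: return a copy of ids in which every repeat
    gets a numeric suffix (_2, _3, ...) that clashes with no other ID."""
    original = set(ids)   # IDs present in the input; never reuse these as suffixed names
    used = set()          # IDs already emitted
    counts = {}           # last suffix number tried per base ID
    result = []
    for seq_id in ids:
        if seq_id not in used:
            used.add(seq_id)
            result.append(seq_id)
            continue
        n = counts.get(seq_id, 1)
        while True:
            n += 1
            candidate = f"{seq_id}_{n}"
            if candidate not in used and candidate not in original:
                break
        counts[seq_id] = n
        used.add(candidate)
        result.append(candidate)
    return result


class AlignmentStore:
    """Storage option: rows are kept under a unique integer key (their
    insertion position), so duplicate IDs never overwrite each other.
    A secondary index maps each ID to all keys that carry it; this is
    the extra lookup step mentioned above."""

    def __init__(self):
        self._rows = []                    # list index doubles as the unique key
        self._by_id = defaultdict(list)    # ID -> list of keys

    def add(self, seq_id, sequence):
        key = len(self._rows)
        self._rows.append((seq_id, sequence))
        self._by_id[seq_id].append(key)
        return key

    def get(self, key):
        return self._rows[key]

    def keys_for_id(self, seq_id):
        return list(self._by_id.get(seq_id, []))

    def __len__(self):
        return len(self._rows)
```

With the storage option, adding two rows that share the ID "seq1" keeps
both rows, and keys_for_id("seq1") returns both keys, so the caller has
to decide which one is meant. With the pre-processing option the rows
stay distinct, but the IDs no longer match the input file.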

Is this an issue that others run into as well?

Kind regards,
