[Bioperl-guts-l] [Bug 3061] New: AlignIO hash sequence storage

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Thu Apr 22 07:55:11 EDT 2010


           Summary: AlignIO hash sequence storage
           Product: BioPerl
           Version: unspecified
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Core Components
        AssignedTo: bioperl-guts-l at bioperl.org
        ReportedBy: bernd at bio.vu.nl


Something I stumble on from time to time: the storage of sequences in AlignIO
is based on SeqIDs. This complicates reading alignments with duplicate IDs,
which actually occur quite often (e.g. CDD of NCBI). Usually I try to
"uniquify" the IDs first, but this is not straightforward for all alignment
formats. Actually, this is where BioPerl is really useful ;-)
I'd propose to store the sequences in a hash in AlignIO using unique keys,
possibly as an option, so that all sequences in the alignment can be read
even when they all have the same ID.
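
To illustrate the difference, here is a minimal sketch in Python. It is not
BioPerl code and does not reflect how AlignIO is actually implemented; the
record layout and the function names store_by_id and store_by_unique_key are
made up for this example. The only detail taken from the report is the
"id/start-end" key format seen in the warning.

```python
# Hypothetical alignment records: (display_id, start, end, sequence).
# The first two share the same ID and range, as in the reported case.
records = [
    ("10", 1, 214, "MKV"),
    ("10", 1, 214, "MRV"),
    ("11", 1, 214, "MKI"),
]


def store_by_id(records):
    """Key on 'id/start-end': a later duplicate replaces the earlier entry."""
    store = {}
    for seq_id, start, end, seq in records:
        store[f"{seq_id}/{start}-{end}"] = seq
    return store


def store_by_unique_key(records):
    """Key on a running counter so every record is kept.

    The 'id/start-end' name is stored as data next to the sequence
    instead of being used as the hash key.
    """
    store = {}
    for n, (seq_id, start, end, seq) in enumerate(records, start=1):
        store[n] = (f"{seq_id}/{start}-{end}", seq)
    return store
```

With these three records, store_by_id keeps two entries because the second
"10/1-214" overwrites the first, while store_by_unique_key keeps all three.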

This would also get rid of the "Replacing one sequence" warnings:
-------------------- WARNING ---------------------
MSG: Replacing one sequence [10/1-214]

Possibly this can be included in the AlignIO refactoring.

