[Bioperl-l] Alignment excision script

Stephen Gordon Lenk slenk at emich.edu
Wed Aug 31 14:16:21 EDT 2005


Thank you! What follows is thinking out loud.

I will take your advice and convert back to 100% script, which is easy. 
No new object type will be created. Actually, I already have a separate 
script that uses AlignIO objects; I just did not explain that well.

- The script will create AlignIO objects with format defined by user 
(or have Bioperl guess format if user does not specify ...). All IO 
will be done using them. Flexible, 'any' format in, 'any' format out 
with CD excised via X. We will use this in our analysis pipeline.

- The alignment must be treated as a whole, because the default 'X'ing 
out (partial excision) considers whether a whole column is part of a 
CD. I first X out the designated CD residues, then check whether the 
whole column is X'd out before making the final excision on a copy of 
the original sequences. I can return either a full excision (all 
designated CD residues) or a partial one (only columns in which every 
sequence has X). This code is solid, and I plan to use it internally in 
the script. Reuse what works well already.

- I have extracted needed information from the input AlignIO object 
already and process it using the above method. The internal excised 
alignment data is right. Just a matter of loading it into the output 
AlignIO object. 

- I can use AlignIO methods to add excised sequences etc to output 
object formatted as requested by user, sounds easy. Will look at 
Utilities for any shortcuts.

- I will look further into Bio::Tools::Analysis for Bioperl methods 
to get the needed CD data, which is really just start/end pairs for a 
given protein sequence. Nothing fancy is needed as far as representation 
goes for the already working code. All I use is "$start $end" to 
represent the excision regions of a given CD for a given protein 
sequence. I make an array of these for a given protein and use it when 
I do the initial X'ing out. I'd like to reuse the existing reliable 
code internally.

- I have a t/ directory for the earlier script. I will expand and 
reuse this. POD documentation is in the code. I will modify it to 
reflect current status.
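
To make the full/partial distinction above concrete, here is a minimal 
sketch of the X'ing-out logic. It is written in Python with the standard 
library only, purely for illustration; it is not the Perl code described 
in this message and it uses no Bioperl calls. All names (parse_regions, 
flag_row, excise) are hypothetical. It assumes that "$start $end" pairs 
are 1-based, inclusive, and count ungapped residues, that '-' and '.' 
are gap characters, and that a gap never counts as X'd out, so a column 
containing a gap is never excised in partial mode. Those are guesses 
about the conventions, not something stated above.

```python
def parse_regions(pairs):
    """Turn strings such as "3 5" into (start, end) integer tuples."""
    regions = []
    for pair in pairs:
        start, end = (int(tok) for tok in pair.split())
        if start < 1 or end < start:
            raise ValueError(f"bad region: {pair!r}")
        regions.append((start, end))
    return regions


def flag_row(row, regions, gap_chars="-."):
    """Return one boolean per alignment column: True if the residue in
    that column lies inside any region (assumed 1-based, ungapped)."""
    flags = []
    pos = 0  # residue counter; gap characters do not advance it
    for ch in row:
        if ch in gap_chars:
            flags.append(False)  # assumption: gaps are never X'd out
            continue
        pos += 1
        flags.append(any(s <= pos <= e for s, e in regions))
    return flags


def excise(alignment, cd_regions, mode="partial"):
    """alignment: {seq_id: gapped row}; cd_regions: {seq_id: ["s e", ...]}.

    mode="full" X's out every designated CD residue.
    mode="partial" X's out only columns designated in every sequence.
    The input rows are never modified; new strings are returned.
    """
    if mode not in ("partial", "full"):
        raise ValueError(f"unknown mode: {mode!r}")
    widths = {len(row) for row in alignment.values()}
    if len(widths) > 1:
        raise ValueError("alignment rows differ in length")

    # Pass 1: mark the designated CD residues in each row.
    flags = {
        sid: flag_row(row, parse_regions(cd_regions.get(sid, [])))
        for sid, row in alignment.items()
    }

    # Pass 2 (partial only): keep a column only if every row marked it.
    if mode == "partial":
        width = widths.pop() if widths else 0
        whole = [all(f[i] for f in flags.values()) for i in range(width)]
        flags = {sid: whole for sid in alignment}

    # Final excision is applied to a copy of the original rows.
    return {
        sid: "".join("X" if hit else ch for ch, hit in zip(row, flags[sid]))
        for sid, row in alignment.items()
    }
```

For example, with rows a = "MK-TAYIA" (CD "2 4") and b = "MKQTA-IA" 
(CD "3 5"), mode "full" gives "MX-XXYIA" and "MKXXX-IA", while mode 
"partial" gives "MK-XXYIA" and "MKQXX-IA", since only the two columns 
marked in both rows are excised. The sketch tracks marks as booleans 
rather than scanning for 'X' characters so that an 'X' already present 
in an input sequence is not mistaken for an excised residue.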

Again, thank you.

Steve Lenk
slenk at emich.edu
