[Bioperl-l] Is it possible to do contig alignments?

Chris Fields cjfields at uiuc.edu
Fri Aug 24 13:16:10 EDT 2007


On Aug 24, 2007, at 11:07 AM, Florent Angly wrote:
...

> De-Jian,ZHAO wrote:
>> How do you pad the sequences with gaps manually? Just replace the
>> hyphens with blanks? If yes, you can program in perl to automate
>> this process.
>>
> How do I pad the sequences manually? I calculate how many gaps have to
> go left and right of the aligned sequence based on its length, its
> position in the aligned consensus and the consensus length:
> my $newseq = ('-' x $leftnum) . $seq . ('-' x $rightnum);
> By the way, the sequences cannot be stored with blanks in them...
>
> I think the best way to provide an out-of-the-box solution for
> displaying contigs the described way would be to _not_ use Bio::Align
> at all, but rather to create a new assembly IO module like
> Bio::Assembly::IO::simpleout for example. That would be useful.
>
> The reason I wanted to visualize these contigs is that I made a
> Bio::Assembly::IO module for TIGR Assembler files that I intend to
> submit to BioPerl. I wanted to make sure first that I did not have
> any obvious bugs in my contig coordinates. I've read the documentation
> on the Wiki, so if a BioPerl developer would like to step up and
> contact me directly to check my code, that would be nice =)
>
> Florent

A similar question has been previously asked on the same subject:

http://thread.gmane.org/gmane.comp.lang.perl.bio.general/2827/focus=2869

Jason's suggestion was to have a Bio::Assembly::Contig method get_aln()
which produces a Bio::SimpleAlign object containing appropriately
padded seqs compatible with AlignIO output.  However, the method was
never implemented.

Personally, I would implement the Contig::get_aln() method, padding
with bioperl-compliant alignment gap symbols (currently -.*?=~), so
that anyone who wanted to could write to any AlignIO-implemented format
(MSF, Clustal, etc).  In your Bio::Assembly::IO::simpleout module,
implement write_assembly() and use the Contig::get_aln() method where
needed to grab the SimpleAlign, then simply substitute gap symbols with
spaces when writing contig output.
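
As a rough illustration of the two steps (pad each read out to the
consensus length, then swap gap symbols for spaces on output), here is a
minimal sketch in plain Perl with no BioPerl dependency.  The names
pad_read and gaps_to_spaces and the sample reads are hypothetical, made
up for this example; this is not the get_aln() implementation, which
does not exist yet, and it assumes '-' as the gap symbol and a 1-based
start column for each read.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): pad a read with gap symbols
# so that it spans the whole consensus. $start is assumed to be the
# 1-based column of the read's first base in the padded consensus.
sub pad_read {
    my ($seq, $start, $cons_len, $gap) = @_;
    $gap = '-' unless defined $gap;
    my $left  = $start - 1;
    my $right = $cons_len - $left - length $seq;
    die "read does not fit inside the consensus\n"
        if $left < 0 || $right < 0;
    return ($gap x $left) . $seq . ($gap x $right);
}

# Hypothetical helper: replace only the leading and trailing runs of '-'
# with spaces, so gaps inside the read stay visible in the display.
sub gaps_to_spaces {
    my ($padded) = @_;
    $padded =~ s/^(-+)/' ' x length $1/e;
    $padded =~ s/(-+)$/' ' x length $1/e;
    return $padded;
}

# Made-up example data: a 12-column consensus and two reads.
my $cons_len = 12;
my @reads = (
    { id => 'read1', seq => 'ACGTAC',  start => 1 },
    { id => 'read2', seq => 'TACGGTA', start => 4 },
);

for my $read (@reads) {
    my $padded = pad_read($read->{seq}, $read->{start}, $cons_len);
    printf "%-8s %s\n", $read->{id}, gaps_to_spaces($padded);
}
```

Only the flanking gaps are turned into spaces here; that is a choice
made for this sketch, so that gaps introduced inside a read by the
assembler are not lost in the display.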

In general, any new code is attached to a bugzilla report as an  
enhancement request:

http://bugzilla.open-bio.org/

One of the devs will work on getting the code incorporated into
bioperl.  Make sure the code is documented
(http://www.bioperl.org/wiki/Advanced_BioPerl), and attach appropriate
tests (http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests) and
test data.

chris


