[Bioperl-l] Help needed urgently

Phillip San Miguel pmiguel at purdue.edu
Wed Nov 15 12:00:50 EST 2006

Sayali wrote:
> I wish to fetch consensus sequence and the names of the trace (chromat)
> files used in the assembly from the .ace file
> For this purpose, I am using Bio::Assembly::IO. But I am unable to find the
> appropriate methods which would enable me to fetch this information.
> Note: 
> The BS line (base segment) in the .ace file indicates which read phrap has
> chosen to be the consensus at a particular position.
> For example: 
> BS 1 515 K26-572c gives 
> BS <padded start consensus position> <padded end consensus position> <read
> name> respectively.
> How do I retrieve this information contig wise?
> Kindly help.
> Regards,
> Sayali D Salodkar
Hi Sayali,
    I don't think there is a pre-rolled bioperl method to do what you 
ask. (But it seems like one is under construction?)

    You can extract the contig sequences from a phrap .ace file with the 
following plain perl code:

my $print_it    =   0;
while (<>){
    $print_it =     /^CO / ?   1
                 :  /^$/   ?   0
                 :             $print_it
    s/^CO />/;
    s/\*//g;     #removes "pads"--optional
    print if $print_it;

    To find the reads that went into each contig, you do *not* want the 
BS tagged records. My understanding is that BS is just what consed uses 
to populate its consensus line from the ace file.
    The simplest way is:

egrep '^CO|AF' acefilename

if you are on a unix system. Or with perl

while (<>) {
    print if (/^CO|AF/);

Purdue Genomics Core Facility

More information about the Bioperl-l mailing list