[Bioperl-l] Aligning BLAST results
ajm226 at cam.ac.uk
Mon Oct 18 07:37:04 EDT 2004
Quick note... this is a long message. If you don't want to read it all,
then here's a summary: what is the best program/module to use for aligning
the DNA sequences of different lengths, coming from the same cDNA, in order
to get the longest possible contig?
Longer version: I am currently using BioPerl to automate a laborious task.
I have a set of 125 contigs (originally assembled from available
Schistosome EST data some years ago). These contigs are those that were
unidentified after BlastX homology searching. In order to improve our
chances of getting relevant homology results, we want to extend the known
sequence of these contigs using newer EST data.
I have managed to get a basic Perl program working, which takes each
contig, locally blasts it against an EST file I have, then takes the best
results and assembles them together with the contig, in the hope of getting
a longer sequence. This longer sequence is then remotely blasted at the
NCBI on BlastX, and the results stored.
My question is, what is the best program to use for aligning the BLAST results?
When I did this manually (I gave up very quickly in favour of automation!),
I used EditSeq from DNAStar to align the sequences. Not knowing much about
alignments, in my program I opted to use ClustalW. This has given me
alignments, but the percentage identity values are usually very poor (circa
50% - which I would imagine is little better than what you could get by
chance by arranging random sequences). I then discovered that ClustalW is a
global alignment program, and I really need a local alignment program. I
then tried using the dpAlign module, and this gave worse results. I have
since found something which suggests that dpAlign is only for protein, and
I should be using pSW. Anyhow, dpAlign definitely appeared to be doing a
global alignment, since the ends were flush.
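To illustrate the global/local distinction described above, here is a minimal Smith-Waterman sketch. It is written in plain Python for illustration only and is not the pSW or dpAlign implementation; the function name and the scoring values (match, mismatch, gap) are arbitrary assumptions. Because cell scores are floored at zero, the alignment covers only the best-matching region rather than forcing the sequence ends to be flush.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions, with a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; flooring at 0 is what makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    score, x, y = smith_waterman("TTTTACGTACGTCCCC", "GGACGTACGTAA")
    print(score)
    print(x)
    print(y)
```

With the two toy sequences in the example, only the shared ACGTACGT core is reported and the unrelated flanking bases are left out, which is the behaviour wanted when ESTs overlap a contig only partially.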
I then tried using TCoffee, but have not as yet been able to get good results.
I have considered creating the alignments from the BLAST data (since this
gives the start/end base numbers of the alignments, and the sense), but I
wouldn't be able to account for gaps etc.
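As a rough sketch of that coordinate-based idea, the snippet below estimates how many bases an EST hit would add beyond each end of the contig, using only the start/end positions and sense that BLAST reports. It is in plain Python for illustration; the `Hit` record and its field names are hypothetical, coordinates are assumed to be 1-based, and a minus-strand hit is assumed to be reported with the subject start greater than the subject end. As noted above, gaps inside the aligned region are ignored, so the numbers are only estimates.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One BLAST HSP (hypothetical record; 1-based inclusive coordinates)."""
    q_start: int  # start on the contig (query)
    q_end: int    # end on the contig (query)
    s_start: int  # start on the EST (subject)
    s_end: int    # end on the EST; s_start > s_end means minus strand
    s_len: int    # full length of the EST


def overhangs(hit, q_len):
    """Estimate bases the EST extends past the contig's left and right ends."""
    s_start, s_end = hit.s_start, hit.s_end
    if s_start > s_end:
        # Minus strand: convert to coordinates on the reverse-complemented EST.
        s_start = hit.s_len - s_start + 1
        s_end = hit.s_len - s_end + 1

    # Unaligned EST bases before the hit, minus contig bases before the hit.
    left = (s_start - 1) - (hit.q_start - 1)
    # Unaligned EST bases after the hit, minus contig bases after the hit.
    right = (hit.s_len - s_end) - (q_len - hit.q_end)
    return max(left, 0), max(right, 0)


if __name__ == "__main__":
    contig_len = 500
    plus_hit = Hit(q_start=401, q_end=500, s_start=1, s_end=100, s_len=350)
    minus_hit = Hit(q_start=1, q_end=80, s_start=80, s_end=1, s_len=300)
    print(overhangs(plus_hit, contig_len))
    print(overhangs(minus_hit, contig_len))
```

This only ranks which ESTs are worth assembling and in which orientation; the actual joined sequence would still need a proper alignment or assembly step to handle gaps and mismatches.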
Apologies for the long message, but can anyone suggest which program/module
I should be using to align these BLAST results?
Thanks very much,