[Bioperl-l] Help on a basic EST-genomic alignment script

Edward Chuong echuong at gmail.com
Sun Jun 27 04:27:02 EDT 2004


Thanks for responding! As I said originally, I've just started playing
with bioperl, so I'm a little lost on some of the terms you used.
> >
> > 1) Read in pero EST from a FASTA
> > 2) Standaloneblast it to local mus cDNA database, retrieve accession
> > from best result
> Just do
>  blastall -i ests.fa -d mus -p blastn -e evalue ...

Each individual EST is in its own file (the filename is its ID, e.g.
PM_BWP0009A06.FAF, with no particular pattern), so I just read in an
entire directory. Unless I'm missing something, should I combine all
the ESTs into one file and then run blastall on that?
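
If combining is the way to go, this is roughly what I have in mind. It is only a sketch, written in Python rather than bioperl to show the idea. The function name, the ".FAF" suffix filter, and the ests.fa output name are my own assumptions (ests.fa is taken from your example command).

```python
from pathlib import Path


def merge_fasta_files(est_dir, out_path, suffix=".FAF"):
    """Concatenate every per-EST FASTA file in est_dir into one multi-FASTA file.

    Returns the number of files written. The suffix filter is an assumption
    based on filenames like PM_BWP0009A06.FAF.
    """
    out_path = Path(out_path)
    files = sorted(
        p for p in Path(est_dir).iterdir()
        if p.is_file() and p.suffix == suffix
    )
    written = 0
    with out_path.open("w") as out:
        for path in files:
            text = path.read_text()
            if not text.strip():
                continue  # skip empty files
            # Make sure each record ends with a newline so the next
            # ">" header starts on its own line.
            out.write(text if text.endswith("\n") else text + "\n")
            written += 1
    return written

# Hypothetical usage (directory name is made up):
#   merge_fasta_files("pero_ests", "ests.fa")
# and then, as suggested above:
#   blastall -i ests.fa -d mus -p blastn -e evalue ...
```
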

> Although I think you might do better to run a translated search against
> the mouse protein set.  You also will find you can get better results with
> FASTX/FASTY as it allows frameshifts whereas blast will only search one
> frame at a time.

Can you elaborate on what FASTX/FASTY are? I'm only using blast to get
the accession ID of the best match in my mus cDNA list, then getting
the full mus seq info from genbank with that ID (I think I'm doing
something too complicated..)

> > 3) Retrieve complete mus sequence with features from genbank using ID from (2)
> > 4) Make a clustalw simple align object using the mus protein sequence
> > from (3) against the translated pero EST for all 3 frames, and keep
> > the one with the best identity %.
> > --I'm done up to here--
> Why not determine the best frame when doing the search by comparing to the
> mouse proteins?

Not sure what you mean by this. Do you mean to manually look and check
whether the proteins match up well? I'd like to avoid that if possible,
since I plan to run this on several hundred EST files.
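
To make clear what step 4 does at the moment, here is the logic as a rough sketch, in Python rather than my actual bioperl code. The real script gets the identity % from a clustalw alignment; the difflib ratio below is only a crude stand-in for that, and the function names are made up for illustration. It only looks at the three forward frames and uses the standard genetic code.

```python
from difflib import SequenceMatcher
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate dna starting at offset frame (0, 1 or 2); unknown codons become X."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def best_forward_frame(est_dna, ref_protein):
    """Return (frame, translation, score) for the forward frame most similar
    to ref_protein. The score is difflib's similarity ratio, NOT a true
    alignment identity."""
    results = []
    for frame in range(3):
        protein = translate(est_dna, frame)
        score = SequenceMatcher(None, protein, ref_protein, autojunk=False).ratio()
        results.append((frame, protein, score))
    return max(results, key=lambda r: r[2])
```

What I want to avoid is checking the winning frame by eye for every EST, so anything that reports the frame automatically would be fine.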

> > 5) Convert the aln from AA to DNA (there is a builtin aa_to_dna_aln
> > but it isn't working for me)

It seems like I have a lot of rewriting to do before this step so I'll
save this for later :)

> You basically need to predict the protein sequence from the EST - you can
> use estwise to do this based on the best mouse protein homolog.

Can you elaborate on what estwise is? Is this part of wise or a
separate thing? How do I run this in bioperl?

> -jason

Again, thanks very much for your help!

Take care

Edward Chuong
