[Bioperl-l] Categorization of EST's by species/taxonomy/lineage

Mark Johnson mjohnson at watson.wustl.edu
Mon May 3 11:07:26 EDT 2004

Any port in a storm and all that, but surely there must be a simpler way.  8)

If nothing else, the blasts are not needed...I could write an entrez query
to grab nucleotide records for a given species, feed that to
Bio::DB::Query::GenBank, grab the first sequence that came back, and grab
the lineage from that.
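
Roughly, I am picturing something like the sketch below. It is untested
against a live NCBI session and makes some assumptions: the sub names
(entrez_query_for, lineage_string, lineage_for_species) are made up for
illustration, the query is just "<species> [ORGN]" against the nucleotide
database, and the first record returned is assumed to carry a usable lineage.

```perl
use strict;
use warnings;

# Build an Entrez query restricting records to one organism.
# (Hypothetical helper; the [ORGN] field tag is the one used later in
# this thread.)
sub entrez_query_for {
    my ($species) = @_;
    return "$species [ORGN]";
}

# Bio::Species->classification lists names from most specific to most
# general, so reverse it to get a root-first lineage string.
sub lineage_string {
    my (@classification) = @_;
    return join('; ', reverse @classification);
}

# Fetch the first nucleotide record for a species and return its lineage,
# or undef if nothing usable comes back. Needs BioPerl and network access.
sub lineage_for_species {
    my ($species) = @_;

    # Loaded here so the two helpers above work without BioPerl installed.
    require Bio::DB::Query::GenBank;
    require Bio::DB::GenBank;

    my $query = Bio::DB::Query::GenBank->new(
        -db    => 'nucleotide',
        -query => entrez_query_for($species),
    );
    my $stream = Bio::DB::GenBank->new->get_Stream_by_query($query);

    my $seq = $stream->next_seq or return;   # no records matched
    my $sp  = $seq->species     or return;   # record has no organism info
    return lineage_string($sp->classification);
}

# Example use (performs a live NCBI query):
#   my $lineage = lineage_for_species('Arabidopsis thaliana');
#   print "$lineage\n" if defined $lineage;
```

A species-only query can match a very large number of records, so in
practice the query would probably need narrowing, and the result should be
cached so each species is looked up only once.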

I'm hoping there is a slightly more direct way...

> You say that because it wouldn't work, or because of all the blasts? My
> idea was that it would only be necessary to do one blast per species (or
> maybe a few, depending on the database's nomenclature being consistent
> or not). In any case, your solution is obviously better, I'm just curious.
> -Paulo
> Jason Stajich wrote:
>>ugh - I hope you are not really going to do this... - I'll post code which
>>should work with Bio::DB::Taxonomy as this was what it was intended to do.
>>On Thu, 29 Apr 2004, Paulo Almeida wrote:
>>>Perhaps a little far-fetched, but what if you write a script that goes like:
>>>Read sequence from flat file.
>>>If it's from a new species,
>>>    Blast against mRNA database AND species database
>>>    Check hits until you get one with the full lineage
>>>End if
>>>Sort it based on the lineage
>>>I'm suggesting that only based on what you said, about mRNA records
>>>tending to have the full lineage, I have no idea if it would work. To
>>>blast against mRNA for the desired species you would add something like:
>>>$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>    "$species [ORGN] AND biomol_mrna [PROP]";
>>>-Paulo Almeida