[Bioperl-l] Getting 'features' from SearchIO?
cjfields at illinois.edu
Mon May 11 11:39:54 EDT 2009
On May 11, 2009, at 9:34 AM, Dan Bolser wrote:
> I am parsing a blasttable and extracting Bio::Search::HSP::GenericHSP
> objects as a result. I read somewhere that HSP objects inherit Feature
> objects... How can I get a 'standard' representation of the HSP as a
> feature? Basically I'd like to simply load the blast results into a
> feature database...
They are Bio::SeqFeature::SimilarityPair objects (all Bio::Search::HSP::HSPI
implementations inherit from it).
> When I call feature methods on the HSP objects I just get blank or
> undef results... I think this is because I'm trying to get at the
> sequence's existing (non-existent) features, rather than get the HSP
> object as a feature... If that makes sense... How can I confirm that I
> have a feature object containing the details of the HSP?
These are decorated feature pairs (the two features map to one
another), so you would need to do something like $hsp->hit to get at
the actual SeqFeature data for the hit, and similarly $hsp->query for
the query SF. They technically have the SeqFeatureI methods, but I
believe these delegate to one specific feature (the query) unless you
explicitly specify which feature to grab info from ('query', or
'hit'/'subject').
I have added some tests for this to t/SearchIO/blasttable.
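To make the delegation concrete, here is a language-neutral sketch in Python (not BioPerl's API) of a feature pair whose accessors default to the query side unless the caller names the hit. The class names and the accepted aliases ('hit', 'subject', 'sbjct') are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """One side of a feature pair: a located region on one sequence."""
    seq_id: str
    start: int
    end: int
    strand: int  # +1 or -1


class FeaturePairSketch:
    """Hypothetical stand-in for an HSP-like feature pair (not BioPerl's API).

    Holds one feature on the query and one on the hit; the accessors
    fall back to the query unless the caller names the other side.
    """

    _HIT_NAMES = {"hit", "subject", "sbjct"}  # assumed aliases for the hit side

    def __init__(self, query: Feature, hit: Feature):
        self.query = query
        self.hit = hit

    def _side(self, which: str) -> Feature:
        if which == "query":
            return self.query
        if which in self._HIT_NAMES:
            return self.hit
        raise ValueError(f"unknown sequence type: {which!r}")

    def seq_id(self, which: str = "query") -> str:
        return self._side(which).seq_id

    def start(self, which: str = "query") -> int:
        return self._side(which).start

    def end(self, which: str = "query") -> int:
        return self._side(which).end

    def strand(self, which: str = "query") -> int:
        return self._side(which).strand


hsp = FeaturePairSketch(Feature("contig_7", 101, 250, 1),
                        Feature("chr2", 5000, 5149, -1))
print(hsp.start(), hsp.start("hit"))  # 101 5000
```

Calling a plain accessor with no argument is what returns query data, which is why generic SeqFeature-style calls on an HSP can look "wrong" when you expected the hit.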
> I thought of trying to just pass the HSP object to the
> Bio::DB::SeqFeature::Store, but I need to get that up and running
> first (I'm looking into it). In the meantime I thought I'd ask if
> this sounds like the right thing to do.
Worth a try to see what happens, but I'm not sure it would work as you
expect, seeing as the methods delegate to the query by default (and I
don't know if support for feature pairs is built into
Bio::DB::SeqFeature::Store). Also, last I recall, SF::Store stores
everything based on a specified SF class, not the interface, so mixing
SF classes in the same database (such as Bio::DB::SeqFeature,
Bio::SeqFeature::Generic, and HSPs) may not be the wisest thing. I
haven't used it in a little while, though, so that may have changed.
Just to note, this problem has been 'solved' to some degree in the
past. I think there are a few blast2gff scripts floating around, and
there is a Bio::SearchIO::Writer::GbrowseGFF module, though it isn't
maintained. The main problem is that the mapping depends on which
sequence in the BLAST run you treat as the reference (the query or the
hit), and that can't be determined automatically. I ended up rolling
my own with SeqFeature::Store (I just mapped the relevant data to
Bio::DB::SeqFeatures). I have long wanted to fix up the relevant
scripts to integrate my changes, but haven't had the time (though that
may change soon :)
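The reference choice described above can be made explicit in a small converter. This is a minimal Python sketch, not the GbrowseGFF module or any existing blast2gff script; it assumes the standard 12-column tabular format (blastall -m 8 / BLAST+ -outfmt 6), and the function name blasttable_to_gff3 and its parameters are hypothetical.

```python
# Column order of BLAST tabular output (blastall -m 8 / BLAST+ -outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def blasttable_to_gff3(lines, reference="hit", source="blast", ftype="match"):
    """Convert tabular BLAST rows to GFF3 feature lines (no header line).

    reference="hit" places each HSP on the subject sequence (e.g. a genome
    used as the BLAST database); reference="query" places it on the query.
    The other sequence is recorded in the GFF3 Target attribute.
    """
    if reference not in ("query", "hit"):
        raise ValueError("reference must be 'query' or 'hit'")
    out = []
    count = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and BLAST comment lines
        cols = line.split("\t")
        if len(cols) < len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns: {line!r}")
        row = dict(zip(FIELDS, cols))
        q = (row["qseqid"], int(row["qstart"]), int(row["qend"]))
        s = (row["sseqid"], int(row["sstart"]), int(row["send"]))
        ref_side, tgt_side = (s, q) if reference == "hit" else (q, s)
        ref, start, end = ref_side
        target, tstart, tend = tgt_side
        # BLAST marks reverse matches with start > end; GFF3 needs start <= end,
        # so record the relative orientation as the strand and normalise both.
        strand = "-" if (start > end) != (tstart > tend) else "+"
        start, end = min(start, end), max(start, end)
        tstart, tend = min(tstart, tend), max(tstart, tend)
        count += 1
        attrs = f"ID=hsp{count};Target={target} {tstart} {tend}"
        out.append("\t".join([ref, source, ftype, str(start), str(end),
                              row["bitscore"], strand, ".", attrs]))
    return out
```

Running the same rows with reference="hit" and reference="query" gives two different, equally valid feature sets, which is exactly why the choice can't be made automatically. Prepend a "##gff-version 3" line before handing the output to a GFF3 loader.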
> More generally I want to have features attached to sequences that are
> themselves annotations of larger sequences (but with unknown
Did you mean 'features of larger sequences'?
At the very least, you can define the region a feature falls within. If
that region has gaps on both sides, you can still assign coordinates to
the feature for that release based on the estimated lengths of the
gaps, with the understanding that those coordinates may change in a
future release once the gaps are filled in.
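As a worked illustration of why such coordinates are tied to a release, the Python sketch below places a contig-relative position on a scaffold by summing the lengths of everything before it, including estimated gaps. The function name and the (kind, name, length) layout are hypothetical and are not a BioPerl or AGP API; the point is only that revising one gap estimate shifts every downstream feature.

```python
def scaffold_position(layout, contig_id, offset):
    """Map a 1-based position on a contig to 1-based scaffold coordinates.

    layout is an ordered list of (kind, name, length) tuples, where kind is
    "contig" or "gap"; gap lengths are estimates, so the result is only
    valid for the assembly release the layout describes.
    """
    bases_before = 0  # scaffold bases preceding the current entry
    for kind, name, length in layout:
        if kind == "contig" and name == contig_id:
            if not 1 <= offset <= length:
                raise ValueError(f"offset {offset} outside contig {contig_id}")
            return bases_before + offset
        bases_before += length
    raise KeyError(contig_id)


release_1 = [("contig", "c1", 1000), ("gap", None, 200), ("contig", "c2", 500)]
release_2 = [("contig", "c1", 1000), ("gap", None, 350), ("contig", "c2", 500)]
print(scaffold_position(release_1, "c2", 10))  # 1210
print(scaffold_position(release_2, "c2", 10))  # 1360
```

The feature's position on contig c2 never changes; only the gap estimate does, yet its scaffold coordinate moves from 1210 to 1360 between the two releases.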
Otherwise I would assume it's simpler to designate it as a feature on
a singleton sequence (on its own) that hasn't been mapped.
> Is Bio::DB::SeqFeature::Store a way to go? I need to manage
> various different bits of information coming from a sequencing
> project, and I need a solution to the whole 'assembly life cycle
> management' problem.
It's a good start, but it's not the only solution (by far). If you
want to integrate more information, you could look into Chado
(Apollo has a plugin for Chado).
> Thanks for any help,