[Bioperl-l] Aggressive aggregation?

Chad Matsalla chad at dieselwurks.com
Tue Mar 8 22:15:07 EST 2005

Subject: Aggressive Aggregators

Greetings all,

I'm looking for help in presenting Blast hits in GBrowse.

I blasted Brassica EST sequences against the Arabidopsis
pseudochromosome assemblies in order to store them in a Bio::DB::GFF
database. I used a tool based on bp_search2gff.pl to `convert' blast
reports into GFF. A sample of that GFF is below[1].

My problem is partly based on a peculiarity of Blast and partly based on
the behavior of the aggregators in GBrowse and I'm wondering if someone
else has seen this.

Arabidopsis has five chromosomes. In order to get the coordinates
necessary to place ESTs on the chromosomes I created a blast database
containing five sequences - chr1, chr2, chr3, chr4, chr5.

My problem presents itself when an EST hits at more than one place on a
chromosome.  Let us say that on chr1 there is a cluster of HSPs for the
EST chad1 at position 1000, a second cluster at position 10,000 and a
third cluster at 50,000. Blast will report a SINGLE hit on chr1.

So, I manually find clusters of HSPs and create GFF that resembles that
below[1]. (Yes, I know that wublast has an option to prevent that.)
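
For readers who want to automate the clustering step, here is a minimal
sketch in Python. It is not the tool I used: the function names, the
(start, end) tuple layout and the max_gap threshold are all assumptions
made for illustration, and it assumes the HSPs passed in are for one EST
on one chromosome and one strand.

```python
def cluster_hsps(hsps, max_gap=1000):
    """Group (start, end) HSP coordinates into clusters.

    A new cluster starts whenever the gap between the end of the
    current cluster and the next HSP start exceeds max_gap
    (a hypothetical threshold; tune it for your data).
    """
    clusters = []
    cluster_end = None
    for start, end in sorted(hsps):
        if clusters and start - cluster_end <= max_gap:
            clusters[-1].append((start, end))
            cluster_end = max(cluster_end, end)
        else:
            clusters.append([(start, end)])
            cluster_end = end
    return clusters


def match_span(cluster):
    """Return the (start, end) of the `match' feature covering a cluster."""
    return (min(s for s, _ in cluster), max(e for _, e in cluster))


if __name__ == "__main__":
    hsps = [(1, 75), (100, 150), (10000, 10075), (10100, 10150), (50000, 50080)]
    for cluster in cluster_hsps(hsps):
        print(match_span(cluster), cluster)
```

Each cluster would then be written out as its HSP lines plus one `match'
line spanning them.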

The problem is that the `match' aggregator joins all of the `matches'
together.  I understand that it's because all of the matches have the
same Target - that's necessary to have the proper sequence appear while
viewing base-base alignments.
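
To illustrate what I think is happening, here is a deliberately
simplified model in Python of grouping by Target name. This is not the
actual Bio::DB::GFF aggregator code; the function name and the
(target, start, end) tuple layout are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_by_target(features):
    """Merge features that share a Target name into one spanning range.

    features: iterable of (target, start, end) tuples.
    Returns a dict mapping each target to its overall (start, end).
    """
    groups = defaultdict(list)
    for target, start, end in features:
        groups[target].append((start, end))
    return {
        target: (min(s for s, _ in parts), max(e for _, e in parts))
        for target, parts in groups.items()
    }


if __name__ == "__main__":
    matches = [
        ("Sequence:chad1", 1, 150),
        ("Sequence:chad1", 200, 450),
    ]
    # Both matches share a Target, so they collapse into a single span.
    print(aggregate_by_target(matches))
```

Under this model the two separate matches end up as one feature, which
is the joined-up picture I am seeing.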

HSPs:        <-->  <-->  <-->                 <-->  <-->  <-->
matches:     <-------------->                 <-------------->

What I get : <-->--<-->--<-->-----------------<-->--<-->--<-->
What I want: <-->--<-->--<-->                 <-->--<-->--<-->

How do I get what I want? In my gbrowse.conf I tried the standard
`match' aggregator and a custom aggregator: csmmatch{csmhsp/csmmatch}

Chad Matsalla

[1]
chr1 aafcest     HSP   1     75    .     +     .     Target "Sequence:chad1" 1 75
chr1 aafcest     HSP   100   150   .     +     .     Target "Sequence:chad1" 100 150
chr1 aafcest     match 1     150   .     +     .     Target "Sequence:chad1" 1 150

chr1 aafcest     HSP   200   275   .     -     .     Target "Sequence:chad1" 200 275
chr1 aafcest     HSP   300   450   .     -     .     Target "Sequence:chad1" 300 450
chr1 aafcest     match 200   450   .     -     .     Target "Sequence:chad1" 200 450
