[Bioperl-l] Aggressive aggregation?
lstein at cshl.edu
Thu Mar 10 16:49:05 EST 2005
The problem is tied up with the need for better handling of GFF3 by
Bio::DB::GFF. In GFF3 you can separate the Name of a thing and its
ID=match0001;Target=cdna0123 12 462
ID=match0001;Target=cdna0123 463 963
ID=match0001;Target=cdna0123 964 2964
ID=match0002;Target=cdna0123 1 129
ID=match0002;Target=cdna0123 463 960
This is what the alignment GFF emitter should produce. Unfortunately,
when you load this into Bio::DB::GFF, the distinction between the ID
and the Target is lost and all the lines get aggregated together
again on the target name cdna0123.
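To make the loss concrete, here is a minimal sketch in Python (not BioPerl, and not how Bio::DB::GFF is actually implemented) that groups the five attribute columns shown above in two ways. The helper names parse_attributes and group_by are hypothetical, invented for this illustration only.

```python
from collections import defaultdict

# The attribute columns from the example above; the other GFF3 columns are omitted.
ATTRIBUTE_LINES = [
    "ID=match0001;Target=cdna0123 12 462",
    "ID=match0001;Target=cdna0123 463 963",
    "ID=match0001;Target=cdna0123 964 2964",
    "ID=match0002;Target=cdna0123 1 129",
    "ID=match0002;Target=cdna0123 463 960",
]


def parse_attributes(column):
    """Split a GFF3 attribute column into a dict of tag -> value (hypothetical helper)."""
    attrs = {}
    for pair in column.split(";"):
        key, _, value = pair.partition("=")
        attrs[key] = value
    return attrs


def group_by(lines, key_func):
    """Group target intervals under whatever key key_func picks (hypothetical helper)."""
    groups = defaultdict(list)
    for line in lines:
        attrs = parse_attributes(line)
        target_name, start, end = attrs["Target"].split()
        groups[key_func(attrs, target_name)].append((int(start), int(end)))
    return dict(groups)


# Grouping on the target name alone merges both alignments into one group.
by_target = group_by(ATTRIBUTE_LINES, lambda attrs, name: name)

# Grouping on the ID keeps the two alignments of cdna0123 apart.
by_id = group_by(ATTRIBUTE_LINES, lambda attrs, name: attrs["ID"])
```

Keyed on the target name, all five segments end up in a single group; keyed on the ID, match0001 and match0002 stay separate, which is the distinction GFF3 is meant to preserve.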
I've got lots of notes on a better Bio::DB::GFF and a sample schema
and queries. If someone wants to work on this, I'll hand it over to
them. ...Alternatively, perhaps this can be fixed by a much less
invasive change to the Bio::DB::GFF module. Perhaps the Target
should simply be converted into an alias so that it can be
On Thursday 10 March 2005 12:21 pm, Chad Matsalla wrote:
> On Wed, 9 Mar 2005, Aaron J. Mackey wrote:
> > > chr1 aafcest HSP 200 275 . - . Target "Sequence:chad1" 200 275
> > > chr1 aafcest HSP 300 450 . - . Target "Sequence:chad1" 300 450
> > > chr1 aafcest match 200 450 . - . Target "Sequence:chad1" 200 450
> > These need to be Target "Sequence:chad1-1" and "Sequence:chad1-2"
> > or some such. This also means that if you're saving the ESTs in
> > the database (for sequence alignment display), you'll have to
> > save them redundantly under chad1-1, chad1-2, etc.
> This is horrible. I want to fix this.
> > Now, you could write a custom aggregator that de-aggregated
> > multiple chad1 "match" features, assigning the contained HSPs to
> > each, but there is no such "default" behavior. Let me know if
> > there's general interest for this ...
> I think there is, and I volunteer to write it. I'm new to the
> Bio::DB subsystem but I'm eager to dive in. Can you help me by
> providing a general flowchart on what you'd do to create this? What
> should the Aggregator be called? Hmm.
> Bio::DB::GFF::Aggregator::manymatch ?
> Chad Matsalla
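The de-aggregating aggregator discussed in the quoted text could work roughly as follows. This is only a sketch in Python under stated assumptions, not the Bio::DB::GFF::Aggregator interface: the Feature class and the deaggregate function are hypothetical, the containment rule (an HSP belongs to the first "match" on the same reference, strand and target that spans it) is one possible policy, and the second alignment at 5000-5400 is made-up data added to show two matches sharing one target name.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """A stripped-down feature record (hypothetical, for illustration only)."""
    ref: str
    method: str
    start: int
    end: int
    strand: str
    target: str
    parts: list = field(default_factory=list)


def deaggregate(features):
    """Assign each HSP to the first enclosing match with the same target name."""
    matches = [f for f in features if f.method == "match"]
    hsps = [f for f in features if f.method == "HSP"]
    unassigned = []
    for hsp in hsps:
        for match in matches:
            same_group = (match.ref, match.strand, match.target) == (
                hsp.ref, hsp.strand, hsp.target)
            if same_group and match.start <= hsp.start and hsp.end <= match.end:
                match.parts.append(hsp)
                break
        else:
            # No enclosing match was found for this HSP.
            unassigned.append(hsp)
    return matches, unassigned


features = [
    # The three features from the quoted example.
    Feature("chr1", "HSP", 200, 275, "-", "Sequence:chad1"),
    Feature("chr1", "HSP", 300, 450, "-", "Sequence:chad1"),
    Feature("chr1", "match", 200, 450, "-", "Sequence:chad1"),
    # Hypothetical second alignment of the same EST (invented data).
    Feature("chr1", "HSP", 5000, 5100, "-", "Sequence:chad1"),
    Feature("chr1", "HSP", 5300, 5400, "-", "Sequence:chad1"),
    Feature("chr1", "match", 5000, 5400, "-", "Sequence:chad1"),
]

matches, unassigned = deaggregate(features)
```

With this policy the target name can stay "Sequence:chad1" for every alignment, so the EST would not need to be stored redundantly under chad1-1, chad1-2 and so on. Overlapping matches for the same target would need a tie-breaking rule that this sketch does not attempt.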
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
NOTE: Please copy Sandra Michelsen <michelse at cshl.edu> on
all emails regarding scheduling and other time-critical topics.