[Bioperl-l] Extract features from GFF

Sean Davis sdavis2 at mail.nih.gov
Tue Oct 23 07:00:12 EDT 2007


Chris Fields wrote:
> On Oct 22, 2007, at 5:30 PM, Hang wrote:
> 
>> Hello,
>>
>> I have a list of about 100,000 short genomic regions with paired start
>> and end coordinates on the reference fly genome (R5.3). I also have GFF
>> files from the same genome release. I wonder how I can extract all
>> features that overlap these regions.
>>
>> For example:
>>
>> Region A is on chromosome 2L between 123,456 bp and 123,489 bp. What
>> code should I use to extract features, like gene, CDS, etc., that
>> overlap this region?
>>
>> Thank you in advance!
>>
>> -- Hang
> 
> Look into using Bio::DB::GFF or Bio::DB::SeqFeature::Store; which one
> to use will depend on the GFF version of the data you have.
> 
> chris

Not a BioPerl solution, but still a very cool application; see:

http://main.g2.bx.psu.edu/

It is web-based and allows for set operations (intersection, union,
etc.) on genomic coordinates.  If you have two sets of regions of
interest, you can then ask (very quickly) for all regions that overlap,
contain, are distinct, etc.
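
For a sense of what such an overlap query does, here is a minimal
sketch in Python rather than BioPerl. It is not how the site above or
either BioPerl module works internally. It assumes tab-separated
9-column GFF lines with 1-based, inclusive coordinates; the names
Feature, parse_gff, overlapping and the SAMPLE_GFF records are made up
for illustration.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Feature(NamedTuple):
    """The GFF columns needed for an overlap test (hypothetical helper type)."""
    seqid: str
    type: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    attributes: str


def parse_gff(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield a Feature for each 9-column GFF line, skipping comments and blanks."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a feature line
        yield Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), cols[8])


def overlapping(features: Iterable[Feature], seqid: str,
                start: int, end: int) -> List[Feature]:
    """Return features on seqid sharing at least one base with [start, end]."""
    return [f for f in features
            if f.seqid == seqid and f.start <= end and f.end >= start]


# Invented records for illustration only; not real FlyBase annotations.
SAMPLE_GFF = [
    "##gff-version 3",
    "2L\texample\tgene\t123000\t124000\t.\t+\t.\tID=gene1",
    "2L\texample\tCDS\t123470\t123600\t.\t+\t0\tID=cds1",
    "2L\texample\tgene\t130000\t131000\t.\t-\t.\tID=gene2",
    "3R\texample\tgene\t123400\t123500\t.\t+\t.\tID=gene3",
]

if __name__ == "__main__":
    # Region A from the question: chromosome 2L, 123,456 to 123,489 bp.
    for hit in overlapping(parse_gff(SAMPLE_GFF), "2L", 123456, 123489):
        print(hit.type, hit.start, hit.end, hit.attributes)
```

This scans every feature for each region, which is fine for a handful
of queries but slow for 100,000 regions against a whole-genome GFF; an
indexed store or an interval-based index would be the usual choice at
that scale.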

Sean

