[Bioperl-l] Homologene parser?
basu at pharm.sunysb.edu
Tue Aug 14 11:02:06 EDT 2007
neeti somaiya wrote:
> Hi Andrew,
> I think the homologene data files have changed now on the ftp, from what you
> had used.
> It is now homologene.data and homologene.xml.
> I tried using your parser, but because it was written for the file
> hmlg.trip.ftp, it doesn't work anymore.
> I came across a parser
> I am looking at it to see if it works for me. Not sure if it will.
I recently wrote a parser for HomoloGene XML data, tailored to my own
needs. I am not sure it will suit yours, but it could be extended into a
general-purpose parser, so I am putting it forward.
Here is how it works:
* It parses only a single HomoloGene entry (<HG-Entry>...</HG-Entry>).
* It does SAX-based parsing (currently using XML::SAX::ExpatXS).
* It returns a graph object (using Perl's Graph module) in which each node
  is a homologue entry identified by its Entrez Gene ID. Each node also
  carries the following attributes:
    * RefSeq protein ID
    * Protein ID (pid)
    * NCBI taxon ID
* The edge attribute records whether the two nodes it connects are
  orthologs (true/false).
* The remaining tags are not extracted yet, but parsing them should not
  be very difficult.
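
To make the node and edge layout above concrete, here is a minimal sketch
in Python (not the Perl parser itself). It rests on several assumptions:
the element names (HG-Gene, HG-Gene_geneid, HG-Gene_taxid, HG-Gene_prot-gi,
HG-Gene_prot-acc) are assumed and should be checked against a real entry;
plain dicts stand in for the Graph object; and the ortholog flag is guessed
as "the two genes come from different taxa", which may not be how the
actual parser derives it. The sample entry holds made-up placeholder values.

```python
import xml.sax
from itertools import combinations


class HGEntryHandler(xml.sax.ContentHandler):
    """SAX handler that collects one node per <HG-Gene> element."""

    # Assumed element names mapped to node attribute names.
    FIELDS = {
        "HG-Gene_geneid": "gene_id",
        "HG-Gene_taxid": "taxon_id",
        "HG-Gene_prot-gi": "pid",
        "HG-Gene_prot-acc": "refseq_protein",
    }

    def __init__(self):
        super().__init__()
        self.nodes = {}    # gene_id -> dict of node attributes
        self.edges = {}    # frozenset({id_a, id_b}) -> {"ortholog": bool}
        self._gene = None  # attributes of the <HG-Gene> being read, if any
        self._text = []    # character data of the current element

    def startElement(self, name, attrs):
        if name == "HG-Gene":
            self._gene = {}
        self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        if self._gene is None:
            return  # outside any <HG-Gene>; those tags are ignored here
        if name in self.FIELDS:
            self._gene[self.FIELDS[name]] = "".join(self._text).strip()
        elif name == "HG-Gene":
            gene_id = self._gene.pop("gene_id", None)
            if gene_id is not None:
                self.nodes[gene_id] = self._gene
            self._gene = None

    def endDocument(self):
        # Assumption: genes from different taxa are flagged as orthologs.
        for a, b in combinations(self.nodes, 2):
            differs = self.nodes[a].get("taxon_id") != self.nodes[b].get("taxon_id")
            self.edges[frozenset((a, b))] = {"ortholog": differs}


def parse_hg_entry(xml_text):
    """Parse one <HG-Entry>...</HG-Entry> string into (nodes, edges)."""
    handler = HGEntryHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.nodes, handler.edges


# Made-up placeholder entry, for illustration only.
SAMPLE = """<HG-Entry>
  <HG-Entry_genes>
    <HG-Gene><HG-Gene_geneid>101</HG-Gene_geneid><HG-Gene_taxid>9606</HG-Gene_taxid>
      <HG-Gene_prot-gi>1</HG-Gene_prot-gi><HG-Gene_prot-acc>ACC_A</HG-Gene_prot-acc></HG-Gene>
    <HG-Gene><HG-Gene_geneid>202</HG-Gene_geneid><HG-Gene_taxid>10090</HG-Gene_taxid>
      <HG-Gene_prot-gi>2</HG-Gene_prot-gi><HG-Gene_prot-acc>ACC_B</HG-Gene_prot-acc></HG-Gene>
    <HG-Gene><HG-Gene_geneid>303</HG-Gene_geneid><HG-Gene_taxid>9606</HG-Gene_taxid>
      <HG-Gene_prot-gi>3</HG-Gene_prot-gi><HG-Gene_prot-acc>ACC_C</HG-Gene_prot-acc></HG-Gene>
  </HG-Entry_genes>
</HG-Entry>"""

if __name__ == "__main__":
    nodes, edges = parse_hg_entry(SAMPLE)
    for gene_id, attrs in nodes.items():
        print(gene_id, attrs)
```

Keying edges by an unordered pair keeps the relationship symmetric, which
is roughly what an undirected graph would give.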
Typically I fetch the HomoloGene XML stream with an 'efetch' through
Bio::DB::EUtilities, feed it to the parser, get back a Graph object and
then work on that.
So, to make it more generic and able to work on a local file, we would
need to:
* Add another class that reads each chunk between <HG-Entry> and
  </HG-Entry> and sends it to the parser.
* Add support for most of the tags.
* Massage the data into a BioPerl-compatible object.
I could work out the first two; for the last one I still have to figure
out which BioPerl object would be suitable (like Bio::Cluster or
Let me know if this sounds interesting and I will send you the code.
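
For the chunk-reading step mentioned above, one possible shape is sketched
below in Python (again an illustration, not the Perl class itself). It
assumes each entry is delimited by the literal strings <HG-Entry> and
</HG-Entry> with no attributes on the opening tag; iter_hg_entries and
chunk_size are hypothetical names chosen for this example.

```python
import io

START, END = "<HG-Entry>", "</HG-Entry>"


def iter_hg_entries(handle, chunk_size=65536):
    """Yield each <HG-Entry>...</HG-Entry> block from an open text handle.

    The file is read in fixed-size pieces, so a large local file never has
    to be held in memory as a whole.
    """
    buffer = ""
    while True:
        data = handle.read(chunk_size)
        buffer += data
        while True:
            start = buffer.find(START)
            if start == -1:
                # Keep only a short tail, in case an opening tag was
                # split across two reads.
                buffer = buffer[-(len(START) - 1):]
                break
            end = buffer.find(END, start)
            if end == -1:
                # Entry not complete yet: keep it and read more data.
                buffer = buffer[start:]
                break
            end += len(END)
            yield buffer[start:end]
            buffer = buffer[end:]
        if not data:
            break  # end of input


if __name__ == "__main__":
    # Tiny in-memory stand-in for a local file.
    demo = io.StringIO("<HG-Entry>first</HG-Entry>\n<HG-Entry>second</HG-Entry>")
    for entry in iter_hg_entries(demo, chunk_size=8):
        print(entry)
```

Each yielded string could then be handed to the single-entry parser
unchanged.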
> On 8/14/07, Andrew Macgregor <amacgregor at ccg.murdoch.edu.au> wrote:
>> On 13/08/2007, at 6:29 PM, neeti somaiya wrote:
>>> Does anyone know of any Homologene parser, if available?
>>> Please let me know.
>>> Thanks and Regards,
>> Hi Neeti,
>> Quite a long time ago now I wrote a Homologene parser and posted it
>> to the mailing list:
>> I don't know if this still works but you could use it as a starting
>> point. There may also be something newer out there too, I don't know.
>> If you search the mailing list archives you'll get a few messages
>> around the topic.
>> Cheers, Andrew.
>> Andrew Macgregor
>> Centre for Comparative Genomics, Murdoch University
>> Email: amacgregor at ccg.murdoch.edu.au
>> Tel: (08) 9360 2961