[Bioperl-l] Entrez Gene ASN

Hilmar Lapp hlapp at gmx.net
Fri Mar 11 13:17:23 EST 2005

Gene shouldn't be fundamentally different from LocusLink, and LocusLink 
was represented as an annotated SeqI within bioperl.

If at all possible I'd still like it to remain that way for Gene in 
order to allow for a smooth transition from LL to Gene for code that's 
been using the former.

If you want to emphasize the fact that it's a container for sequences, 
then that sounds like a ClusterI to me, which can be richly annotated.

Note also that NCBI is working on an ASN.1->XML converter. Personally, 
I'm inclined to wait for that converter to appear, but other priorities 
may prevail.

Let me know what you think.


On Thursday, March 10, 2005, at 06:14  AM, Stefan Kirov wrote:

> Hi guys!
> I have done some (mostly) serious thinking about ASN Entrez Gene 
> parsing and I propose we do my favorite thing: postpone everything we 
> cannot deal with right now. If you want it to sound better: take a 
> gradual approach where we store the data we can deal with in the 
> existing Bioperl objects and skip the rest for now.
> In detail:
> An ASN gene record can be correctly represented as a tree. I have 
> written a simple parser for my own purposes which stores the following:
> node_id --|
>           |-- parent
>           |-- level
>           |-- tag
>           |-- values
> What I do then is get specific levels and tags and build different 
> objects. So level 2 with parent EntrezGene (which is the root level 
> and has no information) is the gene description and has tags such as 
> gene, name, etc.; at levels 3, 5 and 6 you can get the complete species 
> definition by looking for orgname and org as tags and records with 
> parent mod (which is a value for orgname; descend down the branch).
> I am using this approach to store most of the data in a relational 
> database without going through Bioperl. What I ultimately want to do 
> is use standard Bioperl modules. However, I don't think we have an 
> object that can efficiently represent the structure (correct me if I 
> am wrong). I think it may be a good idea to have a container object, 
> possibly Bio::Gene that may contain multiple Bio::Seq objects (with or 
> without real sequence). I believe we can borrow some structure and 
> code from the EnsEMBL gene representation (the way it contains 
> multiple transcripts, etc., certainly not the database interactions).
> Please let me know what you think.
> Stefan
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
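
The flat node table described in the quoted message (node_id with parent, level, tag and values) could look roughly like the sketch below. This is not Stefan's parser and not BioPerl code: it is written in Python purely for illustration, it assumes the record has already been read into nested (tag, content) pairs, and the names Node, flatten and select as well as the sample record are all made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    node_id: int
    parent: Optional[int]  # node_id of the parent, None for the root
    level: int             # 1 for the root, 2 for its children, ...
    tag: str
    values: list = field(default_factory=list)


def flatten(tag, content, nodes=None, parent=None, level=1):
    """Flatten a nested (tag, content) record into a list of Node rows.

    content is a list whose items are either plain values (strings)
    or nested (tag, content) tuples.
    """
    if nodes is None:
        nodes = []
    node = Node(len(nodes), parent, level, tag)  # node_id == list index
    nodes.append(node)
    for item in content:
        if isinstance(item, tuple):
            child_tag, child_content = item
            flatten(child_tag, child_content, nodes, node.node_id, level + 1)
        else:
            node.values.append(item)
    return nodes


def select(nodes, level=None, tag=None, parent_tag=None):
    """Return the nodes matching the given level, tag and/or parent tag."""
    hits = []
    for n in nodes:
        if level is not None and n.level != level:
            continue
        if tag is not None and n.tag != tag:
            continue
        if parent_tag is not None:
            if n.parent is None or nodes[n.parent].tag != parent_tag:
                continue
        hits.append(n)
    return hits


# Made-up record; the tag layout only loosely imitates an Entrez Gene entry.
record = ("EntrezGene", [
    ("gene", [("name", ["GENE1"]), ("desc", ["example gene"])]),
    ("source", [("org", [("orgname", [("mod", ["Example species"])])])]),
])

nodes = flatten(*record)
top_level = [n.tag for n in select(nodes, level=2, parent_tag="EntrezGene")]
species = select(nodes, tag="mod")[0].values
```

Each row maps directly onto a relational table, and select stands in for the "get specific levels and tags" step; building Bioperl objects from the selected rows is left out here.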