[Bioperl-l] Newbie Questions: bioperl, bioperl-db, and GO

Jamie Sherman jjmail at mac.com
Thu Apr 14 02:44:17 EDT 2005

>> I have a large list of protein names and I would like to use bioperl 
>> to get the corresponding Gene Ontology (GO) information for each 
>> protein.
>> So far I have installed bioperl, BioSQL, and bioperl-db and uploaded 
>> the taxonomy and GO information into BioSQL. I am having a really 
>> hard time figuring out how to get the GO information out of the 
>> database.
> I'm not sure I understand what you're trying to do. If you loaded the 
> GO ontology into biosql then that will not give you term to protein 
> associations. You would need to load the proteins as well and have 
> annotation for them that references the GO terms they are associated 
> with. There is also a level of devil in the details because currently 
> no bioperl SeqIO parser except the LocusLink parser (and hopefully the 
> Entrez Gene parser already or soon) will give you the GO term 
> associations as appropriate Bio::Annotation::OntologyTerm annotation.

This helps me a lot. I realized that I need the GO terms associated
with each protein. I had seen them in the Swiss-Prot annotations, so I
figured I would have to parse them out of those, but I'll try LocusLink
and Entrez Gene and see whether that makes it easier. I had thought
that the GO information in BioSQL might also contain the associated
gene lists, but apparently it does not.
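Because the GO terms already appear in the Swiss-Prot records, here is a minimal plain-Perl sketch of pulling them out of the flat-file DR lines. It assumes the usual "DR   GO; GO:nnnnnnn; X:term name; EVIDENCE." layout and keys the results by the first (primary) accession on the AC line. The helper `go_xrefs_from_swissprot` and the sample record are hypothetical, made up for illustration. Inside BioPerl itself, the Swiss-Prot parser (Bio::SeqIO with `-format => 'swiss'`) should expose the same lines as Bio::Annotation::DBLink objects under the 'dblink' annotation key, which matches the point that UniProt GO terms end up as dbxrefs.

```perl
use strict;
use warnings;

# Hypothetical helper: collect GO cross-references from Swiss-Prot flat-file text.
# Assumes DR lines of the form "DR   GO; GO:nnnnnnn; X:term name; EVIDENCE."
# Returns a hash ref: primary accession => array ref of GO ids.
# Entries without any GO lines simply do not appear in the result.
sub go_xrefs_from_swissprot {
    my ($fh) = @_;
    my %go_for;
    my $acc;
    while ( my $line = <$fh> ) {
        if ( $line =~ /^AC   (\S+?);/ && !defined $acc ) {
            $acc = $1;    # first accession of the entry is the primary one
        }
        elsif ( $line =~ /^DR   GO; (GO:\d{7});/ && defined $acc ) {
            push @{ $go_for{$acc} }, $1;
        }
        elsif ( $line =~ m{^//} ) {
            $acc = undef;    # "//" terminates the entry
        }
    }
    return \%go_for;
}

# Usage with a hypothetical in-memory record (not real Swiss-Prot data).
my $record = <<'END';
ID   HYPO_TEST               Reviewed;         100 AA.
AC   TEST01; TEST02;
DR   GO; GO:0005737; C:cytoplasm; IEA.
DR   GO; GO:0005634; C:nucleus; IDA.
//
END
open my $fh, '<', \$record or die "cannot open in-memory record: $!";
my $go = go_xrefs_from_swissprot($fh);
print join( ',', @{ $go->{TEST01} } ), "\n";    # prints GO:0005737,GO:0005634
```

For real files, the same function can be given a filehandle opened on a Swiss-Prot .dat file.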

> If you want to use UniProt as the protein data source then the terms 
> would end up as dbxrefs. I can post a simple SQL script that will 
> convert those into term associations, but the point is that this won't 
> happen magically.
> If all you're trying to do is lookup ontology terms based on some 
> identifying property, like the identifier, then you can do this in 
> bioperl-db using the same mechanism as for sequences:
> 	use Bio::DB::BioDB;
> 	use Bio::Ontology::Term;
> 	my $db = Bio::DB::BioDB->new(...blah...);
> 	my $term = Bio::Ontology::Term->new(-identifier => 'GO:123456');
> 	my $adp = $db->get_object_adaptor($term);
> 	my $dbterm = $adp->find_by_unique_key($term);
> 	# on success $dbterm is-a persistent Bio::Ontology::TermI
> If none of this helps you will need to be more specific on your 
> approach and what you want to achieve.
> 	-hilmar

I think I should be able to get the GO information using one of the
approaches you outline, so thanks a bunch.

The second part of my question is about what comes next. Once I have
collected all of this information in the program and associated it with
the protein expression data, what is the best way to manage it so that
I can take advantage of BioPerl's clustering capabilities? Should I load
it into BioSQL, and if so, where should I look for documentation on the
BioSQL interface? Many of the perldoc pages under Bio::DB::* seemed
fairly sparse.
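One lightweight option, before committing to BioSQL, is to keep everything keyed by protein in ordinary Perl hashes and derive groups from that. This is only an illustration under that assumption: it does not use BioPerl's clustering modules or BioSQL, and the data, protein names and the helpers `proteins_by_go_term` and `mean_profile` are hypothetical. It inverts the protein-to-GO mapping into GO-term groups and computes a mean expression profile for each group.

```perl
use strict;
use warnings;

# Hypothetical toy data: protein => GO ids, protein => expression values
# (one value per condition). Names and numbers are made up for illustration.
my %go_for = (
    PROT_A => [ 'GO:0005737', 'GO:0005634' ],
    PROT_B => [ 'GO:0005737' ],
    PROT_C => [ 'GO:0005634' ],
);
my %expr_for = (
    PROT_A => [ 1.2, 0.8, 2.1 ],
    PROT_B => [ 0.4, 0.5, 0.3 ],
    PROT_C => [ 3.0, 2.7, 2.9 ],
);

# Invert protein => GO ids into GO id => proteins (in sorted protein order),
# so each GO term defines one candidate group of proteins.
sub proteins_by_go_term {
    my ($go_for) = @_;
    my %members;
    for my $protein ( sort keys %$go_for ) {
        push @{ $members{$_} }, $protein for @{ $go_for->{$protein} };
    }
    return \%members;
}

# Element-wise mean expression profile of a group of proteins.
# Proteins without expression data are skipped; profiles are assumed
# to have equal length. Returns an empty array ref if no data is found.
sub mean_profile {
    my ( $expr_for, @proteins ) = @_;
    my @with_data = grep { exists $expr_for->{$_} } @proteins;
    return [] unless @with_data;
    my $n_points = scalar @{ $expr_for->{ $with_data[0] } };
    my @mean;
    for my $i ( 0 .. $n_points - 1 ) {
        my $sum = 0;
        $sum += $expr_for->{$_}[$i] for @with_data;
        $mean[$i] = $sum / @with_data;    # @with_data in numeric context is its size
    }
    return \@mean;
}

my $groups = proteins_by_go_term( \%go_for );
for my $term ( sort keys %$groups ) {
    my @members = @{ $groups->{$term} };
    my $mean    = mean_profile( \%expr_for, @members );
    printf "%s: %s mean=[%s]\n", $term, join( ',', @members ),
        join( ',', map { sprintf '%.2f', $_ } @$mean );
}
```

Keeping the hashes keyed by a stable identifier such as the accession should make it straightforward to move the same data into BioSQL or another store later.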

	Thanks again,
