[Bioperl-l] New GO Parser and errors loading biosql database

Hilmar Lapp hlapp at gmx.net
Fri Feb 20 03:37:17 EST 2004

On Thursday, February 19, 2004, at 12:50  PM, Law, Annie wrote:

>  However, many of the entries are not able to be inserted (roughly
> 200). Mostly complaining about how the column name cannot be null.
> However, I'm not sure if it is related to The make test errors I am
> having with bioperl-db that I have listed below or if this is an
> acceptable result.
> In general how should a user gauge how successful a load of the
> database was?  I guess you can sort of look at the total number of
> expected number entries.

It's always a good idea to look over the errors and check whether there  
are any that just don't make sense. The one below is an example:

> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values  
> were
> ("BBD_pathwayID:C1cyc","","","") FKs (2)
> Column 'name' cannot be null

'BBD_pathwayID:C1cyc' is *not* a GO term (all GO terms have identifiers
that start with GO:). It is in fact a dbxref of a term that erroneously
ends up as a term itself, because a bug was introduced into the dagflat
parser (to which the GO parser is essentially identical) in the 1.4
release of bioperl. I strongly recommend that you upgrade, at a minimum,
the module Bio/OntologyIO/dagflat.pm to the version in cvs (tag
branch-1-4). Alternatively, update the entire bioperl distribution from
cvs (again, use branch-1-4).

Doing so will get rid of most if not all of the errors.

Generally speaking, there should be no or only a few terms that fail to  
load, and if any fail then they should only fail because of column  
width constraints or something similar.
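
One rough way to look over the errors is to tally them by reason. The
following is only a minimal sketch, not part of bioperl-db: the function
name is made up, and it assumes you redirected the loader's warnings to
a file and that each "MSG: insert in ... failed" message sits on a
single line with the database's reason on the line right after it, as
in the warning quoted above.

```shell
# Sketch: tally the reason line that follows each failed-insert warning.
# Assumptions: the loader's warnings were captured in the file given as $1,
# and the reason (e.g. "Column 'name' cannot be null") is on the line
# immediately after the "MSG: insert in ... failed" line.
summarize_load_errors() {
    awk '
        /^MSG: insert in .* failed/ { want = 1; next }  # reason is on the next line
        want { count[$0]++; want = 0 }
        END { for (r in count) printf "%d\t%s\n", count[r], r }
    ' "$1" | sort -rn
}
```

Called as "summarize_load_errors load.log" (load.log being whatever
file you captured the warnings in), it prints one line per distinct
reason, most frequent first. A handful of column-width complaints would
be plausible; a couple of hundred "cannot be null" errors would not.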

> 2) I have a question about The make test bioperl-db results which may
> be related to the results that I am getting. I seem to be having
> problems with the make test for bioperl-db.  I downloaded the tarball
> from the CVS website and installed it.
> I looked at the documentation and I created User biosql which has been
> given all the permissions it needs.  I also renamed the files as stated
> in the steps below. In the t directory of bioperl-db
> $ cd t
> $ cp DBHarness.conf.example DBHarness.biosql.conf
> $ cp DBHarness.conf.example DBHarness.markerdb.conf

You do not need to create DBHarness.markerdb.conf anymore.

> I also put a copy of those file in the bioperl-db in the home directory
> since that was documented for the newest version Of bioperl-db.

Not sure where you found that. The only place where this file needs to  
reside is in the t/ directory.

> I did a make test in the bioperl-db directory and go the following  
> results.
> Most of the tests seem to fail. I am not sure why.

Generally speaking, just read the error message. It often says why, and
it does here.

> [root@microarray bioperl-db]# maket test
> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e"
> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
> t/cluster.......install_driver(mysql) failed: Can't load
> '/usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi/auto/DBD/mysql/mysql.so'
> for module DBD::mysql:
> /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi/auto/DBD/mysql/mysql.so:
> undefined symbol: mysql_ssl_set at
> /usr/lib/perl5/5.8.0/i386-linux-thread-multi/DynaLoader.pm line 229.
> at (eval 4) line 3 Compilation failed in require at (eval 4) line 3.
> Perhaps a required shared library or dll isn't installed where expected
> at t/DBTestHarness.pm line 211

This says that your DBI driver could not be loaded. It has nothing to
do with bioperl-db. Either you have not installed the mysql DBI driver,
or the installation did not succeed, or you have installed it in a
non-standard location or under another version of perl.

Make sure the tests for the DBD::mysql module pass before trying to use  
the driver.

Obviously, if the DBI driver can't be loaded, none of the tests will  
succeed, as then no database connection can be opened.

> 3) Previously when I did a make test for the Bioperl 1.4 installation  
> most
> of the tests passed 97% I'm not sure whether the errors are expected  
> or not

Generally, *all* tests of a stable bioperl distribution (which 1.4 is)  
are supposed to pass. If one or more don't, then chances are high that  
something is wrong.

> Here are the results of the make test.  I only cut out the beginning  
> of the
> test and the summary at the end. Installation of bioperl
> ------------- EXCEPTION  -------------
> MSG: Failed to load module Bio::SeqIO::game. Can't locate IO/String.pm

The message pretty much says it all. Bioperl depends on IO::String in a
lot of places, so I'd strongly recommend you go ahead and install it.

> 4) Also, hopefully when I get this all running I would like to know  
> what is
> the best order for loading the database. I know you mentionned that  
> the GO
> database information should be loaded before the locuslink  
> information. Here
> is the list of proposed order of entering information into the  
> database.
> Can you use load_seqdatabase.pl for loading unigene information?

Yes, you can. Make sure you read the POD of load_seqdatabase.pl.

> 1.  load NCBI taxonomy database with load_ncbi_taxonomy.pl
> 2.  GO information

The only things for which order matters are those which are referenced,  
but provided only in an incomplete manner, by annotated data sources.  
Hence, species information and any ontology that your data source uses  
for annotation should be loaded in advance so that upon loading of the  
annotated sequences the referenced entities are found by look-up.

> 3.  load locuslink database information
> 4.  unigene information which I also had problems with loading  
> information
> in
> [root@ bioperl-1.4]#perl  
> /root/bioperl-db/scripts/biosql/load_seqdatabase.pl
> --dbuser=root --dbpass=ms22 --dbname bioseqdb
> --namespace "Unigene" -format unigene  
> /root/bioperl--1.4/unigenedata/Hs.data
> Loading /root/bioperl-1.4/unigenedata/Hs.data ...
> Bio::SeqIO: unigene cannot be found
> Exception
> ------------- EXCEPTION  -------------
> MSG: Failed to load module Bio::SeqIO::unigene. Can't locate
> Bio/SeqIO/unigene.pm in @INC (@INC contains:

The message pretty much says it. The indicated module, which is the
bioperl unigene parser, fails to load. The reason is most likely that
you didn't install bioperl, or installed it in a location that is not
in Perl's default search path. If the latter is the case, you need to
set up the PERL5LIB environment variable prior to running any code that
uses those modules.


Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
