[Bioperl-l] Validation of information loaded into bioperl-db

Hilmar Lapp hlapp at gmx.net
Tue Jul 20 11:29:19 EDT 2004

On Monday, July 19, 2004, at 08:38  AM, Law, Annie wrote:

> 1. When you use the load_seqdatabase.pl script, how can you check
> whether the load has been successful? I would like to automate this
> process. I plan to run a cron job that would load the database, but
> would like to know an efficient method to see if the load has been
> successful.

I have every update job write a log file and look at the logs manually, 
believe it or not. Almost all jobs succeed, and you can normally 
tell from the size of the log file whether something went wrong. 
So far, every real problem has either shown up quickly or produced many 
errors at once, unlike the occasional entry with a species parsing 
problem or the like. So if the log is 10x shorter or 10x longer than 
usual, I look at it and investigate.

Sorry, no script that does this.
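
That said, the size heuristic is simple enough to sketch. The following is 
only an illustration in Python, not part of bioperl-db: the function names, 
the "*.log" pattern, and the choice of the median as "usual" size are all 
assumptions made for the example. It flags the newest log in a directory if 
it is more than 10x smaller or larger than the median of the older ones.

```python
from pathlib import Path
from statistics import median


def log_size_suspicious(new_size, previous_sizes, factor=10.0):
    """Return True if new_size is more than `factor` times smaller or
    larger than the median of previous_sizes (all sizes in bytes)."""
    if not previous_sizes:
        return False  # nothing to compare against yet
    typical = median(previous_sizes)
    if typical == 0:
        # Earlier logs were empty; any output at all is unusual.
        return new_size > 0
    ratio = new_size / typical
    return ratio < 1.0 / factor or ratio > factor


def newest_log_suspicious(log_dir, pattern="*.log", factor=10.0):
    """Compare the most recently modified log in log_dir (hypothetical
    layout: one log file per load run) against all older logs."""
    logs = sorted(Path(log_dir).glob(pattern), key=lambda p: p.stat().st_mtime)
    if len(logs) < 2:
        return False
    *older, newest = logs
    older_sizes = [p.stat().st_size for p in older]
    return log_size_suspicious(newest.stat().st_size, older_sizes, factor)
```

A cron wrapper could call newest_log_suspicious() after each load and send 
mail only when it returns True. The log still has to be read by a person to 
find out what actually went wrong.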

Also, you could set up a unit test that tests a certain entry. This is 
difficult though due to the volatile nature of the data sources.

>   You can create log files to
> read or script the runs, but is there something else that can be done?
> 2. I know that there are scriptlets in the same directory as
> load_seqdatabase.pl that can be used in conjunction with
> load_seqdatabase.pl when you use the options lookup and mergeobjs. I
> would like to know if the same scriptlets can be used for the
> load_ontology.pl script.

No, they can't, because they work on Bio::SeqI objects, not 
Bio::Ontology::TermI objects.

I myself don't merge old and new terms, I just update them (i.e., 
--lookup). Terms don't really have a lot of annotation in associated 
tables, and bioperl-db fully deals with the synonyms.

> 3. In both load_seqdatabase.pl and load_ontology.pl there is the option
> --remove. I want to remove all old information and refresh with new
> data. Do I use --remove in conjunction with --lookup and --mergeobjs
> with freshen-annot.pl? I don't understand the need for the --remove
> option if you are already using --lookup and --mergeobjs with
> freshen-annot.pl.

You either remove or merge old objects, not both at the same time. 
Also, I wouldn't abuse --mergeobjs for a script that removes the old 
object (although you could do that) because it will be slower.

>   It seems that
> this would be redundant but perhaps there is something
> I am missing.
> 4. What is the default behavior if I don't use the options such as
> lookup and mergeobjs? Will all the data just be overwritten when I use
> load_ontology.pl and load_seqdatabase.pl?

If you don't use --lookup (and without it --mergeobjs will have no 
effect because then there can't be a found object either) all entries 
will be inserted. Those that exist already as determined by their 
alternative key (accession, version, namespace for bioentries) will 
fail to insert and hence will remain in the database as they were.
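
To illustrate the insert-only behavior, here is a toy sketch in Python with 
an in-memory SQLite database. It is not the BioSQL schema and not what the 
load scripts run: the table "entry", its columns, and load_entries() are 
made up for the example. The only point is that a unique key on (accession, 
version, namespace) rejects a second insert and leaves the stored row 
untouched.

```python
import sqlite3


def load_entries(conn, entries):
    """Insert-only load: try to insert every entry, skip those whose
    (accession, version, namespace) key already exists."""
    inserted, skipped = 0, 0
    for accession, version, namespace, description in entries:
        try:
            conn.execute(
                "INSERT INTO entry (accession, version, namespace, description) "
                "VALUES (?, ?, ?, ?)",
                (accession, version, namespace, description),
            )
            inserted += 1
        except sqlite3.IntegrityError:
            # Unique key violated: the existing row stays as it was.
            skipped += 1
    conn.commit()
    return inserted, skipped


conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table; the real bioentry table looks different.
conn.execute(
    """CREATE TABLE entry (
           accession   TEXT    NOT NULL,
           version     INTEGER NOT NULL,
           namespace   TEXT    NOT NULL,
           description TEXT,
           UNIQUE (accession, version, namespace)
       )"""
)

first = load_entries(conn, [("P12345", 1, "swissprot", "old description")])
second = load_entries(
    conn,
    [
        ("P12345", 1, "swissprot", "new description"),  # key already present
        ("Q67890", 1, "swissprot", "another entry"),    # new key
    ],
)
stored = conn.execute(
    "SELECT description FROM entry WHERE accession = 'P12345'"
).fetchone()[0]
```

After the second load, the row for P12345 still carries the description from 
the first load, which is the "remain in the database as they were" case.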



> Thanks very much,
> Annie.
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
