[Bioperl-l] Species name validation problem

David Waner dwaner at scitegic.com
Mon Mar 27 13:24:12 EST 2006


Yes, I meant to type Bio::Species, not Bio::Seq. Sorry for the
confusion.

My problem is that I am not calling $species->classification() directly;
I am calling Bio::Species->new(), which in turn calls classification()
which calls validate_species_name(), which then throws an exception on
some species names.  As far as I can see, there is no way to turn off
this (over-aggressive) validation in the Species constructor. 

I guess that instead of this:

	$species = Bio::Species->new(-classification => \@classificationArray);

I could do this:

	$species = Bio::Species->new();
	$species->classification(\@classificationArray, 'no validation');
	
but it would make a nicer interface to have a validation option in the
Species constructor.
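
Until such an option exists, the two-step workaround above could be wrapped
in a small helper. This is only a sketch: new_species_unvalidated() is a
hypothetical name, not part of BioPerl, and it assumes that
classification() accepts an array ref plus a true second argument to skip
validation, as Hilmar describes below. The example classification is
illustrative only.

```perl
use strict;
use warnings;
use Bio::Species;

# Hypothetical helper, not part of BioPerl: construct an empty species
# object, then set its classification with name validation skipped.
sub new_species_unvalidated {
    my ($classification) = @_;   # array ref, species name first

    # An empty constructor call has no classification to validate.
    my $species = Bio::Species->new();

    # Assumption (from Hilmar's reply): any true second argument
    # makes classification() skip the name validation.
    $species->classification($classification, 1);

    return $species;
}

# Illustrative input only: 'J' is the single-letter species name
# from the SODF_METJ example (Methylomonas J).
my @classificationArray = ('J', 'Methylomonas');
my $species = new_species_unvalidated(\@classificationArray);
print join(', ', $species->classification()), "\n";
```

The helper keeps the calling code to a single line, which is roughly what a
validation option in the constructor would offer.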

- David

-----Original Message-----
From: Hilmar Lapp [mailto:hlapp at gmx.net] 
Sent: Friday, March 24, 2006 9:42 PM
To: David Waner
Cc: Bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Species name validation problem


The option would be in Bio::Species, not Bio::Seq. You can circumvent
the name validation by passing an array ref to
$species->classification() and anything that evaluates to true as the
second argument. This is for instance what the genbank parser does
(which doesn't mean that it is always correct); supposedly the swissprot
parser ought to do the same.

   -hilmar

On 3/24/06, David Waner <dwaner at scitegic.com> wrote:
> I have found that Bio::Seq->new() throws exceptions on some "species" 
> names containing special characters, or consisting of a single letter,
> e.g:
>
>         SwissProt: POLN_ONNVG   O'nyong-nyong virus
>         SwissProt: FIBP_ADE1H   Human adenovirus 15/H9
>         SwissProt: POLG_FMDVZ   Foot-and-mouth disease virus (strain A22/550 Azerbaijan 65)
>         SwissProt: RIR1_BHV1C   Bovine herpesvirus 1.1
>         SwissProt: SODF_METJ    Methylomonas J
>         GenBank: AJ416726               Stylosanthes aff. calcicola
>
> It seems that the regex in validate_species_name() is too restrictive,
> but I can't find a way to turn off validation without editing bioperl
> modules.  There has been some recent discussion of this issue on the
> mailing list (see below).  Does anyone know if or when a
> -validate_species option to Bio::Seq->new() will be added? Or should I
> just propose the code change?
>
> Thanks,
>   David Waner
>
>
> > Stefan Kirov skirov at utk.edu
> > Wed Sep 21 08:46:05 EDT 2005
> >
> >
> > ----------------------------------------------------------------------
> >
> > Thanks for the great answer Hilmar!
> > I would prefer to have some kind of a check if the user wishes so.
> > For example Entrezgene file contains some HTML tags in some entries
> > species names which is good to know.
> > I will put an option -validate_species in the constructor to turn
> > the check on and off. Maybe a species filter can be of some use as
> > well. though you can just select the correct file from the NCBI
> > site.... Thanks again! Stefan
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org 
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------





