[Bioperl-l] BioSQL load_seqdatabase.pl -pipeline option

Hilmar Lapp hlapp at gmx.net
Fri Nov 3 17:54:08 EST 2006

Close. It's not the --pipeline option you want to use for this  
purpose but the --seqfilter option.

For example, to retain only sequences with taxon ID 9606, you would say

	--seqfilter 'sub { my $s = shift->{"-species"};
	                   return 1 unless $s;
	                   return 1 unless $s->ncbi_taxid;
	                   return 1 if $s->ncbi_taxid == 9606;
	                   return 0; }'

Note that when formulating the conditions for accepting or rejecting
the object, you need to take into account that the closure may be
called multiple times for one object, at various stages of completion
of the properties hash. So the logic above says: accept the object if
there is no species attached (yet), or if the species doesn't have a
taxon ID (yet), or if the taxon ID is 9606. The taxon ID can arrive
late because in GenBank format it is actually in the feature table,
and hence will only be populated after the organism lines have been
parsed. Otherwise (i.e., there is a species object, it has a taxon ID
defined, and the taxon ID is not 9606), reject the object.
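
To see why the early "return 1" cases matter, here is a minimal sketch
that calls the same logic on a properties hash at several stages of
completion. MockSpecies is a hypothetical stand-in written for this
illustration only (it is not a BioPerl class), and the hashes are
hand-built rather than produced by a real parser.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a species object; only ncbi_taxid() is modelled.
{
    package MockSpecies;
    sub new {
        my ($class, %args) = @_;
        return bless { taxid => $args{taxid} }, $class;
    }
    sub ncbi_taxid { return $_[0]->{taxid}; }
}

# Same logic as the one-liner passed to --seqfilter, spelled out.
my $filter = sub {
    my $props   = shift;                 # properties collected so far
    my $species = $props->{"-species"};
    return 1 unless $species;            # no species attached yet: keep going
    my $taxid = $species->ncbi_taxid;
    return 1 unless $taxid;              # taxon ID not populated yet: keep going
    return $taxid == 9606 ? 1 : 0;       # final decision once the ID is known
};

# Hand-built snapshots of the properties hash, from least to most complete.
my @stages = (
    [ "no species yet",       {} ],
    [ "species, no taxon ID", { "-species" => MockSpecies->new } ],
    [ "human (9606)",         { "-species" => MockSpecies->new(taxid => 9606) } ],
    [ "chimpanzee (9598)",    { "-species" => MockSpecies->new(taxid => 9598) } ],
);

for my $stage (@stages) {
    my ($label, $props) = @$stage;
    printf "%-22s %s\n", $label, $filter->($props) ? "accept" : "reject";
}
```

Only the last snapshot is rejected. A filter that returned 0 whenever
the taxon ID was missing would throw away every record at the first
call, before the ID had a chance to be parsed.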

(Note that --seqfilter will read and parse a file if the argument
refers to an existing and readable file. So if you are going to use
this construct often, you may want to put it into a file.)
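
As a rough sketch of the file variant: the file name human_only.pl is
made up for this example, and I am assuming that the script evaluates
the file's contents and expects a code reference back, so the last
expression in the file is the closure itself.

```perl
# human_only.pl (hypothetical file name)
# Assumption: the loader evaluates this file and uses the value of its
# last expression, which therefore has to be a code reference.
my $human_only = sub {
    my $s = shift->{"-species"};
    return 1 unless $s;                   # no species attached yet
    return 1 unless $s->ncbi_taxid;       # taxon ID not populated yet
    return 1 if $s->ncbi_taxid == 9606;   # human
    return 0;                             # taxon ID known and not human
};
$human_only;
```

Under that assumption you would then pass "--seqfilter human_only.pl"
in place of the quoted one-liner.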


On Nov 3, 2006, at 11:47 AM, Seth Johnson wrote:

> Hello guys,
> I'm populating biosql database using "load_seqdatabase.pl" from
> genbank release files for primates.  However, I only need sequences
> that belong to humans (taxon id: 9606).  I assume that best way to
> filter the necessary sequences is to use '-pipeline' option of the
> script.  The documentation seems a little vague to me on how to create
> my own processor to accomplish the task.  Can anyone clarify the
> steps???
> -- 
> Best Regards,
> Seth Johnson
> Senior Bioinformatics Associate

: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
