[Bioperl-guts-l] [BioPerl - Bug #3159] (Resolved) scripts/index/bp_index.PLS does not support SwissPfam

redmine at redmine.open-bio.org redmine at redmine.open-bio.org
Sun Mar 27 18:00:05 EDT 2011


Issue #3159 has been updated by Jason Stajich.

Status changed from New to Resolved
Assignee set to Jason Stajich

I still think this is more appropriate for a custom script than for the general-purpose indexer/fetcher, as there are no real objects to represent SwissPfam data -- we don't treat them as Seq objects.

I've hacked the indexer code to handle SwissPfam better.

This seems to work for me now:
<pre>
$ wget ftp://ftp.sanger.ac.uk/pub/databases/Pfam/releases/Pfam14.0/swisspfam.gz
$ gunzip swisspfam.gz
$ export BIOPERL_INDEX="$HOME/bioperl_index" 
$ bp_index -v -f swisspfam /tmp/swisspfam 
$ bp_fetch -f swisspfam swisspfam:108_LYCES
</pre>
----------------------------------------
Bug #3159: scripts/index/bp_index.PLS does not support SwissPfam
http://redmine.open-bio.org/issues/3159

Author: Catalin Patulea
Status: Resolved
Priority: Normal
Assignee: Jason Stajich
Category: Core Components
Target version: 1.6 branch
URL: 


This is using Ubuntu package bioperl-1.6.1-2.

I am trying to index the swisspfam file from Pfam 14.0 (intentionally an older version). There appears to be a Bio::Index::SwissPfam module for parsing and indexing this file type, but the bp_index.PLS script does not handle it correctly.

<pre>
$ wget ftp://ftp.sanger.ac.uk/pub/databases/Pfam/releases/Pfam14.0/swisspfam.gz
$ gunzip swisspfam.gz
$ export BIOPERL_INDEX="$HOME/bioperl_index"
$ bp_index -v swisspfam swisspfam
Indexing file /home/catu/pfam/Pfam14.0/swisspfam
Adding key 104K_THEPA
Adding key 108_LYCES
Adding key 10KD_VIGUN
Adding key 11S3_HELAN
...

$ bp_fetch swisspfam:104K_THEPA
>104K_THEPA |=================================================| P15711 924 a.a.
DUF5294------------------(73)PF04385.4Domainofunknownfunctio
n,DUF52936-111149-224265-343379-456
</pre>

Note how all the domain annotations are condensed into one line by removing whitespace; this is because bp_index actually used Bio::Index::Fasta to index instead of Bio::Index::SwissPfam. I tried adding a hook for Bio::Index::SwissPfam to bp_index.PLS, but then bp_fetch bombs out, complaining about a missing get_Seq_by_id method.

The following wiki page mentions that bp_index/bp_fetch can be used with SwissPfam, though there are no specific examples:
http://www.bioperl.org/wiki/Module:Bio::Index::SwissPfam

The perldoc page for Bio::Index::SwissPfam includes some sample code for using the module directly. I don't think the fetch example is correct, though. Right now it prints only the header line, i.e. the first line starting with ">". At least for my purposes (I want the domain annotations), it would make sense to print all lines up to the first blank line, which denotes the end of the current record.
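
As a stopgap outside BioPerl, the "print everything up to the first blank line" behaviour can be sketched in plain shell. This is only a sketch under stated assumptions: the function name swisspfam_record is made up for illustration, it scans the flat file linearly instead of using the bp_index index, and it assumes every record starts with a ">ID" header line and ends at a blank line, as described above.

```shell
# Hypothetical helper (not part of BioPerl): print one record from a
# swisspfam flat file, from its ">ID" header up to the first blank line.
# Usage: swisspfam_record ID FILE
#   e.g. swisspfam_record 104K_THEPA swisspfam
swisspfam_record() {
    awk -v id="$1" '
        # On every header line, note whether it belongs to the wanted ID.
        /^>/ { in_rec = ($1 == (">" id)) }
        # A blank line closes the record; stop reading the file there.
        in_rec && /^[ \t]*$/ { exit }
        # Print the header and each annotation line unchanged.
        in_rec { print }
    ' "$2"
}
```

Because nothing is re-wrapped or stripped, the domain annotation lines keep their original spacing. On a large file this is much slower than an indexed fetch, so it is only meant to show the intended output.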

