[Bioperl-l] Indexing large databases / BioSQL

Chris Fields cjfields at uiuc.edu
Mon Apr 28 12:24:39 EDT 2008

On Apr 28, 2008, at 8:51 AM, Bánk Beszteri wrote:

> Chris Fields wrote:
>> ...
>> You should use 'swiss' format instead of 'embl' when loading  
>> Uniprot/SwissProt sequences.  Though on the surface they're similar  
>> the feature table (among other things) is completely different.   
>> I'm not sure if that's causing all of the issues here but it  
>> certainly could contribute to them.
>> In the meantime, it's much easier for us to track these problems if  
>> you file a bug (BioPerl, file for bioperl-db):
>> http://bugzilla.open-bio.org/
> Hi Chris,
> I will do so. In the meantime: I'm not loading Swissprot, but  
> TrEMBL. Is swiss also the appropriate format here? From reading  
> http://expasy.org/sprot/userman.html#diffEMBL, I concluded that  
> embl should be the one I'd need for TrEMBL.
> Bank

The section you link to describes several important differences  
between the EMBL and SwissProt/UniProt formats, i.e. how each of the  
listed line types (ID, AC, OS/OC, FT, etc.) differs between the two.  
I'm unsure how you concluded from that section that 'embl' would  
work. The formats are close, but there are enough significant  
differences that using 'embl' for SwissProt (or vice versa) will not  
work as intended, if at all. TrEMBL entries use the same flat-file  
format as SwissProt, so 'swiss' is the format to use for them as well.
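
To illustrate one of those differences, here is a minimal Python sketch  
(not BioPerl code; the function name guess_flatfile_format is made up  
for this example). It assumes that the ID line of a protein entry  
(SwissProt or TrEMBL) ends with a length in "AA.", while the ID line of  
a nucleotide EMBL entry ends with a length in "BP.". Treat it as a  
quick sanity check under that assumption, not as a full format  
detector.

```python
import re


def guess_flatfile_format(id_line: str) -> str:
    """Guess the format name ('swiss' or 'embl') from an entry's ID line.

    Assumption (for illustration only): protein entries report their
    length in amino acids ("AA."), nucleotide EMBL entries in base
    pairs ("BP."). Anything else is reported as 'unknown'.
    """
    if not id_line.startswith("ID "):
        return "unknown"
    line = id_line.rstrip()
    if re.search(r"\bAA\.$", line):
        return "swiss"
    if re.search(r"\bBP\.$", line):
        return "embl"
    return "unknown"


if __name__ == "__main__":
    # Example ID lines in the two styles.
    print(guess_flatfile_format("ID   CYC_BOVIN               Reviewed;         104 AA."))
    print(guess_flatfile_format("ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP."))
```

If the first ID line of your file comes back as 'swiss', that is the  
format name to give the loader rather than 'embl'.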

