[Bioperl-l] Indexing large databases / BioSQL
cjfields at uiuc.edu
Mon Apr 28 12:24:39 EDT 2008
On Apr 28, 2008, at 8:51 AM, Bánk Beszteri wrote:
> Chris Fields wrote:
>> You should use 'swiss' format instead of 'embl' when loading
>> Uniprot/SwissProt sequences. Though on the surface they're similar
>> the feature table (among other things) is completely different.
>> I'm not sure if that's causing all of the issues here but it
>> certainly could contribute to them.
>> In the meantime, it's much easier for us to track these problems if
>> you file a bug (BioPerl, file for bioperl-db):
> Hi Chris,
> I will do so. In the meantime: I'm not loading SwissProt, but
> TrEMBL. Is 'swiss' also the appropriate format here? From reading
> http://expasy.org/sprot/userman.html#diffEMBL
> I concluded that 'embl' is the one I'd need for TrEMBL.
The section you link to describes several important differences
between the EMBL and SwissProt/UniProt formats (i.e. how each listed
line type differs between the two, including ID, AC, OS/OC, FT,
etc.). I'm unsure how you concluded from it that 'embl' would work.
The formats are close, but there are enough significant differences
that using 'embl' for SwissProt/UniProt data (or 'swiss' for EMBL
data) will not work as intended, if at all. TrEMBL is distributed in
the same UniProt flat-file format as SwissProt, so 'swiss' is the
format to use here as well.
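
As a minimal sketch of what this looks like when reading a file directly with Bio::SeqIO: the script below parses a UniProt flat file with the 'swiss' parser and prints a one-line summary per entry. The file path comes from the command line and is a placeholder, not something taken from your setup, and the summarize_entries helper is made up for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: read every entry from a UniProt (SwissProt or
# TrEMBL) flat file and print its ID, accession, length and the number
# of features the parser produced from the FT lines.
sub summarize_entries {
    my ($path) = @_;

    # 'swiss' selects the SwissProt/UniProt parser; 'embl' would select
    # the EMBL nucleotide parser, which expects different line layouts.
    my $in = Bio::SeqIO->new(-file => $path, -format => 'swiss');

    my $count = 0;
    while (my $seq = $in->next_seq) {
        my @features = $seq->get_SeqFeatures;
        printf "%s\t%s\t%d aa\t%d features\n",
            $seq->display_id,
            $seq->accession_number,
            $seq->length,
            scalar @features;
        $count++;
    }
    return $count;
}

# Usage: perl summarize_uniprot.pl /path/to/uniprot_file.dat
my $path = shift @ARGV
    or die "Usage: $0 <UniProt flat file>\n";
my $n = summarize_entries($path);
print "Parsed $n entries\n";
```

If the entries and their feature counts look sensible here, the same format name should be the one to give bioperl-db when loading, since as far as I know the loader reads its input through Bio::SeqIO as well.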