[Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
biopython at maubp.freeserve.co.uk
Tue Jan 12 11:19:32 EST 2010
On Tue, Jan 12, 2010 at 4:02 PM, Chris Fields <cjfields at illinois.edu> wrote:
> On Jan 11, 2010, at 9:55 AM, Peter wrote:
>> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
>>> These entries form the CON data class, see:
>>> and they don't contain any sequence information.
>> I know - GenBank files have a similar system with CONTIG
>> lines instead of sequences. I was expecting BioPerl to be
>> able to convert these EMBL files with CO lines into GenBank
>> files with CONTIG lines.
> IIRC the contig information for GenBank is stored in annotation.
> We can try to ensure the data is carried over to EMBL properly.
For contig records (where there is no sequence) I think we just
need to map the GenBank CONTIG lines to the EMBL CO lines,
and vice versa. At least, that's what Biopython now does (trunk
code, not yet released).
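A minimal sketch of that CONTIG/CO mapping in plain Python. It assumes the contig value is a single join(...) location string, that GenBank CONTIG continuation lines are indented 12 columns, and that EMBL CO lines carry the 5-column "CO   " tag. The function names and the default 80-column width are hypothetical choices for illustration. This is not Biopython's (or BioPerl's) actual API.

```python
from typing import List


def wrap_location(location: str, prefix: str, indent: str, width: int = 80) -> List[str]:
    """Wrap a join(...) location string, breaking only after commas.

    The first output line starts with `prefix`, later lines with `indent`;
    both are assumed (hypothetically) to have the same length.
    """
    parts = location.split(",")
    # Re-attach the comma to every part except the last one.
    parts = [p + "," for p in parts[:-1]] + [parts[-1]]
    avail = width - len(prefix)
    chunks, current = [], ""
    for part in parts:
        # Start a new line if adding this part would overflow; a single part
        # longer than `avail` still gets a line of its own (it is never split).
        if current and len(current) + len(part) > avail:
            chunks.append(current)
            current = part
        else:
            current += part
    chunks.append(current)
    return [prefix + chunks[0]] + [indent + c for c in chunks[1:]]


def genbank_contig_to_location(contig_lines: List[str]) -> str:
    # Drop the 12-column "CONTIG      " tag / continuation indent and rejoin.
    return "".join(line[12:].strip() for line in contig_lines)


def embl_co_to_location(co_lines: List[str]) -> str:
    # Drop the 5-column "CO   " tag from every CO line and rejoin.
    return "".join(line[5:].strip() for line in co_lines if line.startswith("CO   "))


def contig_to_embl_co(contig_lines: List[str], width: int = 80) -> List[str]:
    # GenBank CONTIG lines -> EMBL CO lines.
    return wrap_location(genbank_contig_to_location(contig_lines), "CO   ", "CO   ", width)


def embl_co_to_contig(co_lines: List[str], width: int = 80) -> List[str]:
    # EMBL CO lines -> GenBank CONTIG lines.
    return wrap_location(embl_co_to_location(co_lines), "CONTIG".ljust(12), " " * 12, width)
```

For example, with a made-up location "join(AAAA01000001.1:1..1000,gap(100),AAAA01000002.1:1..2000)", converting to CO lines at width 40 gives two CO lines, and converting those back gives the original single CONTIG line. Breaking only at commas keeps each location part intact, so the round trip is lossless.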
>>> If you take the 'expanded' entries from
>>> your script will work.
>> That's a useful tip - thanks.
> NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence).
Indeed. This is a useful workaround for when a parser couldn't
cope with the contig version of a GenBank file for some reason, e.g.
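For reference, a minimal sketch of building the NCBI EFetch request that uses the 'gbwithparts' return type. It assumes the nuccore database and plain-text GenBank output. The helper name and the "ABC123" accession are hypothetical placeholders. It only builds the URL; fetching it (e.g. with urllib.request.urlopen) is left out.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, rettype: str = "gbwithparts") -> str:
    """Build an EFetch URL for a nucleotide record as GenBank text.

    With rettype="gb" a contig-style record may come back with a CONTIG
    line and no sequence; "gbwithparts" asks for the sequence as well.
    """
    params = {"db": "nuccore", "id": accession, "rettype": rettype, "retmode": "text"}
    return EFETCH + "?" + urlencode(params)
```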