[Bioperl-l] EMBL format field

Zhi-Qiang Ye yezhiqiang at gmail.com
Tue Jun 10 07:43:50 EDT 2008


That's weird. I ran into this problem too. I tried an EMBL-format file like this:

ID   CB271253; SV 1; linear; mRNA; EST; INV; 591 BP.
XX
AC   CB271253;
XX
DT   24-FEB-2003 (Rel. 74, Created)
DT   24-FEB-2003 (Rel. 74, Last updated, Version 1)
XX
DE   taa17c02.x2 Hydra EST -II Hydra magnipapillata cDNA 3' similar to
DE   SW:OPSD_RABIT P49912 RHODOPSIN. ;, mRNA sequence.

from: http://www.ebi.ac.uk/cgi-bin/dbfetch?db=embl&id=CB271253&style=raw

The $seq object's ->id and ->display_id are both "unknown id" ...
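
As a stopgap, the two values discussed in this thread can be read straight from the flat file without going through the parser. This is only a minimal sketch in plain Perl: embl_id_and_pa is a hypothetical helper (not a BioPerl method), and it assumes one record per string, with the ID as the first token of the ID line and the parent accession on a PA line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: scan the text of one EMBL
# record and return the primary ID (from the ID line) and, if present,
# the parent accession and its version (from the PA line).
sub embl_id_and_pa {
    my ($record) = @_;
    my %found;
    for my $line (split /\n/, $record) {
        if ($line =~ /^ID\s+([^;\s]+)/) {
            # First token after "ID", stopping at ';' or whitespace
            $found{id} //= $1;
        }
        elsif ($line =~ /^PA\s+(\S+?)(?:\.(\d+))?\s*$/) {
            $found{pa}         //= $1;   # accession without the version
            $found{pa_version} //= $2;   # stays undef if there is no ".N" suffix
        }
    }
    return \%found;
}

# Record text taken from the example quoted below in this thread
my $record = <<'EMBL';
ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
XX
PA   AB000170.1
XX
DE   Sus scrofa (pig) endopeptidase 24.16 type M1
EMBL

my $r = embl_id_and_pa($record);
printf "id=%s pa=%s version=%s\n",
    $r->{id}, $r->{pa}, $r->{pa_version} // 'n/a';
```

For the record above this prints id=BAA19060, pa=AB000170 and version=1; a record without a PA line simply leaves the pa key unset.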



ZQ Ye

2008/6/9 Hilmar Lapp <hlapp at gmx.net>:
> If this is the case with the latest version of BioPerl it should be filed as
> a bug report for the embl parser. The ID ought to be reported in
> $seq->get_secondary_accessions() (which returns an array). If it doesn't, it
> sounds like a bug to me.
>
>        -hilmar
>
> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote:
>>
>> Hi Wen,
>> A dump of that sequence object (Data::Dumper is your friend !) reveals
>> that the PA EMBL field is not saved into the object. However, you will
>> find the string 'AB000170.1' in the embedded CDS feature, more precisely
>> the seqid of the location object. I don't know whether that is always
>> the case, but it is in your particular example.
>> So, to get your hands on that value you have to do:
>>
>> # take the first CDS feature, if the record has one
>> my ($cds) = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
>> my $parent_id = $cds ? $cds->location->seq_id : undef;
>>
>> HTH,
>> Marc
>>
>> Marc Logghe
>> Senior Bioinformatician
>> Ablynx nv
>>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang
>>> Sent: Monday, June 09, 2008 5:28 AM
>>> To: bioperl-l at lists.open-bio.org
>>> Subject: [Bioperl-l] EMBL format field
>>>
>>> Hi all,
>>>
>>> I have an EMBL file from which I want to extract one line:
>>>
>>> ###file###
>>> ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
>>> XX
>>> PA   AB000170.1
>>> XX
>>> DE   Sus scrofa (pig) endopeptidase 24.16 type M1
>>> XX
>>> OS   Sus scrofa (pig)
>>> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
>>> OC   Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus.
>>> OX   NCBI_TaxID=9823;
>>> .........
>>>
>>> I want the accession number in the line that starts with PA, AB000170
>>> in this example.
>>>
>>> Can anybody kindly help, tell me which module and method I should use?
>>> I tried various things like $seq_obj->primary_id, display_id,
>>> get_secondary_id, etc., but none of them worked...
>>>
>>> Thanks a lot!
>>>
>>> Wen
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>

