[Bioperl-l] EMBL format field

Jason Stajich jason at bioperl.org
Tue Jun 10 20:36:20 EDT 2008


I agree if it isn't the accession # it shouldn't be stored there.  I  
guess it is a DBlink, but it is going to be hacky to round-trip this  
as you'll have to have a special case for records that are mRNAs...
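
Until the parser handles PA, one stopgap is to pull the line straight
out of the flat file. What follows is only a minimal sketch in plain
Perl, not anything BioPerl provides: parent_accessions() is a
hypothetical helper name, and it assumes the PA value has the
accession.version form seen in Wen's record.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): scan raw EMBL text for PA
# lines in the first record and return [accession, version] pairs.
sub parent_accessions {
    my ($fh) = @_;
    my @found;
    while ( my $line = <$fh> ) {
        last if $line =~ m{^//};               # "//" terminates a record
        next unless $line =~ /^PA\s+(\S+)/;    # e.g. "PA   AB000170.1"
        my ( $acc, $version ) = split /\./, $1, 2;
        push @found, [ $acc, $version ];
    }
    return @found;
}

# Demo on the record header quoted in this thread, via an in-memory file.
my $embl = <<'EMBL';
ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
XX
PA   AB000170.1
XX
DE   Sus scrofa (pig) endopeptidase 24.16 type M1
EMBL

open my $fh, '<', \$embl or die "cannot open in-memory file: $!";
my ($first) = parent_accessions($fh);
close $fh;
printf "%s (version %s)\n", $first->[0], $first->[1] // 'n/a';
```

For a real file, open it by name instead of the in-memory string. This
sidesteps the sequence object entirely, so it does nothing for the
round-trip problem.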

-jason
On Jun 10, 2008, at 5:19 PM, Chris Fields wrote:

> PA is an odd field; it isn't described in the EMBL user manual:
>
> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html
>
> but appears in mRNA files, so I'm guessing it stands for the
> (p)rotein (a)ccession.  I don't think this should be stored as
> primary/secondary accession, but maybe as a DBLink annotation?
>
> chris
>
> On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote:
>
>> PA is a field that we don't currently parse, something that should  
>> be filed as a bug on bugzilla.
>> Would you be able to do this?
>>
>> -jason
>> On Jun 9, 2008, at 7:07 AM, Wen Huang wrote:
>>
>>> Hilmar,
>>>
>>> I tried that, it did not work. Marc's way can work.
>>>
>>> Thanks,
>>> Wen
>>>
>>> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote:
>>>
>>>> If this is the case with the latest version of BioPerl it should  
>>>> be filed as a bug report for the embl parser. The ID ought to be  
>>>> reported in $seq->get_secondary_accessions() (which returns an  
>>>> array). If it doesn't, it sounds like a bug to me.
>>>>
>>>> 	-hilmar
>>>>
>>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote:
>>>>> Hi Wen,
>>>>> A dump of that sequence object (Data::Dumper is your friend!) reveals
>>>>> that the PA EMBL field is not saved into the object. However, you will
>>>>> find the string 'AB000170.1' in the embedded CDS feature, more precisely
>>>>> in the seq_id of its location object. I don't know whether that is
>>>>> always the case, but it is in your particular example.
>>>>> So, to get your hands on that value you have to do:
>>>>>
>>>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures;
>>>>> my $parent_id = $cds->location->seq_id;
>>>>>
>>>>> HTH,
>>>>> Marc
>>>>>
>>>>> Marc Logghe
>>>>> Senior Bioinformatician
>>>>> Ablynx nv
>>>>>> -----Original Message-----
>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang
>>>>>> Sent: Monday, June 09, 2008 5:28 AM
>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>> Subject: [Bioperl-l] EMBL format field
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I have an EMBL file from which I want to extract one line:
>>>>>>
>>>>>> ###file###
>>>>>> ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
>>>>>> XX
>>>>>> PA   AB000170.1
>>>>>> XX
>>>>>> DE   Sus scrofa (pig) endopeptidase 24.16 type M1
>>>>>> XX
>>>>>> OS   Sus scrofa (pig)
>>>>>> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
>>>>>> OC   Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus.
>>>>>> OX   NCBI_TaxID=9823;
>>>>>> .........
>>>>>>
>>>>>> I want the accession number in the line that starts with PA,
>>>>>> AB000170 in this example.
>>>>>>
>>>>>> Can anybody kindly help and tell me which module and method I should
>>>>>> use? I tried various things like $seq_obj->primary_id, display_id,
>>>>>> get_secondary_id, etc., but they did not work...
>>>>>>
>>>>>> Thanks a lot!
>>>>>>
>>>>>> Wen
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>
>>>> -- 
>>>> ===========================================================
>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>>> ===========================================================
>>>>
>>>>
>>>>
>>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Marie-Claude Hofmann
> College of Veterinary Medicine
> University of Illinois Urbana-Champaign
>
>
>
>


