[Bioperl-l] EMBL format field

Hilmar Lapp hlapp at gmx.net
Tue Jun 10 21:35:50 EDT 2008


On Jun 10, 2008, at 8:36 PM, Jason Stajich wrote:
> I agree: if it isn't the accession #, it shouldn't be stored there.
> I guess it is a DBLink, but it is going to be hacky to round-trip
> this, as you'll have to have a special case for records that are
> mRNAs...

I think I agree with that; I didn't realize it is the accession of the
(translated) protein. It would indeed be ideal to convert this into a
DBLink annotation, but that's an opinion and an interpretation of the
file (even if a very useful one). As such, I believe it should be left
to a SeqProcessor.

Hmm, except that by that point the information has already been lost,
so there's actually nothing that the SeqProcessor could massage.

So what if the line were simply stored as a Bio::Annotation::SimpleValue
with 'PA' as the key and the accession # as the value? That wouldn't be
an interpretation, and yet it would make the value available to a
SeqProcessor for conversion into a DBLink.
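Until the parser keeps the PA line in some form, one workaround is to read it from the raw file yourself. What follows is a minimal sketch in core Perl only (no BioPerl calls); pa_accessions is a hypothetical helper, not part of BioPerl. It assumes each record has at most one PA line laid out as in Wen's example (the code PA, whitespace, then the accession) and that every record ends with a // line. Because it returns one entry per record in file order, its result could be paired by index with the sequences Bio::SeqIO reads from the same file.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): scan raw EMBL text and return
# one PA value per record, in file order. A record without a PA line
# contributes undef, so the list stays aligned with the records.
sub pa_accessions {
    my ($fh) = @_;
    my (@pa, $current);
    while (my $line = <$fh>) {
        if ($line =~ /^PA\s+(\S+)/) {
            $current = $1;          # e.g. 'AB000170.1'
        }
        elsif ($line =~ m{^//}) {
            push @pa, $current;     # end of record: store its PA (or undef)
            undef $current;
        }
    }
    return @pa;
}

# A trimmed record based on the question, read from an in-memory string.
my $embl = <<'END';
ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
XX
PA   AB000170.1
XX
DE   Sus scrofa (pig) endopeptidase 24.16 type M1
//
END

open my $fh, '<', \$embl or die "cannot open in-memory string: $!";
my @pa = pa_accessions($fh);
close $fh;

# Drop the trailing sequence version to get the bare accession.
(my $acc = $pa[0]) =~ s/\.\d+\z//;

print "$pa[0]\n";    # prints AB000170.1
print "$acc\n";      # prints AB000170
```

The final substitution strips the version suffix, since the question asked for AB000170 rather than AB000170.1. Pairing by index only holds if the same unfiltered file is given to both the helper and Bio::SeqIO.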

	-hilmar

>
> -jason
> On Jun 10, 2008, at 5:19 PM, Chris Fields wrote:
>
>> PA is an odd field; it isn't described in the EMBL user manual:
>>
>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html
>>
>> but it appears in mRNA files, so I'm guessing it stands for the
>> (p)rotein (a)ccession. I don't think this should be stored as the
>> primary/secondary accession, but maybe as a DBLink annotation?
>>
>> chris
>>
>> On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote:
>>
>>> PA is a field that we don't currently parse; it should be
>>> filed as a bug on Bugzilla. Would you be able to do this?
>>>
>>> -jason
>>> On Jun 9, 2008, at 7:07 AM, Wen Huang wrote:
>>>
>>>> Hilmar,
>>>>
>>>> I tried that; it did not work. Marc's way does work.
>>>>
>>>> Thanks,
>>>> Wen
>>>>
>>>> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote:
>>>>
>>>>> If this is the case with the latest version of BioPerl, it
>>>>> should be filed as a bug report against the EMBL parser. The ID
>>>>> ought to be reported by $seq->get_secondary_accessions() (which
>>>>> returns an array). If it isn't, it sounds like a bug to me.
>>>>>
>>>>> 	-hilmar
>>>>>
>>>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote:
>>>>>> Hi Wen,
>>>>>> A dump of that sequence object (Data::Dumper is your friend!)
>>>>>> reveals that the PA EMBL field is not saved into the object.
>>>>>> However, you will find the string 'AB000170.1' in the embedded
>>>>>> CDS feature, more precisely as the seq_id of the location object.
>>>>>> I don't know whether that is always the case, but it is in your
>>>>>> particular example.
>>>>>> So, to get your hands on that value you have to do:
>>>>>>
>>>>>> # first CDS feature; here its location's seq_id holds the PA value
>>>>>> my ($cds) = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
>>>>>> my $parent_id = $cds->location->seq_id;
>>>>>>
>>>>>> HTH,
>>>>>> Marc
>>>>>>
>>>>>> Marc Logghe
>>>>>> Senior Bioinformatician
>>>>>> Ablynx nv
>>>>>>> -----Original Message-----
>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Wen Huang
>>>>>>> Sent: Monday, June 09, 2008 5:28 AM
>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>> Subject: [Bioperl-l] EMBL format field
>>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have an EMBL file from which I want to extract one of the lines:
>>>>>>>
>>>>>>> ###file###
>>>>>>> ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
>>>>>>> XX
>>>>>>> PA   AB000170.1
>>>>>>> XX
>>>>>>> DE   Sus scrofa (pig) endopeptidase 24.16 type M1
>>>>>>> XX
>>>>>>> OS   Sus scrofa (pig)
>>>>>>> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
>>>>>>> OC   Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus.
>>>>>>> OX   NCBI_TaxID=9823;
>>>>>>> .........
>>>>>>>
>>>>>>> I want the accession number in the line that starts with PA,
>>>>>>> AB000170 in this example.
>>>>>>>
>>>>>>> Can anybody kindly tell me which module and method I should use?
>>>>>>> I tried various things like $seq_obj->primary_id, display_id,
>>>>>>> get_secondary_id, etc., but they did not work...
>>>>>>>
>>>>>>> Thanks a lot!
>>>>>>>
>>>>>>> Wen
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>
>>>>> -- 
>>>>> ===========================================================
>>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>>>> ===========================================================
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Marie-Claude Hofmann
>> College of Veterinary Medicine
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>
>





