[Bioperl-l] EMBL format field

Hilmar Lapp hlapp at gmx.net
Tue Jun 10 22:09:13 EDT 2008


Bill,

this mailing list is about BioPerl. There are many programs and websites
out there that convert between IDs, but that wasn't the question.

We welcome your participation in helping to solve BioPerl-related
problems, and sometimes the easiest solution is to use other,
cross-platform open-source tools.

For peddling commercial products, no matter how useful they are and  
how little the cost, please use other forums.

	-hilmar

On Jun 10, 2008, at 9:43 PM, bill at genenformics.com wrote:
> This can be accomplished using IdConvert if protein accession/gi is  
> known:
>
> $> ./IdConvert.exe BAA19060
> #Input  Nuc_GI  Nuc_Acc Pro_GI  Pro_Acc Desc
> BAA19060        1783121 AB000170.1      1783123 BAA19061.1      endopeptidase 24.16 type M3 [Sus scrofa]
>
> Download IdConvert from http://www.genenformics.com/download.html  
> for free.
>
> Bill at genenformics.com
>
>
>>
>> On Jun 10, 2008, at 8:36 PM, Jason Stajich wrote:
>>> I agree if it isn't the accession # it shouldn't be stored there.
>>> I guess it is a DBlink, but it is going to be hacky to round-trip
>>> this as you'll have to have a special case for records that are
>>> mRNAs...
>>
>> I think I agree with that - didn't realize it is the accession of the
>> (translated) protein. It would be ideal to convert this into a DBLink
>> annotation indeed, but that's an opinion and an interpretation of the
>> file (even if a very useful one). As such I believe it should be the
>> matter of a SeqProcessor.
>>
>> Hmm - except that at that point the information has been lost already
>> so there's actually nothing that the SeqProcessor could massage.
>>
>> So what if the line were simply stored as a Bio::Annotation::SimpleValue
>> with 'PA' as key and the accession# as value? That wouldn't be an
>> interpretation, and yet would make the value available to a
>> SeqProcessor for converting into a DBLink.
>>
>> 	-hilmar
>>
>>>
>>> -jason
>>> On Jun 10, 2008, at 5:19 PM, Chris Fields wrote:
>>>
>>>> PA is an odd field; it isn't described in the EMBL user manual:
>>>>
>>>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html
>>>>
>>>> but appears in mRNA files, so I'm guessing it stands for
>>>> (p)rotein (a)ccession.  I don't think this should be stored as
>>>> primary/secondary accession, but maybe as a DBLink annotation?
>>>>
>>>> chris
>>>>
>>>> On Jun 10, 2008, at 6:57 PM, Jason Stajich wrote:
>>>>
>>>>> PA is a field that we don't currently parse, something that
>>>>> should be filed as a bug on bugzilla.
>>>>> Would you be able to do this?
>>>>>
>>>>> -jason
>>>>> On Jun 9, 2008, at 7:07 AM, Wen Huang wrote:
>>>>>
>>>>>> Hilmar,
>>>>>>
>>>>>> I tried that, it did not work. Marc's way can work.
>>>>>>
>>>>>> Thanks,
>>>>>> Wen
>>>>>>
>>>>>> On Jun 9, 2008, at 7:30 AM, Hilmar Lapp wrote:
>>>>>>
>>>>>>> If this is the case with the latest version of BioPerl it
>>>>>>> should be filed as a bug report for the embl parser. The ID
>>>>>>> ought to be reported in $seq->get_secondary_accessions() (which
>>>>>>> returns an array). If it doesn't, it sounds like a bug to me.
>>>>>>>
>>>>>>> 	-hilmar
>>>>>>>
>>>>>>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote:
>>>>>>>> Hi Wen,
>>>>>>>> A dump of that sequence object (Data::Dumper is your friend!) reveals
>>>>>>>> that the PA EMBL field is not saved into the object. However, you will
>>>>>>>> find the string 'AB000170.1' in the embedded CDS feature, more precisely
>>>>>>>> in the seq_id of the location object. I don't know whether that is always
>>>>>>>> the case, but it is in your particular example.
>>>>>>>> So, to get your hands on that value you have to do:
>>>>>>>>
>>>>>>>> my ($cds) = grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures;
>>>>>>>> my $parent_id = $cds->location->seq_id;
>>>>>>>>
>>>>>>>> HTH,
>>>>>>>> Marc
>>>>>>>>
>>>>>>>> Marc Logghe
>>>>>>>> Senior Bioinformatician
>>>>>>>> Ablynx nv
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Wen Huang
>>>>>>>>> Sent: Monday, June 09, 2008 5:28 AM
>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>> Subject: [Bioperl-l] EMBL format field
>>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I have an EMBL file from which I want to extract one of the lines:
>>>>>>>>>
>>>>>>>>> ###file###
>>>>>>>>> ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
>>>>>>>>> XX
>>>>>>>>> PA   AB000170.1
>>>>>>>>> XX
>>>>>>>>> DE   Sus scrofa (pig) endopeptidase 24.16 type M1
>>>>>>>>> XX
>>>>>>>>> OS   Sus scrofa (pig)
>>>>>>>>> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
>>>>>>>>> OC   Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus.
>>>>>>>>> OX   NCBI_TaxID=9823;
>>>>>>>>> .........
>>>>>>>>>
>>>>>>>>> I want the accession number in the line that starts with PA,
>>>>>>>>> AB000170 in this example.
>>>>>>>>>
>>>>>>>>> Can anybody kindly help and tell me which module and method I
>>>>>>>>> should use? I tried various things like $seq_obj->primary_id,
>>>>>>>>> display_id, get_secondary_id, etc., but they did not work...
>>>>>>>>>
>>>>>>>>> Thanks a lot!
>>>>>>>>>
>>>>>>>>> Wen
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioperl-l mailing list
>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> ===========================================================
>>>>>>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>>>>>>> ===========================================================
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Marie-Claude Hofmann
>>>> College of Veterinary Medicine
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>
>>>
>>
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================
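
For anyone landing on this thread with the same problem: until the EMBL
parser keeps the PA line, the value can also be read from the raw record
text. The following is only a minimal sketch under stated assumptions. It
uses plain Perl and no BioPerl calls, pa_accessions() is a hypothetical
helper name (not a BioPerl method), and it assumes the PA line has the same
layout as in Wen's example record quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the accession(s)
# found on the PA line(s) of one EMBL flat-file record given as text.
sub pa_accessions {
    my ($record_text) = @_;
    my @accessions;
    for my $line (split /\n/, $record_text) {
        # Assumed layout: two-letter line code, three spaces, then the value.
        next unless $line =~ /^PA\s{3}(\S+)/;
        my $acc = $1;
        $acc =~ s/;$//;    # tolerate a trailing semicolon, if present
        push @accessions, $acc;
    }
    return @accessions;
}

# The start of the record from the original question.
my $record = <<'EMBL';
ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
XX
PA   AB000170.1
XX
DE   Sus scrofa (pig) endopeptidase 24.16 type M1
XX
EMBL

for my $acc (pa_accessions($record)) {
    # Strip the sequence version to get the bare accession number.
    (my $unversioned = $acc) =~ s/\.\d+$//;
    print "$acc\t$unversioned\n";    # prints "AB000170.1<TAB>AB000170"
}
```

This bypasses the sequence object entirely, so it is a stopgap rather than
a fix; storing the value as an annotation, as proposed in the thread, would
still be the cleaner solution.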




