[Bioperl-l] EMBL format field

Zhi-Qiang Ye yezhiqiang at gmail.com
Sat Jun 14 09:39:45 EDT 2008


   Thank you all.  I finally got the newest version of BioPerl
installed, and that solved the problem.

   I noticed that the Ensembl API still uses bioperl-1.2.3, which
misled me into thinking that bioperl-1.4 was very up to date ...
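
   For anyone who finds this thread in the archive: below is a minimal
sketch for checking which BioPerl version Perl actually loads.  It
assumes the installation ships the Bio::Root::Version module; if that
module cannot be loaded, the script only reports that it was not found.
The helper name bioperl_version is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the version string of the BioPerl found in
# @INC, or undef if Bio::Root::Version cannot be loaded.
sub bioperl_version {
    my $version = eval {
        require Bio::Root::Version;
        no warnings 'once';
        $Bio::Root::Version::VERSION;
    };
    return $version;
}

my $version = bioperl_version();
print defined $version
    ? "BioPerl version: $version\n"
    : "BioPerl (Bio::Root::Version) not found in \@INC\n";
```

   Running this with the same perl binary that runs the parsing script
shows whether the packaged 1.4 or a newer install is being picked up.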


Regards,
Zhi-Qiang


2008/6/12 Kevin Brown <Kevin.M.Brown at asu.edu>:
> See the following links for where to get a more current version.  1.4 is
> years old and lots of parts are non-functional due to website and file
> format changes.
>
> http://www.bioperl.org/wiki/Installing_BioPerl
>
> http://www.bioperl.org/wiki/Installing_BioPerl_on_Ubuntu_Server
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Zhi-Qiang Ye
>> Sent: Thursday, June 12, 2008 2:07 AM
>> To: Jason Stajich
>> Cc: bioperl list
>> Subject: Re: [Bioperl-l] EMBL format field
>>
>> Hi, Jason
>>
>>      I used exactly your code, and the result is still 'unknown id'.
>> Where can I get the version of bioperl?
>> I am using Ubuntu Gutsy; the version in Ubuntu's package
>> management system is 1.4-1.
>>
>>      I also installed BioPerl 1.4 on another computer, an IA64 machine
>> running Red Hat Linux.  It has the same problem.
>> During installation via CPAN, 'make test' always failed, so
>> I used 'force install ....'.
>> I am not sure whether that is the reason.
>>
>> Thanks.
>> Zhi-Qiang Ye
>>
>> 2008/6/11 Jason Stajich <jason at bioperl.org>:
>> > What version of bioperl? It works for me: using this code I get
>> > 'CB271253' printed out.
>> >
>> > #!/usr/bin/perl -w
>> > use strict;
>> > use Bio::SeqIO;
>> > my $in = Bio::SeqIO->new(-format => 'embl', -file => shift);
>> > while( my $seq = $in->next_seq ) {
>> >  print $seq->id,"\n";
>> > }
>> >
>> > On Jun 10, 2008, at 4:43 AM, Zhi-Qiang Ye wrote:
>> >
>> >> That's weird. I also ran into this problem. I tried an EMBL-format
>> >> file like this:
>> >>
>> >> ID   CB271253; SV 1; linear; mRNA; EST; INV; 591 BP.
>> >> XX
>> >> AC   CB271253;
>> >> XX
>> >> DT   24-FEB-2003 (Rel. 74, Created)
>> >> DT   24-FEB-2003 (Rel. 74, Last updated, Version 1)
>> >> XX
>> >> DE   taa17c02.x2 Hydra EST -II Hydra magnipapillata cDNA 3' similar to
>> >> DE   SW:OPSD_RABIT P49912 RHODOPSIN. ;, mRNA sequence.
>> >>
>> >> from:
>> >> http://www.ebi.ac.uk/cgi-bin/dbfetch?db=embl&id=CB271253&style=raw
>> >>
>> >> The $seq object's ->id and ->display_id are both "unknown id" ...
>> >>
>> >>
>> >>
>> >> ZQ Ye
>> >>
>> >> 2008/6/9 Hilmar Lapp <hlapp at gmx.net>:
>> >>>
>> >>> If this is the case with the latest version of BioPerl it should be filed
>> >>> as a bug report for the embl parser. The ID ought to be reported in
>> >>> $seq->get_secondary_accessions() (which returns an array). If it doesn't,
>> >>> it sounds like a bug to me.
>> >>>
>> >>>       -hilmar
>> >>>
>> >>> On Jun 9, 2008, at 4:47 AM, Marc Logghe wrote:
>> >>>>
>> >>>> Hi Wen,
>> >>>> A dump of that sequence object (Data::Dumper is your friend !) reveals
>> >>>> that the PA EMBL field is not saved into the object. However, you will
>> >>>> find the string 'AB000170.1' in the embedded CDS feature, more precisely
>> >>>> the seqid of the location object. I don't know whether that is always
>> >>>> the case, but it is in your particular example.
>> >>>> So, to get your hands on that value you have to do:
>> >>>>
>> >>>> my ($cds) = grep {$_->primary_tag eq 'CDS'} $seq->get_SeqFeatures;
>> >>>> my $parent_id = $cds->location->seq_id;
>> >>>>
>> >>>> HTH,
>> >>>> Marc
>> >>>>
>> >>>> Marc Logghe
>> >>>> Senior Bioinformatician
>> >>>> Ablynx nv
>> >>>>>
>> >>>>> -----Original Message-----
>> >>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> >>>>> bounces at lists.open-bio.org] On Behalf Of Wen Huang
>> >>>>> Sent: Monday, June 09, 2008 5:28 AM
>> >>>>> To: bioperl-l at lists.open-bio.org
>> >>>>> Subject: [Bioperl-l] EMBL format field
>> >>>>>
>> >>>>> Hi all,
>> >>>>>
>> >>>>> I have an EMBL file from which I want to extract one of the lines
>> >>>>>
>> >>>>> ###file###
>> >>>>> ID   BAA19060; SV 1; linear; mRNA; STD; MAM; 2115 BP.
>> >>>>> XX
>> >>>>> PA   AB000170.1
>> >>>>> XX
>> >>>>> DE   Sus scrofa (pig) endopeptidase 24.16 type M1
>> >>>>> XX
>> >>>>> OS   Sus scrofa (pig)
>> >>>>> OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
>> >>>>> OC   Eutheria; Laurasiatheria; Cetartiodactyla; Suina; Suidae; Sus.
>> >>>>> OX   NCBI_TaxID=9823;
>> >>>>> .........
>> >>>>>
>> >>>>> I want the accession number in the line that starts with PA,
>> >>>>> AB000170 in this example.
>> >>>>>
>> >>>>> Can anybody kindly help and tell me which module and method I
>> >>>>> should use? I tried various things like $seq_obj->primary_id,
>> >>>>> display_id, get_secondary_id, etc., but they did not work...
>> >>>>>
>> >>>>> Thanks a lot!
>> >>>>>
>> >>>>> Wen

