[Bioperl-l] GenBank entries creation dates

Chris Fields cjfields at uiuc.edu
Mon Apr 7 13:48:45 EDT 2008


Note that in the example I gave, the revision history shows the
DBSOURCE changing at the creation date: the original nucleotide
record was an M. tuberculosis contig sequence, which was replaced by
an updated full M. tuberculosis genome record at the time of the
'create date'.

I couldn't find anything specific in the GenBank docs on this, but it
appears that (at least for a protein record) the creation date
reflects the date on which the sequence was either originally
deposited or first derived from the nucleotide source record currently
listed in the record.  In other words, it may not reflect the original
date of deposition (which could have come from a different record, as
in this case).
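
For anyone who wants to pull the two dates into a script, here is a
minimal sketch. It assumes you have captured tag/value text like the
esummary dump quoted below. parse_docsum_dump() is a hypothetical helper
written for this example, not a BioPerl method, and the sample data is
trimmed from that dump.

```perl
use strict;
use warnings;

# Sample text trimmed from the esummary dump quoted later in this thread.
my $dump = <<'END';
UID: 1621261
Caption             :CAB02640
Gi                  :1621261
CreateDate          :2003/11/21
UpdateDate          :2006/11/14
TaxId               :83332
Length              :193
Status              :live
ReplacedBy          :
END

# Hypothetical helper (not part of BioPerl): split each line on its
# first colon, trim whitespace, and return a hash ref of tag => value.
# Lines without a colon (e.g. wrapped continuation lines) are skipped.
sub parse_docsum_dump {
    my ($text) = @_;
    my %field;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^\s*([^:]+?)\s*:\s*(.*?)\s*$/;
        $field{$1} = $2;
    }
    return \%field;
}

my $rec = parse_docsum_dump($dump);
print "created $rec->{CreateDate}, updated $rec->{UpdateDate}\n";
```

This only splits printed "Tag :value" lines, so it says nothing about
which date NCBI treats as the true deposition date; it just makes
CreateDate and UpdateDate easy to compare in a script.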

chris

On Apr 7, 2008, at 11:24 AM, Miguel Pignatelli wrote:

>
> I've noticed that the ASN.1 version of those records has a
> "creation-date" tag.
> But this is somewhat strange: the creation date you obtained and the
> one in the ASN.1 format are both 2003/11/21, whereas the revision
> history of the record:
>
> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=CAB02640
>
> reports a creation date of "Oct 19 1996 12:28 AM"
>
> I don't know how to get this, because the EMBL version of this gene:
>
> http://www.ebi.ac.uk/cgi-bin/dbfetch?db=emblcds&id=CAB02640&style=raw
>
> doesn't have DT fields at all.
>
> M;
>
>
> Chris Fields wrote:
>> Strangely enough, if you use NCBI's esummary you can get both  
>> dates.  Via Bio::DB::EUtilities in bioperl-live, if you dump out  
>> DocSum data (using a debugging method I added in a while back):
>> ---------------------------------------
>> use strict;
>> use warnings;
>> use Bio::DB::EUtilities;
>> # For multiple IDs pass an array ref. Use GIs only, not accessions.
>> my $factory = Bio::DB::EUtilities->new(
>>                        -eutil => 'esummary',
>>                        -db    => 'protein',
>>                        -id    => 1621261);
>> # Dump every DocSum tag/value pair (the debugging method mentioned above)
>> $factory->print_DocSums;
>> ---------------------------------------
>> One gets the following tag/value pairs:
>> UID: 1621261
>> Caption             :CAB02640
>> Title               :PROBABLE PYRIMIDINE OPERON REGULATORY PROTEIN PYRR [Mycobacterium tuberculosis H37Rv]
>> Extra               :gi|1621261|emb|CAB02640.1|[1621261]
>> Gi                  :1621261
>> CreateDate          :2003/11/21
>> UpdateDate          :2006/11/14
>> Flags               :
>> TaxId               :83332
>> Length              :193
>> Status              :live
>> ReplacedBy          :
>> Comment             :
>> I'll add in a method to grab the data element by tag (in this case,  
>> grab the creation date by asking for the 'CreateDate' key).  Might  
>> come in handy for scripts.
>> chris
>> On Apr 7, 2008, at 7:48 AM, Heikki Lehvaslaiho wrote:
>>> Miguel,
>>>
>>> You probably know this but:
>>>
>>> - Your entry example below is a GenPept entry, not a GenBank entry
>>> - The NCBI sequence format "genbank" has only the last modified  
>>> date.
>>>  I do not know about other formats (ASN.1, ...)
>>> - NCBI Entrez is a great tool but it obscures the source database.
>>> - If you really are working on real GenBank entries, you can use
>>> the accession number to find the corresponding EMBL (and Swiss-Prot)
>>> flat files, which have both creation and last modified dates.
>>>
>>> Post to the list if you have trouble getting the dates from EMBL/ 
>>> Swiss-Prot
>>> formats using bioperl.
>>>
>>> Yours,
>>>
>>>    -Heikki
>>>
>>> On Monday 07 April 2008 12:12:58 Miguel Pignatelli wrote:
>>>> Hi all,
>>>>
>>>> Is there any way to obtain the date of creation of individual  
>>>> GenBank
>>>> entries? I don't mean the "last revision" date that can be found  
>>>> in the
>>>> first line of a GenBank file.
>>>>
>>>> I can access this creation date by looking at the "revision history" of
>>>> any GenBank entry (for example, see
>>>> http://www.ncbi.nlm.nih.gov/entrez/sutils/girevhist.cgi?val=74311105),
>>>> but I need a systematic (and local=fast) way to access this information.
>>>>
>>>> Any help would be much appreciated,
>>>> Thank you very much in advance,
>>>>
>>>> M;
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>
>>> -- 
>>> ______ _/      _/ _____________________________________________________
>>>     _/      _/
>>>    _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>>>   _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
>>>  _/  _/  _/  SANBI, South African National Bioinformatics Institute
>>> _/  _/  _/  University of Western Cape, South Africa
>>>    _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
>>> ___ _/_/_/_/_/ ________________________________________________________

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




