[Bioperl-l] GenBank accession bug?

Chris Fields cjfields at uiuc.edu
Thu Feb 22 16:01:03 EST 2007

On Feb 22, 2007, at 2:31 PM, dmessina at watson.wustl.edu wrote:

>> The issue at hand is whether we can support GenBank accessions/
>> display_id/version with your naming scheme.
> Chris, I'm a little unsure of what you're saying here (which might mean
> that you're already saying what I'm about to...say). Do you mean it might
> be tricky to support both the Genbank standard and Dmitry's
> simultaneously?
>
> I would argue any arbitrary ID should be supported as long as that ID is a
> contiguous non-space word (\S+).
>
> Actually the existing accession regex looks like it already supports IDs
> with '-':
>
> /^ACCESSION\s+(\S.*\S)/
>
> It's only the version regex which doesn't (\w doesn't include '-'):
>
> /^\w+\.(\d+)/
>
> Anyone else have thoughts or comments on this? Off the top of my head, I
> can't think of any issues that might arise from doing so (apart from
> having to modify all of the SeqIO modules to support it).
>
> Dave

You're right; the argument comes down simply to whether we would  
support \S+ or just \w+.  I'm neutral on this myself, but I wonder  
how allowing \S+ would affect other modules (for instance, indexing  
for a flat db), where one might just use \w+ for accessions,  
expecting them to be GenBank- or EMBL-like alphanumerics.  The fact  
that \S+ was supported in the past (as indicated in the bug report)
and then dropped after 1.2 makes me think someone had a reason for
going in and modifying it, but that was before my time on the

I'll have a look at the CVS history when I have time to see what I  
can dig up.
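
To make the \w+ versus \S+ question concrete, here is a minimal sketch of
how the two character classes treat an ID containing '-'. The version
pattern is the one quoted above; the \S+ variant and the two key-extraction
subs are hypothetical illustrations, not code taken from any SeqIO or
indexing module, and the sample ID 'my-id' is made up.

```perl
use strict;
use warnings;

# Version pattern quoted in the thread: \w does not match '-'.
sub version_strict {
    my ($token) = @_;
    return $token =~ /^\w+\.(\d+)/ ? $1 : undef;
}

# Hypothetical relaxed variant: any non-space run before the dot.
sub version_relaxed {
    my ($token) = @_;
    return $token =~ /^\S+\.(\d+)/ ? $1 : undef;
}

# Hypothetical index-key extraction, as a flat-file indexer that
# expects alphanumeric accessions might do it.
sub key_word {
    my ($acc) = @_;
    return $acc =~ /^(\w+)/ ? $1 : undef;
}

# The same extraction using \S+ instead.
sub key_nonspace {
    my ($acc) = @_;
    return $acc =~ /^(\S+)/ ? $1 : undef;
}

# Compare a GenBank-like token with a hyphenated one.
for my $token ('AB000001.1', 'my-id.2') {
    my $strict  = version_strict($token);
    my $relaxed = version_relaxed($token);
    printf "%-12s strict=%s relaxed=%s\n", $token,
        defined $strict  ? $strict  : 'no match',
        defined $relaxed ? $relaxed : 'no match';
}

# Compare the index keys each pattern would produce.
for my $acc ('AB000001', 'my-id') {
    printf "%-12s \\w+ key=%s  \\S+ key=%s\n", $acc,
        key_word($acc), key_nonspace($acc);
}
```

With the quoted pattern, 'my-id.2' yields no version at all, and a
\w+-based key for 'my-id' is silently cut to 'my'. That second case is
the kind of side effect I would want to check for in the indexing code
before relaxing the parser.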

