[Bioperl-l] SeqIO alters Genbank files

Chris Fields cjfields at illinois.edu
Thu Aug 25 13:16:03 EDT 2011


Brian,

Yes, that's correct (comment out or remove the other stuff).  I'm not sure what difference it will make, but I'm interested to see whether anything fundamental expects this behavior and breaks the tests.  Using 'git blame', it appears Allen Day added this in relation to Feature-Annotation code we actually reverted a few years ago, so this should be removed anyway.

I still think we should work around FTHelper altogether.  Reading the code, it seems like a ton of wasted instances being generated for no apparent reason.  Now going back to our bioperl archives to see if there is any need for it...
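
For anyone following along, here is a rough sketch of the round-trip problem Brian describes below. It is not the FTHelper code; it is written in Python only to keep it self-contained, and the function names (write_tags_naive, write_tags_passthrough, round_trip) are hypothetical. It assumes feature tags can be modelled as a mapping from tag name to a list of values.

```python
def write_tags_naive(tags):
    """Copy every tag, then also append a note for each 'score' value.

    This imitates the behaviour described in the thread; it is an
    illustration, not a port of FTHelper.
    """
    out = {name: list(values) for name, values in tags.items()}
    for value in tags.get("score", []):
        out.setdefault("note", []).append(f"score={value}")
    return out


def write_tags_passthrough(tags):
    """Copy every tag unchanged, leaving validation to a separate step."""
    return {name: list(values) for name, values in tags.items()}


def round_trip(tags, writer, times):
    """Feed the writer's output back in as input, `times` times over."""
    for _ in range(times):
        tags = writer(tags)
    return tags


if __name__ == "__main__":
    original = {"score": ["100.1"]}
    # The naive writer adds one more note on every pass.
    print(round_trip(original, write_tags_naive, 4))
    # The pass-through writer leaves the tags as they were.
    print(round_trip(original, write_tags_passthrough, 4))
```

The point of the sketch is only that a writer which adds content on output is not idempotent, so repeated read/write cycles keep growing the record, while a writer that passes tags through unchanged does not.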

chris

On Aug 25, 2011, at 11:53 AM, Brian Osborne wrote:

> Chris,
> 
> OK, will do. I should add that an early version of FTHelper was doing this same edit with the "strand", "source_tag", and "frame" tags but someone has commented out the "source_tag" and "strand" lines.
> 
> Should I comment out both "score" and "frame" code?
> 
> BIO
> 
> On Aug 25, 2011, at 12:42 PM, Chris Fields wrote:
> 
>> Brian,
>> 
>> I think you should comment out the code; our baked-in validation is only half-correct anyway, and I think it's probably a good idea to veer towards separating format validation from parsing (they're two related but different concerns).
>> 
>> To tell the truth, I think we should eschew using FTHelper altogether and just use a Bio::SeqFeatureI-based class directly.  I haven't quite grasped the reasoning behind FTHelper.pm, and I would bet removing it as a middleman across the board would help parsing speed.  Anyone have an objection to that, or at least an explanation for generation of tons of FTHelper instances that couldn't be handled by a Factory?
>> 
>> chris
>> 
>> On Aug 25, 2011, at 9:35 AM, Brian Osborne wrote:
>> 
>>> bioperl-l,
>>> 
>>> I need to run something by you before I commit code and tests. I have code that takes a Genbank file as input and creates another Genbank file as output. I noticed that SeqIO - specifically FTHelper.pm - was taking a tag like this in the input file:
>>> 
>>> /score=100.1
>>> 
>>> And adding a "note" tag, so the output file contains this:
>>> 
>>> /score=100.1
>>> /note="score=100.1"
>>> 
>>> I'm assuming that the code does this because NCBI will not accept score tags and values even though Bioperl, generally speaking, does not say that NCBI defines the fine details of Genbank format. 
>>> 
>>> On the other hand I don't like the idea that SeqIO is altering the content. It also turns out that if you have code that does multiple round-trips you end up with text like this:
>>> 
>>> /score=100.1
>>> /note="score=100.1"
>>> /note="score=100.1"
>>> /note="score=100.1"
>>> /note="score=100.1"
>>> 
>>> Should I comment out the code that's doing these edits or not?
>>> 
>>> Thanks again,
>>> 
>>> Brian O.
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
> 



