[Bioperl-l] Windows bug in Bio::DB::Fasta?
cjfields at uiuc.edu
Tue Aug 23 16:03:56 EDT 2005
That did the trick! Everything looks fine now. Thanks Lincoln!
At 05:18 PM 8/22/2005, Lincoln Stein wrote:
>I've just looked into this. The bug occurs when Windows opens the FASTA file
>in text mode rather than binary mode. In text mode the "\r\n" sequence
>is invisibly mapped to "\n" during readline operations, so Bio::DB::Fasta
>thinks that it is dealing with a Unix-format file. Then, when the module tries
>to seek() to the proper byte offset, Windows doesn't do the line-end mapping,
>so it seeks to the wrong position. (sound of hairs being pulled)
>I've fixed the problem by explicitly calling binmode() on all filehandles
>that Bio::DB::Fasta opens. The new version of Fasta.pm is in both bioperl CVS and
>the gbrowse 1.63 CVS version. It ought to fix Chris' GC content weirdness.
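
For readers following along, here is a minimal sketch of what the binmode() fix buys you. It is not the actual Fasta.pm code: the test file, the read_at helper and the byte offsets are all made up for illustration. With binmode() applied, the bytes that seek() counts are the same bytes that are on disk, so an offset computed with 2-byte line endings lands on the right base, while an offset computed as if the endings were 1 byte lands inside the "\r\n".

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical test data: a CRLF-terminated FASTA file, 4 bases per line.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);                       # write the bytes exactly as given
print {$out} ">seq1\r\nACGT\r\nTTGA\r\n";
close($out) or die "close failed: $!";

# Hypothetical helper: read $length bytes starting at byte $offset, with
# binmode() applied first so seek and read count raw bytes on any platform.
sub read_at {
    my ($file, $offset, $length) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!";
    binmode($fh);
    seek($fh, $offset, 0) or die "seek failed: $!";
    read($fh, my $buf, $length);
    close($fh);
    return $buf;
}

# On disk the header takes 7 bytes (">seq1\r\n") and each sequence line
# takes 6 bytes (4 bases + "\r\n"), so the second line starts at byte 13.
my $right = read_at($path, 13, 4);

# An offset built from text-mode lengths (6 + 5 = 11) lands on the "\r".
my $wrong = read_at($path, 11, 4);

print "right offset gives: $right\n";
```

Here $right holds the four bases of the second line, while $wrong starts with the carriage return and line feed that text-mode reads never showed.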
>On Monday 15 August 2005 01:22 pm, Scott Cain wrote:
> > Just to follow up on my own email with a little more information: in
> > Fasta.pm, line 697:
> > $termination_length ||= /\r\n$/ ? 2 : 1; # account for crlf-terminated Windows files
> > The pattern match is failing on DOS formatted files; I don't know why.
> > Does anyone else?
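
A small sketch that may help explain why that pattern match fails. It assumes the file is being read through a CRLF-translating layer, which is what text mode on Windows does by default; here the ":crlf" PerlIO layer is requested explicitly so the effect can be reproduced on any platform. The termination_length sub below is a hypothetical stand-in that only reuses the regex from the line quoted above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical test data: one CRLF-terminated header and sequence line.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);                       # keep the "\r\n" bytes intact on disk
print {$out} ">seq1\r\nACGT\r\n";
close($out) or die "close failed: $!";

# Apply the same test as the quoted Fasta.pm line to the first line of the
# file, read through the given PerlIO layer.
sub termination_length {
    my ($layer) = @_;
    open(my $fh, "<$layer", $path) or die "Cannot open $path: $!";
    my $line = <$fh>;
    close($fh);
    return $line =~ /\r\n$/ ? 2 : 1;
}

my $text_mode = termination_length(':crlf');  # "\r\n" already mapped to "\n"
my $binary    = termination_length(':raw');   # bytes as they are on disk

print "text mode: $text_mode, binary: $binary\n";
```

Through the translating layer the "\r" never reaches the regex, so the match fails and the length comes out as 1 even though the file on disk uses 2-byte line endings.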
> > On Mon, 2005-08-15 at 10:35 -0400, Scott Cain wrote:
> > > Hello all,
> > >
> > > I am investigating a bug in GBrowse that seems to only surface when
> > > people are using the memory (ie, file) adaptor on Windows systems.
> > > Here's the bug report:
> > >
> > > https://sourceforge.net/tracker/?func=detail&atid=391291&aid=1256169&group_id=27707
> > >
> > > I've tracked the problem down to Bio::DB::Fasta. When the file is DOS
> > > formatted (that is, each line ends in a carriage return plus a line
> > > feed), BDF returns the wrong string when a subsequence is requested, but
> > > when the file is Unix formatted (that is, each line ends in a line feed
> > > only), it returns the right string. I wrote the very simple test script
> > > below and stepped it through the perl debugger. It looks like the bug is
> > > in the caloffset method, as it returns the same offsets regardless of the
> > > file type, which then makes the subsequent seek into the file go to the
> > > wrong coordinates in DOS-formatted files.
> > >
> > > Unfortunately, I don't really know what is going on in caloffset, so I
> > > don't know how to fix it, but it presumably has to check the format of
> > > the file somewhere and take that into account.
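
To make the offset arithmetic concrete, here is a rough sketch of the kind of calculation involved. It is an assumption about the general approach, not the real caloffset: the calc_offset name, its parameters and the sample numbers are all hypothetical. The point is only that the number of bytes terminating each line has to enter the formula, otherwise Unix and DOS files get the same offsets.

```perl
use strict;
use warnings;

# Hypothetical helper, not the real caloffset: map a 0-based base position to
# a byte offset, given where the sequence data starts, how many bases each
# line holds, and how many bytes terminate each line (1 for LF, 2 for CRLF).
sub calc_offset {
    my ($data_start, $bases_per_line, $termination_length, $pos) = @_;
    my $bytes_per_line = $bases_per_line + $termination_length;
    return $data_start
         + $bytes_per_line * int($pos / $bases_per_line)  # whole lines skipped
         + $pos % $bases_per_line;                        # bases into the line
}

# Same sequence, 4 bases per line, header ">seq1"; ask for the fifth base
# (position 4), which is the first base on the second line.
my $unix_offset = calc_offset(6, 4, 1, 4);   # header is 6 bytes with "\n"
my $dos_offset  = calc_offset(7, 4, 2, 4);   # header is 7 bytes with "\r\n"

print "unix: $unix_offset, dos: $dos_offset\n";
```

With these sample numbers the Unix file gives offset 11 and the DOS file gives offset 13. Using the Unix numbers on the DOS file would seek two bytes short of the intended base.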
> > >
> > > Thanks,
> > > Scott
>Lincoln D. Stein
>Cold Spring Harbor Laboratory
>1 Bungtown Road
>Cold Spring Harbor, NY 11724
>FOR URGENT MESSAGES & SCHEDULING,
>PLEASE CONTACT MY ASSISTANT,
>SANDRA MICHELSEN, AT michelse at cshl.edu
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign