[Bioperl-l] PHD format parsing causes unbounded memory usage

Jorge.DUARTE at biogemma.com
Wed Jul 23 05:08:44 EDT 2008


Hello,

I've been trying to use Bio::SeqIO to parse a file in a PHD-like format.

The script produces no error (and no output), but its memory usage 
keeps increasing until it reaches the limit (well, I stopped the process at 
16 GB of memory).

Is this a known problem? Or is my file format wrong?

If I use only the first sequence from my file (below), the script works 
fine, so maybe there is something wrong in the middle of the file. How can 
I print debugging info?

Thanks for any help

jorge

Example data:

BEGIN_SEQUENCE FAINC1H01EI6V4.9-204

BEGIN_COMMENT

CHROMAT_FILE: sff:FAINC1H01.sff:FAINC1H01EI6V4
ABI_THUMBPRINT: none
PHRED_VERSION: not called by phred
CALL_METHOD: 454
QUALITY_LEVELS: 99
TIME: Thu Jul 27 12:33:48 2000
TRACE_ARRAY_MIN_INDEX: 0
TRACE_ARRAY_MAX_INDEX: 4628
CHEM: unknown
DYE: unknown

END_COMMENT

BEGIN_DNA
a 40 718
c 40 737
g 40 756
c 40 775
g 40 794
g 40 813
g 40 832
g 40 851
a 40 870
a 40 889
g 40 908
t 39 927
c 37 946
t 37 965
g 37 984
a 37 1003
a 37 1022
g 37 1041
a 38 1060
a 38 1079
a 38 1098
c 37 1117
a 37 1136
a 37 1155
t 37 1174
c 37 1193
a 37 1212
a 37 1231
c 37 1250
t 37 1269
a 37 1288
t 38 1307
t 38 1326
t 38 1345
a 37 1364
c 37 1383
c 37 1402
t 37 1421
a 37 1440
t 37 1459
g 37 1478
c 37 1497
a 37 1516
t 37 1535
a 37 1554
c 37 1573
t 37 1592
t 37 1611
c 37 1630
a 37 1649
g 37 1668
a 37 1687
t 37 1706
g 37 1725
c 37 1744
t 37 1763
t 37 1782
a 37 1801
c 37 1820
a 37 1839
a 37 1858
g 37 1877
a 37 1896
g 37 1915
a 37 1934
g 37 1953
a 37 1972
g 37 1991
g 37 2010
t 37 2029
c 37 2048
a 37 2067
c 37 2086
a 37 2105
c 37 2124
t 37 2143
g 37 2162
c 37 2181
c 37 2200
a 37 2219
c 37 2238
t 37 2257
t 37 2276
g 37 2295
a 37 2314
g 37 2333
c 37 2352
t 37 2371
g 37 2390
c 37 2409
a 37 2428
g 37 2447
c 37 2466
t 37 2485
a 37 2504
g 37 2523
c 37 2542
c 37 2561
t 37 2580
t 37 2599
g 37 2618
c 37 2637
a 37 2656
a 37 2675
t 37 2694
t 37 2713
g 37 2732
g 37 2751
a 37 2770
a 37 2789
c 37 2808
c 37 2827
c 37 2846
t 37 2865
g 37 2884
a 37 2903
a 37 2922
g 37 2941
g 37 2960
g 37 2979
t 37 2998
g 37 3017
a 37 3036
a 37 3055
c 37 3074
a 37 3093
t 37 3112
g 37 3131
a 37 3150
a 37 3169
a 37 3188
c 37 3207
a 37 3226
t 37 3245
a 37 3264
g 38 3283
t 34 3302
c 34 3321
c 34 3340
a 24 3359
a 24 3378
a 24 3397
t 34 3416
c 34 3435
c 34 3454
a 34 3473
g 38 3492
c 38 3511
t 37 3530
t 37 3549
c 37 3568
t 38 3587
g 38 3606
c 38 3625
a 38 3644
a 38 3663
a 38 3682
g 37 3701
a 37 3720
a 37 3739
c 37 3758
g 37 3777
c 37 3796
t 37 3815
a 37 3834
c 37 3853
t 37 3872
g 37 3891
a 37 3910
t 37 3929
g 37 3948
a 34 3967
a 34 3986
g 33 4005
g 33 4024
c 38 4043
t 38 4062
c 26 4081
c 26 4100
c 21 4119
c 21 4138
a 34 4157
c 34 4176
c 23 4195
g 31 4214
t 33 4233
g 36 4252
c 36 4271
t 36 4290
c 36 4309
t 36 4328
c 36 4347
c 36 4366
g 36 4385
c 36 4404
a 36 4423
t 33 4442
a 33 4461
t 33 4480
t 33 4499
c 26 4518
c 26 4537
c 26 4556
a 26 4575
END_DNA

END_SEQUENCE
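One way to test the "something wrong in the middle of the file" hypothesis is to check the file's structure independently of BioPerl and get a line number for the first problem. The sketch below is in Python and is not part of Bio::SeqIO; check_phd is a hypothetical helper. It assumes that every section opens with a BEGIN_<NAME> line and closes with a matching END_<NAME> line, as in the record above, and that each line inside BEGIN_DNA holds a base (a/c/g/t/n), a quality value and a trace index. Real PHD files can also carry WR{ ... } tag blocks after END_SEQUENCE; this sketch would report those as text outside a section.

```python
import re
from typing import Iterable, List, Tuple

# One base call per line inside a BEGIN_DNA section: base, quality value, trace index.
# Assumption: only a/c/g/t/n bases occur; widen the character class if needed.
DNA_LINE = re.compile(r"^[acgtnACGTN]\s+\d+\s+\d+$")


def check_phd(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, message) pairs for structural problems in PHD-like text."""
    problems: List[Tuple[int, str]] = []
    stack: List[str] = []  # currently open sections, innermost last
    lineno = 0
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are allowed anywhere
        keyword = line.split()[0]
        if keyword.startswith("BEGIN_"):
            # A new record must not start while the previous one is still open.
            if keyword == "BEGIN_SEQUENCE" and stack:
                problems.append((lineno, f"BEGIN_SEQUENCE while {stack[-1]} is still open"))
                stack.clear()
            stack.append(keyword)
        elif keyword.startswith("END_"):
            expected = "BEGIN_" + keyword[len("END_"):]
            if stack and stack[-1] == expected:
                stack.pop()
            else:
                problems.append((lineno, f"{keyword} without a matching {expected}"))
        elif not stack:
            # Assumption: only blank lines between records (WR{...} tag blocks get flagged here).
            problems.append((lineno, f"text outside any section: {line!r}"))
        elif stack[-1] == "BEGIN_DNA" and not DNA_LINE.match(line):
            problems.append((lineno, f"malformed base-call line: {line!r}"))
    if stack:
        problems.append((lineno, f"end of input with {', '.join(stack)} still open"))
    return problems
```

For example, iterating over check_phd(open("reads.phd")) and printing each (line number, message) pair lists every unclosed section, stray line or malformed base call; the file name is a placeholder. An empty result suggests the file is structurally consistent and the problem is more likely on the parser side.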

--- 
Jorge Duarte
Bioinformatics Research Engineer
BIOGEMMA - Upstream Genomics Group
Z.I. Du Brézet
8, Rue des Frères Lumière
63028 CLERMONT FERRAND Cedex 2
FRANCE
Tel : +33 (0)4 73 39 60 73
Fax : +33 (0)4 73 39 60 71
E-mail : jorge.duarte at biogemma.com

*****************************************************************
       For any support request, please include
BIOGEMMA_BioInfo_Service or bioinfo at biogemma.com
         among the recipients on first contact
*****************************************************************
BIOGEMMA S.A.S., share capital 48.335.652,00 €. 1, Rue Edouard 
Colonne - 75001 PARIS. RCS PARIS 412 514 366
This message and any attachments are confidential and intended solely for 
the use of the addressee(s) named above. The information contained in this 
email may also be legally privileged. If you have received this email in 
error, please notify us immediately by reply email or by fax and then 
delete it. Any use, distribution or reproduction of this message is 
strictly prohibited. The integrity or authenticity of this message cannot 
be guaranteed. We therefore shall not be liable for the message if 
altered, changed or falsified. Thank you.


