[Bioperl-l] Help Parsing FASTA Sequence File

Florent Angly florent.angly at gmail.com
Sat Dec 18 00:26:50 EST 2010


Hi,
You should probably start here:
http://www.bioperl.org/wiki/HOWTO:SeqIO
Florent

On 09/12/10 22:50, Fahmida wrote:
> Hi,
>
> I have several input 'score' files and their corresponding 'data' files, like:
> score1.txt data1.txt
> score2.txt data2.txt
> ....
> ....
>
> score1.txt
>
> contig00002 length=671 numreads=17 1207 0.0
> contig00003 length=637 numreads=26 1205 0.0
> contig00052 length=535 numreads=10 607 e-176
> contig00072 length=472 numreads=46 571 e-165
> contig00019 length=667 numreads=5 474 e-136
>
> This file has several rows and five columns. Columns 1-3 are
> names/descriptions, and columns 4 (1207, 1205, etc.) and 5 (0.0, 0.0,
> e-176, etc.) contain the scores. I want to make a list of the top 2 names,
> ranked by the column 4 score, among rows whose column 5 score is not '0.0'.
> For example, for the above data the output list would be:
>
> contig00052 length=535 numreads=10
> contig00072 length=472 numreads=46
>
> I then want to use the above list to extract the matching records from 'data1.txt':
>
> data1.txt
>
>> contig00001 length=567 numreads=35
> GGGCTGACGTGGCCGCTAATACGACTCACTATAGGGAGAGAAAaCCAAGGGAGAAaGAAa
> CTACACTACTAATGGAAAaGATCTACATGCTAGAAAAa
>> contig00002 length=671 numreads=17
> GGGgCTGACGTGgCcGCTAATACGACTCACTATAGGgAGAGTTACTGTGGAGGGAGAGGC
> TTGCTCAAaTCCGCGTTCAAGGATTTCCAGATTGGTAAGAACTTCAGATT
>> contig00052 length=535 numreads=10
> GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
> CCCAGGTGCCGTTAGCCA
>> contig00003 length=637 numreads=26
> GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
> CCCAGGTGCCGTTAGCCAGAGCTG
>> contig00072 length=472 numreads=46
> GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGTTTtCCCCAGGACCCTGGGA
> GGACCATGCCGTATGGGTGTCTAGTAAGTACAAaGCCATAATTCACATAAGTGAAATATT
> CTCAAGcACTAGGATC
>> contig00019 length=504 numreads=5
> GGGCTGACGTGGCCGCTAATACGACTCACTATAGGgAGAGATCTCACTAAAAAACTGGGG
> ATAACGCCT
>
>
> Example output file:
>
>> contig00052 length=535 numreads=10
> GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGATCGTGGCGATCGCCAATCA
> CCCAGGTGCCGTTAGCCA
>> contig00072 length=472 numreads=46
> GGGCTGACGTGgCCGCTAATACGACTCACTATAGGGAGAGTTTtCCCCAGGACCCTGGGA
> GGACCATGCCGTATGGGTGTCTAGTAAGTACAAaGCCATAATTCACATAAGTGAAATATT
> CTCAAGcACTAGGATC
>
> Any reply would be greatly appreciated.
>
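The selection step can be sketched in the same way. This is a hypothetical Python helper named top_names, not BioPerl. It assumes whitespace-separated columns. It excludes a row only when column 5 is literally the string '0.0', exactly as the question states, so variants such as '0' or '0.00' would not be filtered.

```python
from typing import Iterable, List


def top_names(lines: Iterable[str], n: int = 2) -> List[str]:
    """Return the name part (columns 1-3) of the n rows with the highest
    column-4 score, skipping rows whose column 5 is exactly '0.0'."""
    rows = []
    for line in lines:
        cols = line.split()
        if len(cols) < 5:
            continue  # skip blank or malformed rows
        name = " ".join(cols[:3])
        score = float(cols[3])
        if cols[4] == "0.0":
            continue  # excluded by the column-5 rule
        rows.append((score, name))
    # Stable sort on the score only, so ties keep their file order.
    rows.sort(key=lambda r: r[0], reverse=True)
    return [name for _, name in rows[:n]]
```

On the score1.txt rows shown above, this should yield the contig00052 and contig00072 names, which matches the expected list. Pass only the first word of each name (the contig ID) to the FASTA lookup rather than the whole name. Matching on the ID is safer here because contig00019 is listed as length=667 in score1.txt but as length=504 in data1.txt.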


