[Bioperl-l] Output a subset of FASTA data from a single large file

Sean Davis sdavis2 at mail.nih.gov
Fri Jun 9 14:29:53 EDT 2006

On 6/9/06 1:59 PM, "Chris Fields" <cjfields at uiuc.edu> wrote:

> No; I saw the same thing here.  It's not FASTA in the traditional sense:
> 
> http://www.bioperl.org/wiki/FASTA_sequence_format
> 
> though he did get it to build a database successfully.  Well, 'success' in
> the sense that no errors were thrown.  I've learned the absence of error
> messages does not necessarily mean that everything went as planned; it
> depends on how much error handling has been added to the module by the
> submitting author.
> 
> It's possible that the second annotation line was ignored completely.  I
> suppose it's also possible that two sequences are entered into the database,
> an empty sequence for the first '>' line and the full sequence for the
> second.  It's all dependent on how the parser handles this.

I think that Senthil was pointing out that even though >Antisense looks to
be on its own line, it isn't; it is simply a continuation of the FASTA
header.  Judging from the context, that is the only interpretation that
makes sense.

Sean

>> |> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>> |> >Antisense;
>> |> TGGCTCCTGCTGAGGTCCCCTTTCC
>> |
>> |Unfortunately that's not Fasta format (which only has a single header
>> |line starting with a '>').  I'd imagine that most programs which deal
>> |with fasta which read that entry would see it as two sequences, the
>> |first of which is empty.
>> |
>> 
>> [snipped]
>> 
>> hi,
>> 
>> I think the file is in fasta format, and you probably saw it
>> differently because of your mail transport agent.
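To make the two readings in this thread concrete, here is a minimal Python sketch (not BioPerl's own Bio::SeqIO parser) of a line-based FASTA reader that applies the strict rule that any line beginning with '>' opens a new record. The `read_fasta` helper and the rejoined `single_header` string are hypothetical illustrations; the rejoined version assumes the mail client wrapped one long header line, which is the interpretation suggested above.

```python
# Minimal FASTA reader sketch (illustrative only; not BioPerl's Bio::SeqIO).
# Assumed rule: a line starting with '>' opens a new record; every other
# non-blank line is sequence data appended to the current record.

def read_fasta(text):
    """Return a list of (header, sequence) tuples parsed from FASTA text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                # Close the previous record, even if it has no sequence lines.
                records.append((header, "".join(chunks)))
            header = line[1:]  # drop the leading '>'
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The entry as it appeared in the mail: the header is split over two lines.
as_received = """>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;
>Antisense;
TGGCTCCTGCTGAGGTCCCCTTTCC
"""

# Hypothetical reconstruction: the same header kept on a single line.
# A '>' that is not at the start of a line does not open a new record.
single_header = """>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; >Antisense;
TGGCTCCTGCTGAGGTCCCCTTTCC
"""

for label, text in [("as received", as_received), ("single header", single_header)]:
    print(label)
    for header, seq in read_fasta(text):
        print(f"  header={header!r} length={len(seq)}")
```

Under this rule the entry as received yields two records, the first with an empty sequence, while the single-line version yields one record carrying the full 25-base probe sequence. A real parser may handle the split header differently, which is why the database build can "succeed" without reporting anything.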
