[Bioperl-l] a problem when using the Bio::DB::Fasta
cjfields at illinois.edu
Tue Aug 24 11:54:20 EDT 2010
Please keep all responses on-list.
Judging by the stack trace below, you are running on a UNIX-like system. To concatenate files, use 'cat'. So, for all files ending with .fa:
cat *.fa >> all.fa
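A slightly more defensive sketch of the same idea, in case it helps. Note that all.fa itself matches *.fa, so rerunning the command above in the same directory would try to read the output back in as input. Also, since Bio::DB::Fasta indexes the FASTA files it finds when pointed at a directory, a merged file left next to the originals would presumably present every sequence twice. The function name merge_fasta and the paths in the usage line are hypothetical, only for illustration:

```shell
#!/bin/sh
# merge_fasta OUT IN...
# Concatenate the IN files into OUT via a temporary file.
# (merge_fasta is a hypothetical helper, not part of BioPerl.)
merge_fasta() {
    out=$1
    shift
    tmp="$out.tmp.$$"
    for f in "$@"; do
        # Skip the output file if it was passed in under the same name,
        # e.g. because a glob such as *.fa matched it.
        [ "$f" = "$out" ] && continue
        cat "$f"
    done > "$tmp" && mv "$tmp" "$out"
}
```

Usage, assuming you are inside the directory holding the chromosome files and want the merged file one level up:

    merge_fasta ../all.fa *.fa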
On Aug 24, 2010, at 8:54 AM, Guifeng Wei wrote:
> Hello Fields,
> I have checked the FASTA files. I found that the last line is a blank line, and the second-to-last line is shorter than the others.
> I am not able to run the command Jason suggested because I am not familiar with "sreformat".
> I also have one more question: I want to merge the several single-chromosome sequence files into one. Is that OK?
> Thank you very much.
> Wei Guifeng
> 2010/8/24 Chris Fields <cjfields at illinois.edu>
> Did you follow Jason's advice yesterday about converting the FASTA over to a more consistent length? Or checking the database itself? These are both things reiterated by Florent and Peter.
> From Jason's last response:
> Wei -
> Please ask your questions on the bioperl mailing list, I cannot answer questions directly for all requests.
> Your problem has been answered by me on the list before so I urge you to use the list archives as a starting point.
> The sequence lines in the FASTA file aren't all the same length.
> You need to run this:
> bp_sreformat -if fasta -of fasta -i ORIGINAL -o NEW
> mv NEW ORIGINAL
> or with sreformat
> sreformat fasta ORIGINAL > NEW
> mv NEW ORIGINAL
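If neither bp_sreformat nor sreformat is installed, a rough fallback is to rewrap the sequence lines with awk. This is only a sketch under a few assumptions: plain FASTA input, header lines starting with '>', and a line width of 60 picked arbitrarily. The function name rewrap_fasta is hypothetical, and this is not meant as a replacement for the tools above:

```shell
#!/bin/sh
# rewrap_fasta [WIDTH] < in.fa > out.fa
# Rewrap every sequence to WIDTH characters per line (default 60).
# (rewrap_fasta is a hypothetical helper, not part of BioPerl.)
rewrap_fasta() {
    awk -v w="${1:-60}" '
        # Header line: emit any leftover sequence, then the header unchanged.
        /^>/ {
            if (buf != "") print buf
            buf = ""
            print
            next
        }
        # Sequence line: strip whitespace and carriage returns, append to a
        # small buffer, and emit full-width lines as soon as they are ready.
        {
            gsub(/[ \t\r]/, "")
            buf = buf $0
            while (length(buf) >= w) {
                print substr(buf, 1, w)
                buf = substr(buf, w + 1)
            }
        }
        # End of input: emit the final (possibly shorter) line.
        END { if (buf != "") print buf }
    '
}
```

Blank lines contribute nothing to the buffer, so they are dropped, and only the last line of each record can come out shorter than WIDTH. As with the commands above, write to a new file first and only replace the original once the result looks right.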
> On Aug 24, 2010, at 6:28 AM, Guifeng Wei wrote:
> > Hi,
> > I have revised my script according to Florent's previous email.
> > However, there are still some errors, which is very frustrating.
> > The errors are as follows:
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Each line of the fasta entry must be the same length except the last.
> > Line above #301451 '
> > ..' is 22 != 51 chars.
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw
> > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:368
> > STACK: Bio::DB::Fasta::calculate_offsets
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:770
> > STACK: Bio::DB::Fasta::index_dir
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:593
> > STACK: Bio::DB::Fasta::new
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm:488
> > STACK: bed2fasta.pl:13
> > -----------------------------------------------------------
> > indexing was interrupted, so unlinking
> > /home/wgf/elegans190.dna//directory.index at
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Fasta.pm line 1053
> > But in the directory /home/wgf/elegans190.dna/ there are 6 files,
> > each containing the complete sequence of one single chromosome in FASTA
> > format. The extension of the FASTA files is .fa. Every file
> > starts with ">chromosomeXXX" followed by thousands of sequence lines.
> > Nevertheless, it warns me that "Each line of the fasta entry must be the
> > same length except the last" and "indexing was interrupted, so unlinking
> > /home/wgf/elegans190.dna//directory".
> > I am very confused by this, so I am asking for help.
> > Wei Guifeng
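To find where a file breaks the "same length except the last" rule before indexing, something along these lines may help. It is a sketch based on my reading of the error message, not a reimplementation of the check inside Bio::DB::Fasta. It takes the first sequence line of each record as the expected width and reports any line of a different length that is followed by more lines in the same record (a trailing blank line counts). The name check_fasta_widths is hypothetical:

```shell
#!/bin/sh
# check_fasta_widths FILE
# Report sequence lines whose length differs from the first sequence line
# of their record and that are not the last line of that record.
# Exit status is 1 if anything was reported, 0 otherwise.
# (check_fasta_widths is a hypothetical helper, not part of BioPerl.)
check_fasta_widths() {
    awk '
        BEGIN { width = -1; pending = 0; found = 0 }
        # New record: forget the previous width and any pending odd line.
        /^>/ { width = -1; pending = 0; next }
        {
            if (pending) {
                # The odd-length line was followed by another line,
                # so it was not the last line of its record.
                printf "line %d: %d chars, expected %d\n", pending, plen, width
                found = 1
                pending = 0
            }
            if (width < 0) {
                width = length($0)      # first sequence line sets the width
            } else if (length($0) != width) {
                pending = NR            # only an error if more lines follow
                plen = length($0)
            }
        }
        END { exit found }
    ' "$1"
}
```

Run it on each chromosome file in turn, for example `check_fasta_widths chrI.fa` (the file name here is made up). Lengths are counted without the newline, so the numbers can differ by one from those in the BioPerl message.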
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> Wei Guifeng