[Bioperl-l] Genbank2gff3 script update
gilbertd at cricket.bio.indiana.edu
Tue Mar 27 19:42:30 EDT 2007
Dear Bioperl developers,
Here is an improved bp_genbank2gff3.pl script, with bug fixes
and enhancements. Changes that alter existing behavior are enabled
only through non-default command-line flags. I've updated the files
against current Bioperl CVS. Would one of you care to add this to your
CVS repository?
Thanks, Don Gilbert
Find the files at http://eugenes.org/gmod/genbank2chado/
=item Bioperl bp_genbank2gff3.pl
bin/genbank2gff3.PLS (Bioperl CVS scripts/Bio-GFF-DB/genbank2gff3.PLS)
lib/Bio-new/SeqFeature/Tools/TypeMapper.pm (required for genbank2gff3 update)
lib/Bio-new/SeqFeature/Tools/Unflattener.pm (minor change suggested for genbank2gff3)
(put into your Bioperl lib/Bio/... directories)
There is also this unrelated patch:
-- new flag to ignore excess subfeatures from Chado's gene-mrna-polypeptide-exon model.
=item Genbank2gff3 changes
* Polypeptide alternate gene model added (--noCDS option)
Standard gene model: gene > mRNA > (UTR,CDS,exon)
G-R-P-E alternate model: gene > mRNA > polypeptide > exon
Polypeptide contains all the important protein info (IDs, translation, GO terms)
* IO pipes: curl ftp://ncbigenomes/... | genbank2gff3 --in stdin --out stdout | gff2chado ...
* GenBank main record fields are added to the source feature,
and the sourcetype (commonly chromosome for genomes) is used.
* Gene model handling for ncRNA and pseudogenes is added.
* GFF header is cleaner and more informative, and a GFF_VERSION option is added.
* GFF ##FASTA inclusion is improved, and the translation sequence is stored there.
* FT -> GFF attribute mapping is improved.
* --format choice of SeqIO input formats (GenBank default).
Uniprot/Swissprot and EMBL produce useful GFF.
* SeqFeature::Tools::TypeMapper has a few FT -> SOFA additions, more flexible usage.
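
As a rough illustration of the two gene models listed above, the following
Python sketch builds GFF3 lines for each hierarchy. It is not taken from
genbank2gff3: the IDs, coordinates, the "example" source column, and the
function names are hypothetical, and it assumes that in the alternate model
both polypeptide and exon attach to the mRNA, which the script's actual
output may not match.

```python
def gff3_line(seqid, ftype, start, end, strand, attrs, phase="."):
    """Format one 9-column GFF3 line (source and score are placeholders)."""
    attr_text = ";".join(f"{key}={value}" for key, value in attrs.items())
    return "\t".join(
        [seqid, "example", ftype, str(start), str(end), ".", strand, phase, attr_text]
    )


def standard_model(seqid="chr1"):
    """Standard model: gene > mRNA > (UTR, CDS, exon). All values are made up."""
    return [
        gff3_line(seqid, "gene", 100, 900, "+", {"ID": "gene1"}),
        gff3_line(seqid, "mRNA", 100, 900, "+", {"ID": "mrna1", "Parent": "gene1"}),
        gff3_line(seqid, "exon", 100, 900, "+", {"Parent": "mrna1"}),
        gff3_line(seqid, "five_prime_UTR", 100, 199, "+", {"Parent": "mrna1"}),
        gff3_line(seqid, "CDS", 200, 799, "+", {"ID": "cds1", "Parent": "mrna1"}, phase="0"),
        gff3_line(seqid, "three_prime_UTR", 800, 900, "+", {"Parent": "mrna1"}),
    ]


def polypeptide_model(seqid="chr1"):
    """Alternate model: a polypeptide feature stands in for CDS.

    Assumption: polypeptide and exon are both children of the mRNA.
    """
    return [
        gff3_line(seqid, "gene", 100, 900, "+", {"ID": "gene1"}),
        gff3_line(seqid, "mRNA", 100, 900, "+", {"ID": "mrna1", "Parent": "gene1"}),
        gff3_line(seqid, "polypeptide", 200, 799, "+", {"ID": "pep1", "Parent": "mrna1"}),
        gff3_line(seqid, "exon", 100, 900, "+", {"Parent": "mrna1"}),
    ]


if __name__ == "__main__":
    print("##gff-version 3")
    for line in standard_model() + polypeptide_model():
        print(line)
```

In the alternate model the protein-level details (IDs, translation, GO terms)
would hang off the single polypeptide feature instead of being spread over
CDS segments.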
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/