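The EMBL flat file format is a rich text format for storing sequences together with their associated metadata, feature coordinates, and annotations. It shares many details, most notably the feature table, with the GenBank sequence format. Each line of an entry begins with a two-letter line code (ID, AC, DE, FT, SQ, and so on), XX lines are spacers, and // terminates the entry, as in the following example record.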
ID SC10H5 standard; DNA; PRO; 4870 BP.
XX
AC AL031232;
XX
DE Streptomyces coelicolor cosmid 10H5.
XX
KW integral membrane protein.
XX
OS Streptomyces coelicolor
OC Eubacteria; Firmicutes; Actinomycetes; Streptomycetes;
OC Streptomycetaceae; Streptomyces.
XX
RN [1]
RP 1-4870
RA Oliver K., Harris D.;
RT ;
RL Unpublished.
XX
RN [2]
RP 1-4870
RA Parkhill J., Barrell B.G., Rajandream M.A.;
RT ;
RL Submitted (10-AUG-1998) to the EMBL/GenBank/DDBJ databases.
RL Streptomyces coelicolor sequencing project,
RL Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA
RL E-mail: barrell@sanger.ac.uk
RL Cosmids supplied by Prof. David A. Hopwood, [3]
RL John Innes Centre, Norwich Research Park, Colney,
RL Norwich, Norfolk NR4 7UH, UK.
XX
RN [3]
RP 1-4870
RA Redenbach M., Kieser H.M., Denapaite D., Eichner A.,
RA Cullum J., Kinashi H., Hopwood D.A.;
RT "A set of ordered cosmids and a detailed genetic and physical
RT map for the 8 Mb Streptomyces coelicolor A3(2) chromosome.";
RL Mol. Microbiol. 21(1):77-96(1996).
XX
CC Notes:
CC
CC Streptomyces coelicolor sequencing at The Sanger Centre is funded
CC by the BBSRC.
CC
CC Details of S. coelicolor sequencing at the Sanger Centre
CC are available on the World Wide Web.
CC (URL; http://www.sanger.ac.uk/Projects/S_coelicolor/)
CC
CC CDS are numbered using the following system eg SC7B7.01c.
CC SC (S. coelicolor), 7B7 (cosmid name), .01 (first CDS),
CC c (complementary strand).
CC
CC The more significant matches with motifs in the PROSITE
CC database are also included but some of these may be fortuitous.
CC
CC The length in codons is given for each CDS.
CC
CC Usually the highest scoring match found by fasta -o is given for
CC CDS which show significant similarity to other CDS in the database.
CC The position of possible ribosome binding site sequences are
CC given where these have been used to deduce the initiation codon.
CC
CC Gene prediction is based on positional base preference in codons
CC using a specially developed Hidden Markov Model (Krogh et al.,
CC Nucleic Acids Research, 22(22):4768-4778(1994)) and the FramePlot
CC program of Bibb et al., Gene 30:157-66(1984) as implemented at
CC http://www.nih.go.jp/~jun/cgi-bin/frameplot.pl. CAUTION: We may
CC not have predicted the correct initiation codon. Where possible
CC we choose an initiation codon (atg, gtg, ttg or (att)) which is
CC preceded by an upstream ribosome binding site sequence (optimally
CC 5-13bp before the initiation codon). If this cannot be identified
CC we choose the most upstream initiation codon.
CC
CC IMPORTANT: This sequence MAY NOT be the entire insert of
CC the sequenced clone. It may be shorter because we only
CC sequence overlapping sections once, or longer, because we
CC arrange for a small overlap between neighbouring submissions.
CC
CC Cosmid 10H5 lies to the right of 3A7 on the AseI-B genomic restriction
CC fragment.
XX
FH Key Location/Qualifiers
FH
FT source 1..4870
FT /organism="Streptomyces coelicolor"
FT /strain="A3(2)"
FT /clone="cosmid 10H5"
FT CDS complement(<1..327)
FT /note="SC10H5.01c, unknown, partial CDS, len >109 aa;
FT possible integral membrane protein"
FT /gene="SC10H5.01c"
FT /product="hypothetical protein SC10H5.01c"
FT CDS complement(350..805)
FT /note="SC10H5.02c, probable integral membrane protein, len:
FT 151 aa; similar to S. coelicolor hypothetical protein
FT TR:O54194 (EMBL:AL021411) SC7H1.35 (155 aa), fasta scores;
FT opt: 431 z-score: 749.8 E(): 0, 53.5% identity in 114 aa
FT overlap."
FT /product="putative integral membrane protein"
FT /gene="SC10H5.02c"
FT RBS complement(812..815)
FT /note="possible RBS upstream of SC10H5.02c"
FT CDS complement(837..1301)
FT /note="SC10H5.03c, probable integral membrane protein, len:
FT 154 aa"
FT /product="putative integral membrane protein"
FT /gene="SC10H5.03c"
FT RBS complement(1308..1312)
FT /note="possible RBS upstream of SC10H5.03c"
FT CDS complement(1427..1735)
FT /note="SC10H5.04c, unknown, len: 103 aa; possible membrane"
FT /gene="SC10H5.04c"
FT /product="hypothetical protein SC10H5.04c"
FT RBS complement(1738..1741)
FT /note="possible RBS upstream of SC10H5.05c"
FT misc_feature 1800^1801
FT /note="Zero-length feature added to test Bioperl parsing"
FT CDS 1933..2022
FT /note="SC10H5.05, questionable ORF, len: 29 aa"
FT /gene="SC10H5.05"
FT /product="hypothetical protein SC10H5.05"
FT CDS 2019..2642
FT /note="SC10H5.06, probable membrane protein, len: 207 aa;
FT similar to S. coelicolor TR:O54192 SC7H1.33c (191 aa),
FT fasta scores; opt: 312 z-score: 355.2 E(): 1.6e-12, 36.8%
FT identity in 182 aa overlap"
FT /product="putative membrane protein"
FT /gene="SC10H5.06"
FT RBS 2627..2631
FT /note="possible RBS upstream of SC10H5.07"
FT CDS 2639..4048
FT /note="SC10H5.07, unknown, len: 469 aa"
FT /gene="SC10H5.07"
FT /product="hypothetical protein SC10H5.07"
FT CDS complement(4100..4297)
FT /note="SC10H5.08c, unknown, len: 65 aa"
FT /gene="SC10H5.08c"
FT /product="hypothetical protein SC10H5.08c"
FT RBS complement(4314..4319)
FT /note="possible RBS upstream of SC10H5.08c"
FT CDS complement(4439..>4870)
FT /note="SC10H5.09c, probable integral membrane protein,
FT partial CDS len: >143 aa; some similarity in C-terminus to
FT S. coelicolor hypothetical protein TR:O54106
FT (EMBL:AL021529) SC10A5.15 (114 aa), fasta scores; opt: 145
FT z-score: 233.8 E(): 9.2e-06, 33.3% identity in 81 aa
FT overlap. Overlaps and extends SC3A7.01c"
FT /product="putative integral membrane protein"
FT /gene="SC10H5.09c"
FT misc_feature 4769..4870
FT /note="overlap with cosmid 3A7 from 1 to 102"
XX
SQ Sequence 4870 BP; 769 A; 1717 C; 1693 G; 691 T; 0 other;
gatcagtaga cccagcgaca gcagggcggg gcccagcagg ccggccgtgg cgtagagcgc 60
gaggacggcg accggcgtgg ccaccgacag gatggctgcg gcgacgcgga cgacaccgga 120
gtgtgccagg gcccaccaca cgccgatggc cgcgagcgcg agtcccgcgc tgccgaacag 180
ggcccacagc acactgcgca gaccggcggc cacgagtggc gccaggacgg tgcccagcag 240
gagcagcagg gtgacgtggg cgcgcgctgc actgtggccg ccccgtccgc ccgacgcgcg 300
cggctcgtca tctcgcggtc ccaccaccgg tcggccccat tactcgtcct caaccctgtg 360
gcgactgacg ttccccggac aggtcgtacc gattgccgcc acgccccacc acgcacaggg 420
cccagacgac gaagcctgac atggtgatca tgacgacgga ccacaccggg tagtacggca 480
gcgagaggaa gttggcgatg atcaccagcc cggcgatggc gaccccggtg acacgtgccc 540
acatcgccgt tttgagcagc ccggcgctga cgaccatggc gagcgcgccg agcgcgagat 600
ggatccaccc ccacccggtg agatcgaact ggaaaacgta gttgggcgtg gtgacgaaga 660
cgtcgtcctc ggcgatggcc atgatgcccc ggaagaggct gagcagcccg gcgaggaaga 720
gcatcaccgc cgcgaaggcg gtaaggcccg tcgcccattc ctgcctcgcg gtgtgtgccg 780
ggtggtgggt atgtgacgtg gtcatctcgg acctcgtttc gtggaatgcg gatgcttcag 840
cgagcggagg cgccggtgcc cgccgcgccc gtgtgccctg ccgggccgtg accggacagg 900
accaattcct tcgccttgcg gaactcctcg tccgtgatgg caccccggtc tcggatctcg 960
gagagccggg ccagctcgtc gacgctgctg gacccgccgc ccacggtctt cctgatgtag 1020
gcgtcgaact cctcctgctg agcccgtgcc cgcgttgtct cccggctgcc catgttcttg 1080
ccgcgagcga tcacgtagac gaaaacgccc aggaagggca ggaggatgca gaacaccaac 1140
cagccggcct tcgcccagcc actcagtccg tcgtcccgga agatgtcggt gacgacgcgg 1200
aagagcagga cgaaccacat gatccacagg aagatcatca gcatcgtcca gaaggcaccc 1260
agcagtgggt agtcgtacgc caggtaggtc tgtgcactca tgtccgtcct ccgtcctccg 1320
gggcgcggcc cggcggccct cgttccgtac tgacatcagg gtggtcacgg gtcccaccgg 1380
tcggcatcac ccggcacggg tgagtggggc gccgaggccg tcgtggtcag gcccgggaca 1440
ccggtgtgac cctggtggaa ggacgcgtcc cgtggggcac gcaccgccgg ccgagggcga 1500
ccaccgcctc ggtcagtccg agcaggccca gccacaggcc gagaagtcgg gtcagggcac 1560
gggccgactc ggcgggcagc gcgaggacga cgattccggc gacgtcgacg gccagcgggt 1620
tgcgcaggcc cagcactccg gccggggcgc ccggcaccag cgtggcgagg gccgatgcca 1680
tgagccaggt ccaggaaccc ccaagcctgg cgaggacgtg cgccggatcg ctcaatgctc 1740
cggtgaccgc cccgcccgac ccgtctccct tgtcggcagg ttccgccgca tcacgcggaa 1800
cggagatggc tcccctgtgg atcgggcggc cgctgcgggg ccgcccggtt ggtcggtcgg 1860
tgagcgccgg actccccctt cagctcttcc agggtcgggg tcgacaccga ggtcctggat 1920
cacccgtcag gggtgatccg ggcatgccgt cgtggcggtg aggtgggata cgggaacgat 1980
cggcccacgg gggaccggac gagacgaaga gacgtgagat gagcgatacg aactcgggcg 2040
gcgggcgcca ggccgcttcc ggaccggccc cacgtggccg actccctttc cgccggcgcg 2100
tggccctggt cgctgtcgca cgtcccctga tcgtcacggt cggtctcgtc accgcctact 2160
acctgcttcc cctggacgag agactcagcg ccggcaccct ggtgtcgctg gtgtgcggac 2220
tgctcgcagt ccttctggtg ttctgctggg aggtgcgggc catcacgcgc tccccgcatc 2280
cgcgtctgag agcgatcgag ggcctggccg ccacgctggt gctgttcctg gtcctcttcg 2340
ccggctccta ctacctgctg ggtcgctccg cgcccggctc cttcagcgag ccgctgaaca 2400
ggacggacgc gctgtacttc actctgacca cgttcgccac cgtcggcttc ggggacatca 2460
ccgcacgctc cgagaccggg cggatcctca cgatggcgca gatgacggga gggctactgc 2520
tcgtcggagt cgccgcccgg gtgctggcga gcgcagtgca ggcggggctg caccgacagg 2580
gccggggacc ggcggcatcg ccacgctccg gtgctgcgga ggagccggag gccggaccat 2640
gaccgtaccc ggtggcttca ccgcctccct gccgccggcc gagcgagccg cgtacggcag 2700
gaaggcccgt aaaagggcct cacgttcgtg ccacggctgg tacgagccgg ggcagcggcg 2760
gcctgacccc gtcgacctgc tggagcgcca gtccggcgag cgtgtcccgg cactcgtgcc 2820
catccgctac ggtcgcatgc tggagtcgcc gttccgcttc taccgcggtg cggcagcgat 2880
catggcggcg gacctggcac ccctgcccag cagcggactc caggtgcaat tgtgcgggga 2940
cgcgcacccg ttgaacttcc ggctcctggc ctcaccggag cgccggctgg tcttcgacat 3000
caacgacttc gacgagacgc tgcccggccc cttcgagtgg gacgtcaaac ggctggcggc 3060
cggattcgtg atcgcggccc ggtcgaacgg cttctcgtcc aaggaacaga accgcaccgt 3120
tcgggcctgt gtgcgggcct accgggagcg catgagggag ttcgccgtca tgccgaccct 3180
ggacatctgg tacgcccagg acgacgccga ccacgtacgg caactgctgg ctacggaggc 3240
cagaggagaa gctgagcagc ggctcaggga cgcggctgcg aaggcccgca cacgcaccca 3300
catgagggcg ttcgcgaagc tcacccgcgt cacggccgag ggccggcgca tcacccccga 3360
cccgccgctg atcaccccac tcggcgatct gctcaccgac ccggccgaag ccggccggga 3420
ggaggaactg cggtccgtcg tgaacggcta cgcacggtcc ctgccgcccg agcgccggca 3480
cctgctgcgt cactaccggc ttgtggacat ggcgcgcaag gtggtcggcg tcggcagtgt 3540
cggcacccgc tgctgggtac tgcttctgct cggcagggac gacgacgatc ctctgctgct 3600
ccaggccaag gaagcctcgg aatcggtgct ggcggcccac acgggcggcg aacgctacga 3660
ccatcagggc cgcagggtcg tggccggcca gcgtctgatc cagaccaccg gtgacatctt 3720
tctcggctgg gcgcgcgtca ccggcttcga cggaaaggcc cgggacttct acgtgcgtca 3780
actgtgggac tggaagggcg tcgcgcggcc ggaaaccatg gggcccgacc tgctctccct 3840
cttcgcccgg ctgtgcggtg cctgcctggc gagggcccac gcccgttccg gtgaccccgt 3900
cgcgctcgcc gcgtacctgg gcggcagcga ccgcttcgac ggcgcgctca ccgagttcgc 3960
ccagtcctac gccgatcaga atgaacgcga ccacgaagct ctgctggcgg cctgccgctc 4020
cggcagggtc acggccgccc gtttgtgagg ccgacccggg aacggccggc gggctggcac 4080
acaccgccgc cggtcggcgt cattccggaa gctgccgcat ctccaggacg cgcaggccca 4140
gcgactggca gcgggtgagc aacccgtaca gatgggcctc gtcgatcacc gtgccgaaca 4200
gcacggtctg gccggacatg acgacgtgct ccagctccgg gaacgcgttg gccagcgtcc 4260
gtgacaggtg tccctcgacg cggatctcgt agcgcacgag cggtcctttc accgtaggag 4320
ctcgggacac cgcccggggc tccgggtcgg acggtgctct tggtgacgag cctgcgcctc 4380
gtcgccctcc ggtgccctca cccagcacag gtgactccaa ccgcagtgtc agtgcctttc 4440
agtgcgtcac tgtgatcttg acgacgacga tcaccaggcc gagcagtacg ttgaccgtcg 4500
cggtgacggc caccagtcgt cgcgaggcgc ccgcgcggtg cgccgcggcg acggaccagc 4560
ccacctgacc ggcgacggcg acggacagcg ccagccacag ggtgcccggg acgtccagcc 4620
ccagtacggg gctgacggcg atggccgcgg ccggaggcac ggcggccttg acgatcggcc 4680
actcctcgcg gcacacacgc agaatcaccc gccggtccgg agtgtgccgc gcgagacgcg 4740
ctccgaacag ttcggcgtgg acgtgagcga tccagaacac caagctggtg agcaacagca 4800
gaagaaccag ttcggcgcgg gggaacgagc ccagggtgcc ggcgccgatc acgacggagg 4860
ctgcgagcat 4870
//
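
To make the line-code structure of the record above more concrete, here is a minimal Python sketch that reads a few line types (ID, AC, DE, FT, SQ) and interprets simple feature locations such as `complement(<1..327)` and `1800^1801`. It is an illustration under stated assumptions, not a complete or authoritative parser: the names `parse_embl`, `parse_location`, `Record`, and `Feature` are hypothetical, the embedded sample is a shortened excerpt of the record above, whitespace is assumed to be collapsed as shown on this page, and compound locations such as `join(...)` are not handled. For real work, an established parser such as Bioperl's `Bio::SeqIO` or Biopython's `Bio.SeqIO` is the better choice.

```python
import re
from dataclasses import dataclass, field

# Shortened excerpt of the record above: only a few features and the first
# 60 bases are kept, so the "4870 BP" header does not match this excerpt.
SAMPLE = """\
ID SC10H5 standard; DNA; PRO; 4870 BP.
XX
AC AL031232;
XX
DE Streptomyces coelicolor cosmid 10H5.
XX
FH Key Location/Qualifiers
FH
FT source 1..4870
FT /organism="Streptomyces coelicolor"
FT CDS complement(<1..327)
FT /gene="SC10H5.01c"
FT misc_feature 1800^1801
FT CDS 1933..2022
FT /gene="SC10H5.05"
XX
SQ Sequence 4870 BP; 769 A; 1717 C; 1693 G; 691 T; 0 other;
gatcagtaga cccagcgaca gcagggcggg gcccagcagg ccggccgtgg cgtagagcgc 60
//
"""

@dataclass
class Feature:
    key: str
    location: str
    qualifiers: dict = field(default_factory=dict)

@dataclass
class Record:
    entry_name: str = ""
    accessions: list = field(default_factory=list)
    description: str = ""
    features: list = field(default_factory=list)
    sequence: str = ""

def parse_embl(text):
    """Parse one entry; unknown line codes (XX, FH, CC, R*, ...) are skipped."""
    rec = Record()
    in_sequence = False
    open_qualifier = None  # quoted qualifier still waiting for its closing quote
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "//":  # entry terminator
            break
        if in_sequence:
            # Drop the spaces and the running base count, keep the letters.
            rec.sequence += re.sub(r"[^A-Za-z]", "", line)
            continue
        code, _, body = line.partition(" ")
        body = body.strip()
        if code == "ID":
            rec.entry_name = body.split()[0].rstrip(";")
        elif code == "AC":
            rec.accessions += [a.strip() for a in body.split(";") if a.strip()]
        elif code == "DE":
            rec.description = (rec.description + " " + body).strip()
        elif code == "FT":
            if open_qualifier is not None:
                # Continuation of a multi-line quoted qualifier value.
                rec.features[-1].qualifiers[open_qualifier] += " " + body
                if body.count('"') % 2 == 1:
                    open_qualifier = None
            elif body.startswith("/"):
                name, _, value = body[1:].partition("=")
                rec.features[-1].qualifiers[name] = value
                if value.count('"') % 2 == 1:
                    open_qualifier = name
            else:
                key, _, location = body.partition(" ")
                rec.features.append(Feature(key, location.strip()))
        elif code == "SQ":
            in_sequence = True  # everything up to "//" is sequence data
    for feat in rec.features:
        feat.qualifiers = {k: v.strip('"') for k, v in feat.qualifiers.items()}
    return rec

SIMPLE_LOC = re.compile(r"(<?)(\d+)(?:(\.\.|\^)(>?)(\d+))?")

def parse_location(location):
    """Interpret a simple location; join(...) and remote references are not supported."""
    strand = 1
    m = re.fullmatch(r"complement\((.*)\)", location)
    if m:
        strand, location = -1, m.group(1)
    m = SIMPLE_LOC.fullmatch(location)
    if m is None:
        raise ValueError(f"unsupported location: {location}")
    start = int(m.group(2))
    end = int(m.group(5)) if m.group(5) else start
    return {
        "start": start,
        "end": end,
        "strand": strand,
        "open_start": m.group(1) == "<",  # extends below the lower coordinate
        "open_end": m.group(4) == ">",    # extends beyond the upper coordinate
        "between": m.group(3) == "^",     # a site between two adjacent bases
    }

if __name__ == "__main__":
    record = parse_embl(SAMPLE)
    print(record.entry_name, record.accessions, len(record.sequence))
    for feat in record.features:
        print(feat.key, parse_location(feat.location), feat.qualifiers)
```

Note that `open_start` and `open_end` refer to the lower and upper coordinates as written, so for a `complement(...)` feature an open lower coordinate corresponds to the 3' end of the feature on the reverse strand.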