[Bioperl-l] bp_genbank2gff3.pl error: "MSG: structure_type 2 is currently unknown"

Dave Clements clements at nescent.org
Wed Oct 29 17:53:31 EDT 2008


Hello all,

I'm trying to translate the threespine stickleback genome from Ensembl in
GenBank format
(ftp://ftp.ensembl.org/pub/current_genbank/gasterosteus_aculeatus/) into GFF3
format using the bp_genbank2gff3.pl script.  I get several data errors and
I've contacted Ensembl about some of them.

However, I also have a question about one of the errors.  I get this error
many times while parsing the files:
---
# working on region:scaffold:BROADS1:scaffold_180:1:137802:1,
Gasterosteus aculeatus, 30-JUN-2008, Gasterosteus aculeatus scaffold
scaffold_180 BROADS1 full sequence 1..137802 reannotated via EnsEMBL
scaffold:BROADS1:scaffold_180:1:137802:1 Unflattening error:
Details:
------------- EXCEPTION -------------
MSG: structure_type 2 is currently unknown
STACK Bio::SeqFeature::Tools::Unflattener::unflatten_seq
/usr/local/share/perl/5.8.8/Bio/SeqFeature/Tools/Unflattener.pm:1445
STACK (eval) /usr/local/bin/bp_genbank2gff3.pl:895
STACK main::unflatten_seq /usr/local/bin/bp_genbank2gff3.pl:894
STACK toplevel /usr/local/bin/bp_genbank2gff3.pl:411
-------------------------------------

# Possible gene unflattening error
withscaffold:BROADS1:scaffold_180:1:137802:1: consult STDERR
---

The code snippet that generates this message is:

 1432     # TYPE CONTAINMENT HIERARCHY (aka partonomy)
 1433     # set the containment hierarchy if desired
 1434     # see docs for structure_type() method
 1435     if ($structure_type) {
 1436         if ($structure_type == 1) {
 1437             $self->partonomy(
 1438                              {CDS => 'gene',
 1439                               exon => 'CDS',
 1440                               intron => 'CDS',
 1441                              }
 1442                             );
 1443         }
 1444         else {
 1445             $self->throw("structure_type $structure_type is currently unknown");
 1446         }
 1447     }

I get this error whether I specify --noCDS or --CDS, and I also get it if I
parse the EMBL format files instead.  However, if I specify "--filter exon
--filter mRNA" (I have to specify both) the errors go away.  According to
http://search.cpan.org/~birney/bioperl/Bio/SeqFeature/Tools/Unflattener.pm#structure_type
(and my copy of the .pm file), 0 and 1 are the only valid values for
structure_type.

However, $structure_type gets set by this chunk of code:

  1337             # Are there any mRNA features in the record?
  1338             if ($n_mrnas == 0) {
  1339                 # NO mRNAs:
  1340                 # looks like structure_type == 1
* 1341                 $structure_type = 1;
  1342                 $need_to_infer_mRNAs = 1;
  1343             }
  1344             elsif ($n_mrnas_attached_to_gene == 0) {
  1345                 # $n_mrnas > 0
  1346                 # $n_mrnas_attached_to_gene = 0
  1347                 #
  1348                 # The entries _do_ contain mRNA features,
  1349                 # but none of them are part of a group/gene, i.e.
they
  1350                 # are 'floating'
  1351
  1352                 # this is an annoying weird file that has some
floating
  1353                 # mRNA features;
  1354                 # eg
ftp.ncbi.nih.gov/genomes/Schizosaccharomyces_pombe/
  1355
  1356                 if ($self->verbose) {
  1357                     my @floating_mrnas =
  1358                       grep {$_->primary_tag eq 'mRNA' &&
  1359                               !$_->has_tag($group_tag)} @flat_seq_features;
  1360                     printf STDERR "Unattached mRNAs:\n";
  1361                     foreach my $mrna (@floating_mrnas) {
  1362                         $self->_write_sf_detail($mrna);
  1363                     }
  1364                     printf STDERR "Don't know how to deal with these; filter at source?\n";
  1365                 }
  1366
  1367                 foreach (@flat_seq_features) {
  1368                     if ($_->primary_tag eq 'mRNA') {
  1369                         # what should we do??
  1370
  1371                         # I think for pombe we just have to filter
  1372                         # out bogus mRNAs prior to starting
  1373                     }
  1374                 }
  1375
  1376                 # looks like structure_type == 2
* 1377                 $structure_type = 2;
  1378                 $need_to_infer_mRNAs = 1;
  1379             }
  1380             else {
  1381             }
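
To make the branch above easier to reason about, here is a minimal stand-alone
model of it in plain Perl. It does not load BioPerl: features are ordinary
hashes, and infer_structure_type() is a hypothetical helper written for this
sketch. It assumes that an mRNA counts as "attached" when it carries the group
tag (which is how the verbose block above picks out the floating ones) and that
the value stays 0 when neither branch fires. Which group tag the Unflattener
actually used for this file is not established here; the loop just tries two
candidates.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-alone model of the quoted branch; it does not use BioPerl.
# Each feature is a plain hash: { primary_tag => ..., tags => { name => 1 } }.
# Assumption: an mRNA is "attached" when it carries the group tag.
sub infer_structure_type {
    my ($group_tag, @features) = @_;

    my @mrnas    = grep { $_->{primary_tag} eq 'mRNA' } @features;
    my @attached = grep { exists $_->{tags}{$group_tag} } @mrnas;

    return 1 if @mrnas == 0;       # no mRNA features at all
    return 2 if @attached == 0;    # mRNAs present, none carry the group tag
    return 0;                      # assumed default when neither branch fires
}

# Qualifier names taken from the scaffold_180 record: the gene feature has
# /gene and /locus_tag, the mRNA has /gene and /note.
my @features = (
    { primary_tag => 'gene', tags => { gene => 1, locus_tag => 1 } },
    { primary_tag => 'mRNA', tags => { gene => 1, note => 1 } },
    { primary_tag => 'CDS',  tags => { gene => 1, protein_id => 1 } },
);

for my $group_tag (qw(gene locus_tag)) {
    printf "group tag %-9s => structure_type %d\n",
        $group_tag, infer_structure_type($group_tag, @features);
}
```

In this model, grouping on "gene" gives 0, while grouping on "locus_tag" gives
2, the value that triggers the exception.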

I've attached a file containing only scaffold_180 (cleaned up somewhat), but it
may be too big to make it through the list's filters.  If that happens, the
files are at
ftp://ftp.ensembl.org/pub/current_genbank/gasterosteus_aculeatus/.
Scaffold_180 is in the "0" data file.  I've also appended the relevant
parts of the file at the end.

Can someone explain what the comments mean by:

  The entries _do_ contain mRNA features, but none of them are part of a
  group/gene, i.e. they are 'floating'

  this is an annoying weird file that has some floating mRNA features;

The mRNAs all appear to have gene names associated with them.  What am I
missing?
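
One way to check this is to tally which qualifiers each feature type actually
carries. The sketch below is plain Perl with no BioPerl dependency;
tally_qualifiers() is a hypothetical helper, and the sample lines are abridged
from the scaffold_180 record appended at the end. It assumes the usual GenBank
feature-table layout (feature key indented five spaces, qualifiers on indented
lines starting with "/").

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count features per key and qualifiers per key in feature-table lines.
# Returns two hash refs: { key => n } and { key => { qualifier => n } }.
sub tally_qualifiers {
    my (@lines) = @_;
    my (%features, %quals, $key);
    for my $line (@lines) {
        if ($line =~ /^ {5}(\S+)\s+\S/) {                  # key + location
            $key = $1;
            $features{$key}++;
        }
        elsif (defined $key && $line =~ m!^\s+/(\w+)!) {   # qualifier line
            $quals{$key}{$1}++;
        }
        # anything else is a continuation line (location or note text)
    }
    return (\%features, \%quals);
}

# Abridged sample copied from the record in this message.
my @sample = split /\n/, <<'END';
     gene            complement(16523..17577)
                     /gene=ENSGACG00000001598
     gene            complement(28650..36301)
                     /gene=ENSGACG00000001608
                     /locus_tag="GGTL3 (1 of 2)"
     mRNA            join(complement(16523..16551),
                     complement(16802..17577))
                     /gene="ENSGACG00000001598"
                     /note="transcript_id=ENSGACT00000002091"
END

my ($features, $quals) = tally_qualifiers(@sample);
for my $k (sort keys %$features) {
    print "$k ($features->{$k}):";
    print " $_=$quals->{$k}{$_}" for sort keys %{ $quals->{$k} || {} };
    print "\n";
}
```

Run against the full file, a tally like this shows at a glance whether a
qualifier is present on gene features but absent from the mRNAs, which is the
situation the "floating" comment describes for whatever tag is used for
grouping.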

Any ideas?

Thanks,

Dave C

LOCUS       scaffold_180 137802 bp DNA HTG 30-JUN-2008
DEFINITION  Gasterosteus aculeatus scaffold scaffold_180 BROADS1 full sequence
            1..137802 reannotated via EnsEMBL
ACCESSION   scaffold:BROADS1:scaffold_180:1:137802:1
VERSION     scaffold_180BROADS1
KEYWORDS    .
SOURCE      three-spined stickleback
  ORGANISM  Gasterosteus aculeatus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Actinopterygii; Neopterygii; Teleostei; Euteleostei; Neoteleostei;
            Acanthomorpha; Acanthopterygii; Percomorpha; Gasterosteiformes;
            Gasterosteidae; Gasterosteus.
COMMENT     This sequence was annotated by the Ensembl system. Please visit the
            Ensembl web site, http://www.ensembl.org/ for more information.
COMMENT     All feature locations are relative to the first (5') base of the
            sequence in this file.  The sequence presented is always the
            forward strand of the assembly. Features that lie outside of the
            sequence contained in this file have clonal location coordinates in
            the format: <clone accession>.<version>:<start>..<end>
COMMENT     The /gene indicates a unique id for a gene,
            /note="transcript_id=..." a unique id for a transcript, /protein_id
            a unique id for a peptide and note="exon_id=..." a unique id for an
            exon. These ids are maintained wherever possible between versions.
COMMENT     All the exons and transcripts in Ensembl are confirmed by
            similarity to either protein or cDNA sequences.
FEATURES             Location/Qualifiers
     source          1..137802
                     /organism="Gasterosteus aculeatus"
                     /db_xref="taxon:69293"
     gene            complement(1399..13644)
                     /gene=ENSGACG00000001596
                     /locus_tag="TOP1  (2 of 2)"
                     /note="DNA topoisomerase 1 (EC 5.99.1.2) (DNA
                     topoisomerase I). [Source:Uniprot/SWISSPROT;Acc:P11387]"
     mRNA            join(complement(2230..2423),complement(1399..1718),
                     complement(3787..3936),complement(4166..4260),
                     complement(4370..4497),complement(5297..5451),
                     complement(5953..5999),complement(6147..6212),
                     complement(6228..6374),complement(6548..6582),
                     complement(6594..6760),complement(7035..7222),
                     complement(7299..7421),complement(7497..7662),
                     complement(7676..7735),complement(8704..8786),
                     complement(8863..8950),complement(9267..9302),
                     complement(9718..9792),complement(9899..9954),
                     complement(10037..10139),complement(10661..10751),
                     complement(13289..13313),complement(13612..13644))
                     /gene="ENSGACG00000001596"
                     /note="transcript_id=ENSGACT00000002089"
     CDS             join(
                     complement(2321..2423),
                     complement(3787..3936),complement(4166..4260),
                     complement(4370..4497),complement(5297..5451),
                     complement(5953..5999),complement(6147..6212),
                     complement(6228..6374),complement(6548..6582),
                     complement(6594..6760),complement(7035..7222),
                     complement(7299..7421),complement(7497..7662),
                     complement(7676..7735),complement(8704..8786),
                     complement(8863..8950),complement(9267..9302),
                     complement(9718..9792),complement(9899..9954),
                     complement(10037..10139),complement(10661..10751),
                     complement(13289..13313),complement(13612..13644))
                     /gene="ENSGACG00000001596"
                     /protein_id="ENSGACP00000002084"
                     /note="transcript_id=ENSGACT00000002089"
                     /db_xref="HGNC_curated_gene:TOP1  (2 of 2)"
                     /translation="MSGGHAHAHAQVNSGSKGSETHKHKEKHKEHRHKEHRKEKEREK
                     LKHSNSEHKDPAEKKLRDKQKLKHSNGSSEKPREKRREEKIQPSHVEKPKKEKENGFV
                     RERSPSALKSEPEEDNGFYPSPQHLNTCRAESAGRDVGLEYRPKKIKSEHDKKAKKRK
                     QEYEEDEEEDIKPKKKTRDQKATQGKKIKKEEEKWKCVCKERTETSRRHSLVCGPTFL
                     TPSWIVDLWDLFAGKPMKLKPPAEEVATFFAKMLDHEYTTKDIFRKNFFKDWRKEMTS
                     EEKSKLSDLNKCDFGEMSEYFKAQSEARKQMSKEEKQKLKEENERLLQEYGFCIMDNH
                     KERIGNFRIEPPGLFRGRGDHPKMGMLKRRIRPEDIIINCSKDSKQPKPPPGTKWKEV
                     RHDNKVTWLASWTENIQGSIKYIMLNPSSRIKVPCQHMTTEKKGSVSLMNNSLRWEPA
                     ALSLRAPGGVRFLLERFGLRIVSEQQLQRNSGFDENTFFFNKMDSWTIMAASYAGKEK
                     QNCCHKLHAEAEVYAGEQYLLCPRAPFESSSSPLQTSILNKHLQELMDGLTAKVFRTY
                     NASITLQQQLKELACPDDSLPAKVLSYNRANRAVAILCNHQRAPPKTFEKSMQNLQTK
                     IDEKQNQLSAARKQLKSAKAAHKTSHDDKSRKAWKVKRKAVQRIEEQLMKLQVQATDR
                     EENKQIALGTSKLNYLDPRISVAWCKKWAVPIEKIYNKTQREKFAWAIDMAEKDFEF"
     gene            complement(16523..17577)
                     /gene=ENSGACG00000001598
     mRNA            join(complement(16523..16551),
                     complement(16802..17577))
                     /gene="ENSGACG00000001598"
                     /note="transcript_id=ENSGACT00000002091"
     CDS             join(complement(16523..16551),
                     complement(16802..17399))
                     /gene="ENSGACG00000001598"
                     /protein_id="ENSGACP00000002086"
                     /note="transcript_id=ENSGACT00000002091"
                     /translation="MSPPPAPQVKGQPSPAPAVVSATADSHQSLVERTGQGPPGAVPP
                     QVLHPPAIQIEAIAPPTSAPAASNNITAPTASSPTPAASQVAVPTPIISQAPVPSTAA
                     ASNQAQAVAPQPPAVALAGASTSVAATLVSTAAPVQRPVPSVVPIVAGSGPSLEAVAT
                     TSSPVANPSGVPPAQPNPPAVERPMPPTAASAAITQTSPVSIQQAPPSQ"
     gene            complement(18492..25815)
                     /gene=ENSGACG00000001600
     mRNA            join(complement(18492..19760),
                     complement(19856..20080),
                     complement(20334..20468),
                     complement(20661..20713),
                     complement(20841..20959),
                     complement(21093..21501),
                     complement(21610..21727),
                     complement(21929..22470),
                     complement(23568..23708),
                     complement(23816..24424),
                     complement(24488..24643),
                     complement(24749..24877),
                     complement(24989..25111),
                     complement(25218..25373),
                     complement(25716..25815))
                     /gene="ENSGACG00000001600"
                     /note="transcript_id=ENSGACT00000002099"
     CDS             join(complement(18492..19760),
                     complement(19856..20080),
                     complement(20334..20468),
                     complement(20661..20713),
                     complement(20841..20959),
                     complement(21093..21501),
                     complement(21610..21727),
                     complement(21929..22470),
                     complement(23568..23708),
                     complement(23816..24424),
                     complement(24488..24643),
                     complement(24749..24877),
                     complement(24989..25111),
                     complement(25218..25373),
                     complement(25716..25815))
                     /gene="ENSGACG00000001600"
                     /protein_id="ENSGACP00000002094"
                     /note="transcript_id=ENSGACT00000002099"
                     /translation="SAVFIAFRGNMEDEDFSLKLDSILSGIPNMLDMASERLQPQHVE
                     PWNSVRVTFNIPRDAAERLRLLAQNNQQQLRDLGILSVQIEGEGAINVAVGPNRGQDV
                     RVNGPTGAPGQMRMDVGFSGQPGPGGVRMANPAMVPPGPGIAGQAMVPGSSGQMHPRI
                     QRPTSQTGSDGTDPMMAGMSVQQQQQPLQHQQAGPHVPGPMPQAAHHLQALQGGRPLN
                     PAAQAQLSQLGPRPPFNPSGQMAVPPGWNQLPSGVLQPPATQGSPAWRKPPPQAQMVP
                     RPPSLATVQTPSHPPPPYPFGSQQAGQVFNAIGQLQQQQQTGVGQFAAPQPKGLQTGP
                     GGVAGPPRPPPPLPPTSGPQGNLTAKSPGSSSSPFQQGSPGTPPMRPTTPQGFPQGVG
                     SPGRAALGQPGNMQQGFMGMPQHGQPGAQVHPVITGMPKRPMGFPNPNFVQGQVSGST
                     PGTPVGGASQQLQGNQAMTHTGALPSASTPNSMQGPPHAQPNVMGVQSGMAGLPPGTT
                     AGPSMGQQQPGLQTQMMGLQHQAQPVSSSPSQKVQGQGGGQTVLSRPLSQGQRGGMTP
                     PKQMMPQQGQGVMHGQGQMVGGQGHQAMLMQQQQQQNSMMEQMVANQMQGNKQPFGGK
                     IPAGVMPGQMMRGPAPNVPGNMVQFQGQQQHQQMNQQQPQQVPIAGNPNQAMGMHGQQ
                     LRLPAGHPLTAQQHPHPLGDPNGGTGDLGVQQMVPDMQAQQQQGMMGGPQHMQMGNGH
                     FAGHGMNFNSQFQGQMPMAGACVQPGGFPVSKDVTLTSPLLVNLLQSDISASQFGPGG
                     KQGAGGGNQAKPKKKKPARKKKSKEGDGPHGLDAAAGMEDSELPNLGGEQSLGLENSG
                     QKLPEFANRPAGFPGQAGDQRVLQQVPMQMQMQSLQNAQGPQGMTGPQAPGQGQPQMH
                     PHQLQQQPQQSNLLQQMLMMLKMQQEQAKNRMSIPPGGQIPPRGMGNPPEVQRLPVSQ
                     QNNMPVMISLPGHGGVPPSPDKARGMPLMVNPQLAGAVRRMSHPDAGQGLQGAGSEEA
                     IAHQKQPGGPDVGLQHPGNGNQQMMANQGSNAHMMKQGPGPSPMPQHTGASPQQQLPS
                     QPQQGGPMPGLHFPNVPTTSQSSRPKTPNRASPRPYHHPLTPTNRPPSTEPSEINLSP
                     ERLNASIAGLFPPKINIPLPPRQPNLNRGFDQQGLNPTTLKAIGQAPPSLTLPGNSNN
                     GSVGGNNNQQPFSTGSGVGGAGGKQDKQPGGQAKRASPSNSRRSSPASSRKSATPSPG
                     RQKGTKMAINCPPPQQQLVGSQAQTTMLSPASALPNPLSMPSQVSGAVEAQQTQSPFH
                     GMQGNAAEGIRESQGMATAEQRQVPQTPPQPLRELSAPRMASPRFPLPQQPKPDLEVK
                     AGTVDRLPVQTPPVPDSEASPTLRAAPTSLNQLLDNSAIANMPPRAGQNT"
     gene            complement(28650..36301)
                     /gene=ENSGACG00000001608
                     /locus_tag="GGTL3 (1 of 2)"
                     /note="Gamma-glutamyltransferase 4 precursor (EC 2.3.2.2)
                     (Gamma- glutamyltranspeptidase 4) (GGT 4) (Gamma-
                     glutamyltransferase-like 3) (Gamma-glutamyltransferase-
                     like 5) [Contains: Gamma- glutamyltransferase 4 heavy
                     chain; Gamma-glutamyltransferase 4 light chain
                     [Source:Uniprot/SWISSPROT;Acc:Q9UJ14]"
     mRNA            join(complement(28650..28810),
                     complement(29299..29398),
                     complement(29498..29635),
                     complement(29741..29855),
                     complement(30289..30441),
                     complement(30537..30625),
                     complement(31122..31249),
                     complement(31894..31981),
                     complement(32781..32974),
                     complement(33108..33181),
                     complement(33575..33642),
                     complement(34074..34191),
                     complement(34272..34423),
                     complement(34505..34731),
                     complement(35555..35616),
                     complement(35650..35756))
                     /gene="ENSGACG00000001608"
                     /note="transcript_id=ENSGACT00000002107"
     CDS             join(complement(28650..28810),
                     complement(29299..29398),
                     complement(29498..29635),
                     complement(29741..29855),
                     complement(30289..30441),
                     complement(30537..30625),
                     complement(31122..31249),
                     complement(31894..31981),
                     complement(32781..32974),
                     complement(33108..33181),
                     complement(33575..33642),
                     complement(34074..34191),
                     complement(34272..34423),
                     complement(34505..34731),
                     complement(35555..35616),
                     complement(35650..35756))
                     /gene="ENSGACG00000001608"
                     /protein_id="ENSGACP00000002102"
                     /note="transcript_id=ENSGACT00000002107"
                     /db_xref="HGNC_curated_gene:GGTL3 (1 of 2)"
                     /translation="RTEDKSANPETTLGSAYSPVDYMSITSFPRLPEDDKGDNTLKLR
                     KGEENALSEQDTDPDVFLKSAHLQRLPSSASDLASHEIASLRETRTDPFTEDCACQRD
                     GLTVIITAGLTFALGVTVALIMQIYLGPPQIFNQGAVVTDVAQCTSLGFDVLERQGSS
                     VDAAIAAALCLGIVHPHTSGIGGGGVMLVHNIRRNETRVIDFRETAPAAISEEMLLTK
                     LHLNPGLLVGVPGMLSGLHQAHQLYGRMPWKDVVTMAAEVARTGFNVTHDLAEALAKA
                     KDQNMSDAFGHLFLPDGQPPPSGLLTRRLDLAAILDAVASKGTSEFYSENLTREMAAA
                     VQAAGGVLTEEDFGNYSTVLQQPAEIIYQGHHVMAAPAPHAGIALIAALNILEGYNIT
                     SQVPRNSTYHWIAEALKISLALASGLGDPMYDTSISDVVAKMLSKSQASLLRQMINDS
                     QAFPVGHYAPSFTLETGAAAAQVMVMGPDDHIVSVMSSLNKPFGSGIVTPSGILLNSQ
                     ILDFSWPNKTRGSSPNPHNSLQPGKRPMSFLMPTAVRPAVGLCGTYVAVGSSDGEKAL
                     SGITQVLMNVLSSRKNMSDSLAYGRLHPHLLPNMLLVDSEFEDEDVELLQAKGHKVER
                     RDVLSLVEGTRRTNDLIIGVKDPRSADASALTMS"
     mRNA            join(complement(29318..29398),
                     complement(29498..29635),
                     complement(29741..29855),
                     complement(30289..30441),
                     complement(30537..30625),
                     complement(31122..31249),
                     complement(31894..31981),
                     complement(32781..32974),
                     complement(33108..33181),
                     complement(33575..33642),
                     complement(34074..34191),
                     complement(34272..34423),
                     complement(34505..34731),
                     complement(35555..35849),
                     complement(36242..36301))
                     /gene="ENSGACG00000001608"
                     /note="transcript_id=ENSGACT00000002113"
     CDS             join(complement(29318..29398),
                     complement(29498..29635),
                     complement(29741..29855),
                     complement(30289..30441),
                     complement(30537..30625),
                     complement(31122..31249),
                     complement(31894..31981),
                     complement(32781..32974),
                     complement(33108..33181),
                     complement(33575..33642),
                     complement(34074..34191),
                     complement(34272..34423),
                     complement(34505..34731),
                     complement(35555..35690))
                     /gene="ENSGACG00000001608"
                     /protein_id="ENSGACP00000002108"
                     /note="transcript_id=ENSGACT00000002113"
                     /translation="MSITSFPRLPEDDNAAAAAAPAPAPGDNTLKLRKGEENALSEQD
                     TDPDVFLKSAHLQRLPSSASDLASHEIASLRETRTDPFTEDCACQRDGLTVIITAGLT
                     FALGVTVALIMQIYLGPPQIFNQGAVVTDVAQCTSLGFDVLERQGSSVDAAIAAALCL
                     GIVHPHTSGIGGGGVMLVHNIRRNETRVIDFRETAPAAISEEMLLTKLHLNPGLLVGV
                     PGMLSGLHQAHQLYGRMPWKDVVTMAAEVARTGFNVTHDLAEALAKAKDQNMSDAFGH
                     LFLPDGQPPPSGLLTRRLDLAAILDAVASKGTSEFYSENLTREMAAAVQAAGGVLTEE
                     DFGNYSTVLQQPAEIIYQGHHVMAAPAPHAGIALIAALNILEGYNITSQVPRNSTYHW
                     IAEALKISLALASGLGDPMYDTSISDVVAKMLSKSQASLLRQMINDSQAFPVGHYAPS
                     FTLETGAAAAQVMVMGPDDHIVSVMSSLNKPFGSGIVTPSGILLNSQILDFSWPNKTR
                     GSSPNPHNSLQPGKRPMSFLMPTAVRPAVGLCGTYVAVGSSDGEKALSGITQVLMNVL
                     SSRKNMSDSLAYGRLHPHLLP"
     gene            53987..55637
                     /gene=ENSGACG00000001618
                     /locus_tag="SNAI1  (2 of 2)"
                     /note="Zinc finger protein SNAI1 (Protein snail homolog 1)
                     (Protein sna). [Source:Uniprot/SWISSPROT;Acc:O95863]"
     mRNA            join(53987..54136,54343..54534,54562..54707,54756..54915,
                     55148..55314,55605..55637)
                     /gene="ENSGACG00000001618"
                     /note="transcript_id=ENSGACT00000002120"
     CDS             join(54052..54136,54343..54534,54562..54707,54756..54915,
                     55148..55314,55605..55637)
                     /gene="ENSGACG00000001618"
                     /protein_id="ENSGACP00000002115"
                     /note="transcript_id=ENSGACT00000002120"
                     /db_xref="HGNC_curated_gene:SNAI1  (2 of 2)"
                     /translation="MPRSFLVKKYFSNRKPSWDRDSQLESQAAFVPESFAQAELPTQN
                     GSFALTCYPTGPSFSGVGVLPAPLSPIAPASPSPSPLGPLDLSSAPSSNGGRTSDPPS
                     PDVVQHAFHCLRCTSSYSSLSALSHHQASHHQASQRARQRPAFHCKHCPKEYTSLGAL
                     KMHIRSHTLPCVCPTCGKAFSRPWLLRGHIRTHTGERPFACQHCNRAFADRSNLRAHL
                     QKHPEVKKYQCGSCSRTFSRMFLLLNTAPPGAGVCAPLRGNIQ"
     mRNA            join(54052..54136,54343..54535,54563..54731,54756..54915,
                     55148..55290,55293..55316,55420..55428)
                     /gene="ENSGACG00000001618"
                     /note="transcript_id=ENSGACT00000002124"
     CDS             join(54052..54136,54343..54535,54563..54731,54756..54915,
                     55148..55290,55293..55316,55420..55428)
                     /gene="ENSGACG00000001618"
                     /protein_id="ENSGACP00000002119"
                     /note="transcript_id=ENSGACT00000002124"
                     /translation="MPRSFLVKKYFSNRKPSWDRDSQLESQAAFVPESFAQAELPTQN
                     GSFALTCYPTGPSFSGVGVLPAPLSPIAPASPSPSPLGPLDLSSAPSSSGGRTSDPPS
                     PDVVQHAFHCLRCTSSYSSLSALSHHQASHHQASQRARQQHSSPLPPRPAFHCKHCPK
                     EYTSLGALKMHIRSHTLPCVCPTCGKAFSRPWLLRGHIRTHTGERPFACQHCNRAFAD
                     RSNLRAHLQKHPEVKKYQCGSCSRTFSRMFLLQHSASGCCPPC"
     gene            69424..106380
                     /gene=ENSGACG00000001624
     mRNA            join(69424..70049,70523..70546,105631..105758,
                     106305..106380)
                     /gene="ENSGACG00000001624"
                     /note="transcript_id=ENSGACT00000002125"
     CDS             69516..69824
                     /gene="ENSGACG00000001624"
                     /protein_id="ENSGACP00000002120"
                     /note="transcript_id=ENSGACT00000002125"
                     /translation="MKRLKNLIMLTIDLTKIPSQRRSLPLLTRGRFVRRPQAFLAAFV
                     VVWPDCRRVQSSEDPSIAARSLHLNICFKGCDRRREHDLLHLISNKTNIKKGKTKTKC
                     L"
     gene            complement(69458..72266)
                     /gene=ENSGACG00000001627
     mRNA            join(complement(69458..70045),
                     complement(70528..70553),
                     complement(71520..71631),
                     complement(72179..72266))
                     /gene="ENSGACG00000001627"
                     /note="transcript_id=ENSGACT00000002128"
     CDS             complement(69515..69880)
                     /gene="ENSGACG00000001627"
                     /protein_id="ENSGACP00000002123"
                     /note="transcript_id=ENSGACT00000002128"
                     /translation="MRYSGPFACVFKLFYYNTHKHLVLVLPFLMLVLFDIKCNRSCSL
                     LLSQPLKHIFKCKLRAAMDGSSLLCTRRQSGQTTTNAAKNACGRRTKRPRVSNGSERR
                     WEGIFVKSIVNIIKFLSLFI"
     gene            complement(74610..77849)
                     /gene=ENSGACG00000001630
                     /locus_tag="SAMHD1  (2 of 3)"
                     /note="SAM domain and HD domain-containing protein 1
                     (Dendritic cell-derived IFNG-induced protein) (DCIP)
                     (Monocyte protein 5) (MOP-5).
                     [Source:Uniprot/SWISSPROT;Acc:Q9Y3Z3]"
     mRNA            join(complement(74610..74684),
                     complement(74761..74861),
                     complement(74977..75120),
                     complement(75749..75819),
                     complement(75907..76022),
                     complement(76434..76594),
                     complement(77706..77849))
                     /gene="ENSGACG00000001630"
                     /note="transcript_id=ENSGACT00000002132"
     CDS             join(complement(74612..74684),
                     complement(74761..74861),
                     complement(74977..75120),
                     complement(75749..75819),
                     complement(75907..76022),
                     complement(76434..76594),
                     complement(77706..77738))
                     /gene="ENSGACG00000001630"
                     /protein_id="ENSGACP00000002127"
                     /note="transcript_id=ENSGACT00000002132"
                     /db_xref="HGNC_curated_gene:SAMHD1  (2 of 3)"
                     /translation="MAGRPSDLLGKVFNDPIHGHMEMHPLLIRIIDTPQFQRLRHIKQ
                     LGGVYFVFPGASHNRFEHSLGVAHLAGELVRDLKQRQPDLNITDRDVLCVQIAGLCHD
                     LGHGPFSHMFDGMFIPKARPGLTWKHEKASVEMFDHLVADNDLKPVMKEHGLKLPEDL
                     VFIKELMDPKDPKDPWSYKGRLENKSFLYEIVSNKRNAIDVDKWDYFARDCYHLGIKN
                     NFDHGRCLMFARVCE"
     gene            complement(90678..100224)
                     /gene=ENSGACG00000001632
                     /locus_tag="SAMHD1 (1 of 3)"
                     /note="SAM domain and HD domain-containing protein 1
                     (Dendritic cell-derived IFNG-induced protein) (DCIP)
                     (Monocyte protein 5) (MOP-5).
                     [Source:Uniprot/SWISSPROT;Acc:Q9Y3Z3]"
     mRNA            join(complement(90678..91320),
                     complement(91530..91667),
                     complement(91829..91933),
                     complement(92020..92107),
                     complement(92441..92582),
                     complement(93438..93553),
                     complement(93628..93719),
                     complement(94565..94673),
                     complement(94750..94850),
                     complement(94942..95100),
                     complement(95645..95715),
                     complement(95803..95918),
                     complement(98829..98989),
                     complement(99304..99376),
                     complement(99751..99817),
                     complement(99921..100224))
                     /gene="ENSGACG00000001632"
                     /note="transcript_id=ENSGACT00000002142"
     CDS             join(complement(91198..91320),
                     complement(91530..91667),
                     complement(91829..91933),
                     complement(92020..92107),
                     complement(92441..92582),
                     complement(93438..93553),
                     complement(93628..93719),
                     complement(94565..94673),
                     complement(94750..94850),
                     complement(94942..95100),
                     complement(95645..95715),
                     complement(95803..95918),
                     complement(98829..98989),
                     complement(99304..99376),
                     complement(99751..99817),
                     complement(99921..100089))
                     /gene="ENSGACG00000001632"
                     /protein_id="ENSGACP00000002136"
                     /note="transcript_id=ENSGACT00000002142"
                     /db_xref="HGNC_curated_gene:SAMHD1 (1 of 3)"
                     /translation="MASRKRSFPPDSSLSAPGKRAPGPGAPQTDYAGWGAEETCRYLR
                     AEGLGEWEDAFREHRITGVGLRYLADADLEKMGLKFLGDRLRVLHSLRTLWQIEVEPS
                     KVFNDPIHGHMEMHPLLIRIIDTPQFQRLRHIKQLGGAYFVFPGASHNRFEHSLGVGH
                     LAGQLVRALDQRQPELHITRRDVLCVQIAGLCHDLGHGPFSHMFDGKFIPKARPGFTW
                     KHEDASVKMFDHLVADNDLQPVMKEHGLVLPEDLDFIKEQIAGPMDPKDMKKLEWPYR
                     GRPKDKSFLYEIVSNKRNGIDVDKWDYFARDCYHLGIKNNFDYGRCLMFAKVCEVDGQ
                     KHICTRDKEVGNLYDMFHTRNCLHRRAYQHKVAKIVETMITEAFLKADGHILFEGSKG
                     KMFSLSTAIDDMEAYTKVTVDNVFEQILNSSSAALKDSREILKNVVCRRLYKCLGHTQ
                     ADQHENVPQKERIASWEADLARCASQDVVLNPEDFIIDVINLDYGMKEKNPINSVRFY
                     SKDDPSKAVQIRKNQVSKLLPEQFAEQLIRVYCKKLDSRSLEAAKKNFVQWCMDENFS
                     KPQDGDIIAPELTPLKPSRQEDDDNNKKEVNPVGKARIQLFER"
     gene            complement(105654..110675)
                     /gene=ENSGACG00000001639
                     /locus_tag="SAMHD1   (3 of 3)"
                     /note="SAM domain and HD domain-containing protein 1
                     (Dendritic cell-derived IFNG-induced protein) (DCIP)
                     (Monocyte protein 5) (MOP-5).
                     [Source:Uniprot/SWISSPROT;Acc:Q9Y3Z3]"
     mRNA            join(complement(105654..105755),
                     complement(106302..106406),
                     complement(106493..106528),
                     complement(106762..106946),
                     complement(107965..108080),
                     complement(108157..108248),
                     complement(108694..108802),
                     complement(108879..109033),
                     complement(109155..109214),
                     complement(109846..109916),
                     complement(110004..110119),
                     complement(110524..110675))
                     /gene="ENSGACG00000001639"
                     /note="transcript_id=ENSGACT00000002145"
     CDS             join(complement(105654..105755),
                     complement(106302..106406),
                     complement(106493..106528),
                     complement(106762..106946),
                     complement(107965..108080),
                     complement(108157..108248),
                     complement(108694..108802),
                     complement(108879..109033),
                     complement(109155..109214),
                     complement(109846..109916),
                     complement(110004..110119),
                     complement(110524..110675))
                     /gene="ENSGACG00000001639"
                     /protein_id="ENSGACP00000002139"
                     /note="transcript_id=ENSGACT00000002145"
                     /db_xref="HGNC_curated_gene:SAMHD1   (3 of 3)"
                     /translation="DPIHGHMEMHPLLIRIIDTPQFQRLRRIKQLGGAYFVFPGASHN
                     RFEHSLGVAHLAGKLVRALDQRQGDLHIDDRDVLCVQIAGLCHDLGHGPFSHMFDGKF
                     IPKARPGFTWKHEKASVEMFDHLVADNDLQPNDHVVLFVPDVTVVSSPQWPYRGRLEN
                     KSFLYEIVSNKRNCIDVDKWDYFARDCYHLGIKNNFDHGRCLMFARVCEVDGQKQICF
                     RDKEVEDLYDMFYTRICLHRRAYQHKAANIVETMITEAFWKADGHIEFEGSGGQKFKL
                     SDTIKDMEAYTKVTDDVFEKILNSSSDELKDSREILQDVVCRRIYKCIGQAQPTQPTT
                     VTVSVIIFSYFTLEKLEADVVLNPEDFIIDVINLDYGMKEENPIDRVRFYSKDDPDKG
                     FQIPQNQVFGFLPEKFTKELIRVYCKKLDSESLKAAKDNFK"
     exon            complement(10037..10139)
                     /note="exon_id=ENSGACE00000016865"
     exon            complement(8863..8950)
                     /note="exon_id=ENSGACE00000016876"
     exon            complement(7299..7421)
                     /note="exon_id=ENSGACE00000016885"
     exon            complement(9899..9954)
                     /note="exon_id=ENSGACE00000016866"
     exon            complement(5297..5451)
                     /note="exon_id=ENSGACE00000016904"
     exon            complement(4370..4497)
                     /note="exon_id=ENSGACE00000016905"
     exon            complement(7676..7735)
                     /note="exon_id=ENSGACE00000016881"
     exon            complement(7035..7222)
                     /note="exon_id=ENSGACE00000016889"
     exon            complement(5953..5999)
                     /note="exon_id=ENSGACE00000016902"
     exon            complement(6228..6374)
                     /note="exon_id=ENSGACE00000016896"
     exon            complement(4166..4260)
                     /note="exon_id=ENSGACE00000016911"
     exon            complement(1399..1718)
                     /note="exon_id=ENSGACE00000016924"
     exon            complement(10661..10751)
                     /note="exon_id=ENSGACE00000016863"
     exon            complement(6548..6582)
                     /note="exon_id=ENSGACE00000016893"
     exon            complement(13289..13313)
                     /note="exon_id=ENSGACE00000016862"
     exon            complement(9267..9302)
                     /note="exon_id=ENSGACE00000016873"
     exon            complement(9718..9792)
                     /note="exon_id=ENSGACE00000016870"
     exon            complement(13612..13644)
                     /note="exon_id=ENSGACE00000016858"
     exon            complement(3787..3936)
                     /note="exon_id=ENSGACE00000016915"
     exon            complement(6147..6212)
                     /note="exon_id=ENSGACE00000016899"
     exon            complement(6594..6760)
                     /note="exon_id=ENSGACE00000016891"
     exon            complement(2230..2423)
                     /note="exon_id=ENSGACE00000016920"
     exon            complement(8704..8786)
                     /note="exon_id=ENSGACE00000016877"
     exon            complement(7497..7662)
                     /note="exon_id=ENSGACE00000016884"
     exon            complement(16523..16551)
                     /note="exon_id=ENSGACE00000016940"
     exon            complement(16802..17577)
                     /note="exon_id=ENSGACE00000016938"
     exon            complement(23568..23708)
                     /note="exon_id=ENSGACE00000016969"
     exon            complement(18492..19760)
                     /note="exon_id=ENSGACE00000017004"
     exon            complement(20661..20713)
                     /note="exon_id=ENSGACE00000016989"
     exon            complement(21093..21501)
                     /note="exon_id=ENSGACE00000016980"
     exon            complement(19856..20080)
                     /note="exon_id=ENSGACE00000016996"
     exon            complement(24749..24877)
                     /note="exon_id=ENSGACE00000016956"
     exon            complement(25218..25373)
                     /note="exon_id=ENSGACE00000016951"
     exon            complement(25716..25815)
                     /note="exon_id=ENSGACE00000016949"
     exon            complement(21929..22470)
                     /note="exon_id=ENSGACE00000016971"
     exon            complement(24989..25111)
                     /note="exon_id=ENSGACE00000016953"
     exon            complement(20841..20959)
                     /note="exon_id=ENSGACE00000016984"
     exon            complement(20334..20468)
                     /note="exon_id=ENSGACE00000016991"
     exon            complement(24488..24643)
                     /note="exon_id=ENSGACE00000016961"
     exon            complement(23816..24424)
                     /note="exon_id=ENSGACE00000016965"
     exon            complement(21610..21727)
                     /note="exon_id=ENSGACE00000016974"
     exon            complement(36242..36301)
                     /note="exon_id=ENSGACE00000017125"
     exon            complement(29741..29855)
                     /note="exon_id=ENSGACE00000017086"
     exon            complement(34272..34423)
                     /note="exon_id=ENSGACE00000017035"
     exon            complement(33575..33642)
                     /note="exon_id=ENSGACE00000017051"
     exon            complement(35650..35756)
                     /note="exon_id=ENSGACE00000017021"
     exon            complement(29318..29398)
                     /note="exon_id=ENSGACE00000017135"
     exon            complement(34074..34191)
                     /note="exon_id=ENSGACE00000017044"
     exon            complement(29299..29398)
                     /note="exon_id=ENSGACE00000017095"
     exon            complement(30537..30625)
                     /note="exon_id=ENSGACE00000017071"
     exon            complement(33108..33181)
                     /note="exon_id=ENSGACE00000017056"
     exon            complement(31894..31981)
                     /note="exon_id=ENSGACE00000017062"
     exon            complement(35555..35849)
                     /note="exon_id=ENSGACE00000017127"
     exon            complement(35555..35616)
                     /note="exon_id=ENSGACE00000017028"
     exon            complement(31122..31249)
                     /note="exon_id=ENSGACE00000017066"
     exon            complement(29498..29635)
                     /note="exon_id=ENSGACE00000017091"
     exon            complement(28650..28810)
                     /note="exon_id=ENSGACE00000017102"
     exon            complement(34505..34731)
                     /note="exon_id=ENSGACE00000017031"
     exon            complement(32781..32974)
                     /note="exon_id=ENSGACE00000017060"
     exon            complement(30289..30441)
                     /note="exon_id=ENSGACE00000017076"
     exon            55148..55290
                     /note="exon_id=ENSGACE00000017228"
     exon            54343..54535
                     /note="exon_id=ENSGACE00000017218"
     exon            55293..55316
                     /note="exon_id=ENSGACE00000017233"
     exon            54563..54731
                     /note="exon_id=ENSGACE00000017224"
     exon            55605..55637
                     /note="exon_id=ENSGACE00000017197"
     exon            54756..54915
                     /note="exon_id=ENSGACE00000017188"
     exon            54343..54534
                     /note="exon_id=ENSGACE00000017167"
     exon            53987..54136
                     /note="exon_id=ENSGACE00000017156"
     exon            54052..54136
                     /note="exon_id=ENSGACE00000017212"
     exon            55148..55314
                     /note="exon_id=ENSGACE00000017193"
     exon            54562..54707
                     /note="exon_id=ENSGACE00000017179"
     exon            55420..55428
                     /note="exon_id=ENSGACE00000017240"
     exon            106305..106380
                     /note="exon_id=ENSGACE00000017258"
     exon            69424..70049
                     /note="exon_id=ENSGACE00000017248"
     exon            105631..105758
                     /note="exon_id=ENSGACE00000017255"
     exon            70523..70546
                     /note="exon_id=ENSGACE00000017250"
     exon            complement(70528..70553)
                     /note="exon_id=ENSGACE00000017275"
     exon            complement(69458..70045)
                     /note="exon_id=ENSGACE00000017281"
     exon            complement(72179..72266)
                     /note="exon_id=ENSGACE00000017268"
     exon            complement(71520..71631)
                     /note="exon_id=ENSGACE00000017272"
     exon            complement(75907..76022)
                     /note="exon_id=ENSGACE00000017299"
     exon            complement(74977..75120)
                     /note="exon_id=ENSGACE00000017305"
     exon            complement(74761..74861)
                     /note="exon_id=ENSGACE00000017308"
     exon            complement(76434..76594)
                     /note="exon_id=ENSGACE00000017296"
     exon            complement(75749..75819)
                     /note="exon_id=ENSGACE00000017303"
     exon            complement(77706..77849)
                     /note="exon_id=ENSGACE00000017294"
     exon            complement(74610..74684)
                     /note="exon_id=ENSGACE00000017310"
     exon            complement(98829..98989)
                     /note="exon_id=ENSGACE00000017342"
     exon            complement(93438..93553)
                     /note="exon_id=ENSGACE00000017375"
     exon            complement(90678..91320)
                     /note="exon_id=ENSGACE00000017391"
     exon            complement(94750..94850)
                     /note="exon_id=ENSGACE00000017366"
     exon            complement(99921..100224)
                     /note="exon_id=ENSGACE00000017324"
     exon            complement(99751..99817)
                     /note="exon_id=ENSGACE00000017329"
     exon            complement(94565..94673)
                     /note="exon_id=ENSGACE00000017369"
     exon            complement(91530..91667)
                     /note="exon_id=ENSGACE00000017386"
     exon            complement(95645..95715)
                     /note="exon_id=ENSGACE00000017355"
     exon            complement(91829..91933)
                     /note="exon_id=ENSGACE00000017381"
     exon            complement(92020..92107)
                     /note="exon_id=ENSGACE00000017379"
     exon            complement(99304..99376)
                     /note="exon_id=ENSGACE00000017335"
     exon            complement(94942..95100)
                     /note="exon_id=ENSGACE00000017360"
     exon            complement(92441..92582)
                     /note="exon_id=ENSGACE00000017377"
     exon            complement(93628..93719)
                     /note="exon_id=ENSGACE00000017372"
     exon            complement(95803..95918)
                     /note="exon_id=ENSGACE00000017347"
     exon            complement(110004..110119)
                     /note="exon_id=ENSGACE00000017407"
     exon            complement(110524..110675)
                     /note="exon_id=ENSGACE00000017404"
     exon            complement(107965..108080)
                     /note="exon_id=ENSGACE00000017418"
     exon            complement(106493..106528)
                     /note="exon_id=ENSGACE00000017422"
     exon            complement(108157..108248)
                     /note="exon_id=ENSGACE00000017414"
     exon            complement(108694..108802)
                     /note="exon_id=ENSGACE00000017413"
     exon            complement(109846..109916)
                     /note="exon_id=ENSGACE00000017408"
     exon            complement(109155..109214)
                     /note="exon_id=ENSGACE00000017410"
     exon            complement(106762..106946)
                     /note="exon_id=ENSGACE00000017420"
     exon            complement(105654..105755)
                     /note="exon_id=ENSGACE00000017425"
     exon            complement(106302..106406)
                     /note="exon_id=ENSGACE00000017424"
     exon            complement(108879..109033)
                     /note="exon_id=ENSGACE00000017412"
     misc_feature    1..5487
                     /note="contig contig_13399 1..5487(1)"
     misc_feature    5852..7735
                     /note="contig contig_13400 1..1884(1)"
     misc_feature    8660..14728
                     /note="contig contig_13401 1..6069(1)"
     misc_feature    15200..39327
                     /note="contig contig_13402 1..24128(1)"
     misc_feature    46327..47864
                     /note="contig contig_13403 1..1538(1)"
     misc_feature    50118..51320
                     /note="contig contig_13404 1..1203(1)"
     misc_feature    53911..55318
                     /note="contig contig_13405 1..1408(1)"
     misc_feature    55419..56091
                     /note="contig contig_13406 1..673(1)"
     misc_feature    56664..57183
                     /note="contig contig_13407 1..520(1)"
     misc_feature    57284..83877
                     /note="contig contig_13408 1..26594(1)"
     misc_feature    83978..137802
                     /note="contig contig_13409 1..53825(1)"
BASE COUNT  32610 a 28563 c 28961 g 33195 t 14473 n
ORIGIN
        1 GGTTTACCTC CCGGGGGGGG GGCGACACGG CGGAGTTGCC CCCCCGGAGG GAACCAGCCG

--
Fill out the GMOD Community Survey NOW and win some GMOD Gear:
http://gmod.org/wiki/GMOD_News#2008_GMOD_Community_Survey
-------------- next part --------------
A non-text attachment was scrubbed...
Name: scaffold_180.genbank
Type: application/octet-stream
Size: 213889 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20081029/1ac1ae41/attachment-0001.obj>

