How did eukaryotic organisms become so much more complex than prokaryotic ones, without a whole lot more genes? The answer lies in transcription factors.
Do complex organisms have more genes than simpler organisms? Now that researchers can sequence whole genomes and have done so for a number of organisms, they know that many vertebrates have only about twice as many genes as invertebrates, and many of these are the result of duplication of existing genes rather than development of new ones. But if there are not that many new genes, what is responsible for the incredible diversity in plant and animal species?
The simple answer to this question is that eukaryotes have developed a more complex way of controlling expression of their existing genes than prokaryotes have. This system of expression control relies on a group of proteins known as transcription factors (TFs), and it allows eukaryotes to alter their cell types and growth patterns in a variety of ways. TFs are not solely responsible for gene regulation; eukaryotes also rely on cell signaling, RNA splicing, siRNA control mechanisms, and chromatin modifications. However, TFs that bind to cis-regulatory DNA sequences are responsible for either positively or negatively influencing the transcription of specific genes, essentially determining whether a particular gene will be turned "on" or "off" in an organism.
Transcription Factors Recognize Specific DNA Sequences
Figure 1: Solution structure of the core NFATC1-DNA complex.
Topological representation of secondary structure elements in the complex between the NFATC1 transcription factor and its 12-base-pair binding sequence in DNA. The NFATC1-DNA complex shows that NFATC1 is a ten-stranded antiparallel beta-barrel. The two primary sheets (beta-IHFCE and beta-ABG) that form the core of the beta-barrel lie remote from the DNA interface and are almost completely unaffected by being bound to DNA. The third sheet (beta-DG), which does not contact DNA directly but adjoins and abuts multiple segments that do, is also very similar in the free protein and binary complex. The most radical changes that occur upon binding to DNA involve two large surface loops.
© 1998 Elsevier Zhou, P. et al. Solution structure of the core NFATC1/DNA complex. Cell 92, 687–696 (1998). All rights reserved.
Much of the complexity in differentiation in animal and plant cells can be attributed to the evolution of elaborate systems made up of short (6 to 8 base pair) cis-regulatory DNA sequences or motifs, as well as the TFs that bind to the motifs, interact with each other to form complexes, and recruit RNA polymerase II (Levine & Tjian, 2003). Most eukaryotic genes have promoters that consist of the TATA box close to the 5' end of the gene and, farther upstream, several motifs recognized by specific transcription factors. In addition, many genes have one or more other nearby sequences called enhancers. Enhancers affect transcription; these sequences occur upstream, downstream, or within introns, and they continue to work whether in the normal orientation or turned backward in the genome. In yeast, no enhancers are known; instead, there are only upstream activator sequences (UASs). Enhancers can be found thousands of base pairs from a promoter, whereas UASs are generally within a few hundred base pairs upstream. Typical RNA polymerase II promoters can be influenced by many enhancers and by multiple factors bound to the promoter and enhancer sequences.
The mode of action of TFs is to recognize and bind to a segment of DNA in the promoter and/or enhancer region. Often, a change in the conformation, or three-dimensional structure, of a TF will accompany DNA binding. For example, the two loops in NFATC1 that interact with DNA are found in different conformations, depending on whether NFATC1 is complexed with DNA or not (Figure 1). Moreover, the structure of different TF families, described later in this article, results in specific areas in these protein complexes that interact with the DNA recognition motif. The recognition motif is usually only about 6 to 10 base pairs long.
Experiments have shown that TFs can bind tightly, both within cells and in vitro. After TFs bind to promoter or enhancer regions of the DNA, they interact with other bound TFs and recruit RNA polymerase II. Their influence, however, can be either positive or negative, depending on the presence of other functional domains on the protein and the overall impact of the entire TF complex. A typical TF has multiple functional domains, not only for recognizing and binding to the appropriate DNA strand, but also for interactions with other TFs, with proteins called coactivators, with RNA polymerase II, with chromatin remodeling complexes, and with small noncoding RNAs.
TFs control many important parts of development; therefore, organisms with a deletion of a TF gene exhibit profound irregularities in organization and development (Table 1). For example, in Drosophila, deletion of the TF antennapedia gene results in the development of the antennal imaginal disc into legs rather than antennae.
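Because a recognition motif is only a handful of base pairs long and can sit on either DNA strand, the idea of "finding" candidate binding sites can be sketched in a few lines of Python. This is a deliberately simplified illustration: the sequence and the six-base motif are invented for the example, the function names are hypothetical, and real TFs tolerate mismatches that an exact-match search like this one ignores.

```python
# Illustrative sketch only: the motif and sequence are made up, and real
# TF binding tolerates mismatches that exact matching does not capture.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif_sites(sequence, motif):
    """Return (position, strand) for each exact motif match on either strand.

    Positions are 0-based offsets along the forward strand.
    """
    sequence = sequence.upper()
    motif = motif.upper()
    rc_motif = reverse_complement(motif)
    sites = []
    for i in range(len(sequence) - len(motif) + 1):
        window = sequence[i:i + len(motif)]
        if window == motif:
            sites.append((i, "+"))
        # Skip the reverse-strand report for palindromic motifs,
        # which would otherwise be counted twice at the same position.
        if window == rc_motif and rc_motif != motif:
            sites.append((i, "-"))
    return sites


if __name__ == "__main__":
    region = "ATGGAAACTTGTTTCCAT"  # hypothetical regulatory region
    print(find_motif_sites(region, "GGAAAC"))
```

Searching both strands reflects the point made above that regulatory sequences can function in either orientation.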
Table 1: Effects of Some Transcription Factor (TF) Gene Deletions in Drosophila
TF Gene Deleted | Gene Group | Type of TF | Phenotypic Effects Observed |
Buttonhead | Gap | Zinc finger | Lack of mandibular, intercalary, and antennal head segments |
Hairy | Pair rule | bHLH | Ectopic expression of bristles on legs and wings |
Antennapedia | Homeotic | Homeobox | Legs on the head where antennae should be |
Transcription Factors Exert Combinatorial Control
Many TFs are known to facilitate transcription at hundreds of different promoters, while some are only active at a select few. Laboratory techniques such as chromatin immunoprecipitation (ChIP) and DNA microarrays are commonly used to study the target DNA motifs recognized by individual TFs (Iyer et al., 2001). Signal molecules can influence activation by TFs by covalently binding or modifying their functional domains. It is even possible for a TF to respond to a physical signal, such as red or far-red light, but the signal must be transduced to a chemically modified activator that interacts with the TF.
The complexity and fine gradations of DNA expression in eukaryotes result from combinatorics, in that the combination of chromatin and TF signals, rather than the individual TF signal, is read out. Thus, transcriptional control is dependent on the interactions of all the TFs and whether they attract RNA polymerase or block it from initiating transcription. Multiple TFs can accumulate, creating a bulk the size of a ribosome. Once bound together, changes to the functional domains of a TF and/or covalent interactions with other factors can turn transcription on or off, depending on whether they allow or prohibit the recruitment of RNA polymerase.
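The notion that the combination of bound factors, rather than any single TF, determines the outcome can be illustrated with a toy on/off rule. The sketch below assumes a much simpler logic than any real promoter uses: a gene is treated as transcribed only when every required activator is bound and no repressor is bound. The TF names (`TF_A`, `TF_B`, `TF_R`) and the function are hypothetical.

```python
def gene_is_transcribed(bound_tfs, required_activators, repressors):
    """Toy combinatorial rule (an assumption, not a measured mechanism).

    Returns True only if all required activators are among the bound TFs
    and none of the repressors is bound.
    """
    bound = set(bound_tfs)
    all_activators_present = set(required_activators) <= bound
    any_repressor_present = bool(set(repressors) & bound)
    return all_activators_present and not any_repressor_present


if __name__ == "__main__":
    activators = {"TF_A", "TF_B"}  # hypothetical activators
    repressors = {"TF_R"}          # hypothetical repressor
    for bound in [{"TF_A"}, {"TF_A", "TF_B"}, {"TF_A", "TF_B", "TF_R"}]:
        print(sorted(bound), gene_is_transcribed(bound, activators, repressors))
```

In this toy model, neither activator alone switches the gene on, and a single bound repressor overrides both; real regulation is graded rather than strictly binary.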
A typical enhancer can be up to 500 base pairs in length and contain multiple binding sites for at least two or three different TFs (Levine & Tjian, 2003). Two TFs bound at sites near one another on the DNA strand can combine to form a dimer and bend the DNA in what is believed to be part of the activation process. Chromatin structure allows activators to associate with one another, even when they are bound to DNA sequences many hundreds of base pairs apart. Some TFs are believed to act as tethering elements between distant enhancers and promoters by forming connections with other proteins.
The Evolution of Transcription Factor Families
Higher organisms have a large number of diverse TF families defined by the sequence of their DNA-binding domains. Evolutionary studies have shown that although the DNA-binding motif is highly conserved among plants and animals, the remainder of the TF protein sequence is often very different between them. In addition, a particular TF family may have different roles in plants than in animals, and some new TFs have evolved in each kingdom since their divergence.
In many animals, including humans, a prominent group of genes involved in cell development, including many that encode TFs, contain a 180 base-pair sequence called the homeobox. The homeobox encodes a 60-amino acid protein segment called the homeodomain (180 base pairs at three bases per codon), which recognizes and binds to promoters in the DNA of its target genes. Complete control over transcription, and sometimes binding, is dependent on interactions between TFs, so activation often depends upon the presence of another TF. A similar system of gene recognition is found in plants, where the DNA-binding domain is called the MADS box.
TFs often have certain specific DNA-binding motifs, a common one being the basic helix-loop-helix (bHLH) structure that recognizes a specific sequence of DNA and sits on the DNA like a train car on a track. One such example is the TF MyoD (myoblast determination). Expression of the MyoD gene results in production of MyoD protein, which binds to the promoters of muscle-determining genes, causing the differentiation of muscle precursor cells (myoblasts) into muscle fibers. MyoD also binds to its own promoter, thus maintaining its own levels in differentiated muscle cells and their progeny.
In addition to bHLH, there are some other common structural motifs for recognition and binding of DNA, and these are found in most regulatory proteins. These are the helix-turn-helix, zinc finger, and leucine zipper (Figure 2). The NFATC1 example shown in Figure 1 is known as a β-barrel. Proteins having each of these motifs are effective because they fit neatly into the major or minor grooves of the DNA strand, and also because they expose specific amino acids at the appropriate places to form hydrogen bonds with the nucleotide bases. Molecular genetic techniques can be used to change any amino acids to test whether this affects the binding affinity of the TF for the target.
Complexity and Transcription Factors
Complexity of transcriptional control can be illustrated by comparing the number and locations of cis-control elements in higher and lower eukaryotes. For instance, Drosophila typically has several enhancers for a single gene of 2 to 3 kilobases, scattered over a large (10 kilobase) region of DNA, while, as described earlier, yeast have no enhancers but instead use one UAS sequence per gene, located upstream. Long-range regulation is thought to be indicative of the need for a higher level of control over genes involved in cell development and differentiation.
The yeast genome encodes around 300 TFs, or one per every 20 genes, while humans express approximately 3,000 TFs, or one per every 10 genes. With combinatorial control, the twofold increase in TFs per gene actually translates into many more possible combinations of interactions, allowing for the dramatic increase in diversity among organisms. When we consider the additional complexities of chromatin remodeling, regulated mRNA stability, and translational control, it is easier to understand how the cells of higher organisms can produce such an enormous variety of genetic responses to environmental signals.
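A rough count shows why a modest increase in TF number yields a far larger increase in possible combinations. The sketch below uses the approximate TF totals quoted above and assumes, purely for illustration, that any three TFs could act together at one regulatory region (echoing the "two or three different TFs" per enhancer mentioned earlier). That assumption greatly overstates what is biologically possible, so the figures indicate scaling only.

```python
import math

# Approximate TF counts taken from the text above.
YEAST_TFS = 300
HUMAN_TFS = 3000

# Assumption for illustration: any unordered set of 3 TFs is a valid combination.
TFS_PER_COMBINATION = 3

yeast_sets = math.comb(YEAST_TFS, TFS_PER_COMBINATION)
human_sets = math.comb(HUMAN_TFS, TFS_PER_COMBINATION)

print(f"Yeast: {yeast_sets:,} possible sets of three TFs")
print(f"Human: {human_sets:,} possible sets of three TFs")
print(f"Ratio: about {human_sets / yeast_sets:,.0f}-fold")
```

Under this assumption, a tenfold increase in the number of TFs produces roughly a thousandfold increase in the number of possible three-factor sets.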
Conclusion
The cells of higher organisms exhibit an incredible number of genetic responses to their environment. This is largely the result of TFs that govern the way genes are transcribed and RNA polymerase II is recruited. Through these mechanisms, TFs control important aspects of organismal development. In addition, by working in combination with chromatin, TF signals can exert a finer level of control over DNA by allowing for gradations of expression. TF families further increase the level of genetic complexity in eukaryotes, and many TFs within the same family often work together to affect transcription of a single gene. Given the function of TFs, along with other mechanisms of eukaryotic gene regulation, it is not surprising that complex organisms are capable of doing so much with so few genes. It is these processes, more than the number of genes, that separate complex and simple organisms from each other from a genetic standpoint.
References and Recommended Reading
Chen, K., & Rajewsky, N. The evolution of gene regulation by transcription factors and microRNAs. Nature Reviews Genetics 8, 93–103 (2007) doi:10.1038/nrg1990
Hochschild, A., et al. Repressor structure and the mechanism of positive control. Cell 32, 319–325 (1983) doi:10.1016/0092-8674(83)90451-8
Iyer, V., et al. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409, 533–538 (2001) doi:10.1038/35054095
Levine, M., & Tjian, R. Transcription regulation and animal diversity. Nature 424, 147–151 (2003) doi:10.1038/nature01763
Sadava, D., et al. Life: The Science of Biology (Gordonsville, VA, W. H. Freeman, 2006)