The beginning is the end
How promoters predefine where genes end
Each gene in our DNA has a beginning and an end. Defining the gene’s extremities properly is crucial for producing functional protein. A lot of research has gone into finding out what determines when, where, and at which site on the DNA a gene “starts”. Where a gene ends is a different story: the selection of transcription termination sites has been assumed to depend on downstream elements and extrinsic factors. In their most recent study, published in the scientific journal Cell, researchers from the Max Planck Institute of Immunobiology and Epigenetics made the surprising finding that for most of our genes, the site of transcription start determines the site of transcription end. This phenomenon is well conserved across species, pre-determines mRNA end sites at the very beginning of transcription, and plays a crucial role in cell identity and functionality.
All cells in an organism contain the same DNA sequence. What determines the identity and function of individual cells and tissues is the set of genes that are active in a given place, at a given time. These active genes are transcribed from the DNA template into distinct messenger RNA (mRNA) molecules, which encode the proteins the cell needs to function.
At specific places called promoters, a complex molecular machinery starts transcribing DNA sequences into mRNA. Interestingly, most genes contain multiple possible sites where transcription can start or end. This means that for each gene, the resulting mRNAs can differ depending on which start or termination site is used. Expressing one gene in different variants expands the diversity and functionality of the genome many times over. At the same time, it adds another layer of complexity to the study of the genome.
RNA snapshots from beginning to end
Scientists at the Max Planck Institute of Immunobiology and Epigenetics in Freiburg wanted to know how many different start and end sites each gene uses, in which combinations, and whether the combinations differ between conditions. “The technical problem to answer this question is that we have to ‘read’ each and every mRNA molecule from all genes from the very beginning to the very end. This is a humongous task that has not been undertaken before,” says Valérie Hilgers, a research group leader at the MPI-IE and member of the Cluster of Excellence CIBSS – Centre for Integrative Biological Signalling Studies at the University of Freiburg.
The scientists used a tweaked next-generation sequencing technology to read out the individual mRNAs. In conventional short-read sequencing, each mRNA is broken into shorter fragments that are amplified and then sequenced to produce the “reads”. Bioinformatic techniques are then used to piece the reads together, like a jigsaw, into a continuous sequence. To obtain full-length mRNA information across the entire genome in several Drosophila tissues, including the brain, the Hilgers lab teamed up with the Deep Sequencing Facility of the MPI to optimize specific long-read sequencing technologies. “Long-read sequencing allows for the retrieval of much longer sequencing reads than widely used standard sequencing. However, we even had to optimize this technology and increase the typical read length by several fold to obtain full-length mRNA information in our different model systems,” says Carlos Alfonso-Gonzalez, the first author of the publication. In addition to Drosophila, the Hilgers lab also included a human model of the nervous system in their study: cerebral organoids – “mini-brains” cultured in a dish from induced pluripotent stem cells.
Transcription end sites are pre-determined at transcription start
The gathered data, which capture each mRNA at the full-molecule scale, give unprecedented insight into the transcription of individual genes. “We realized that far from start sites (TSSs) and end sites (TESs) being randomly combined one to another, we found that often, sites of transcription start are specifically linked to distinct sites of transcription end,” says Valérie Hilgers. This linkage is actually causal: in ovaries, for example, the artificial activation of a TSS that is normally only used in the brain overrides the normal TES and induces the use of the brain TES. This shows the critical role of the TSS in shaping the RNA landscape unique to each tissue, and thereby in influencing tissue identity.
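As a rough illustration of what "not randomly combined" means, the following Python sketch compares how often each start/end pair is observed among full-length reads with how often it would be expected if start and end sites were chosen independently of each other. The function name, the site labels, and the toy read counts are all invented for illustration; this is a simplified sketch and not the analysis used in the study.

```python
from collections import Counter


def tss_tes_enrichment(reads):
    """Return observed/expected ratios for each (TSS, TES) pair.

    `reads` is an iterable of (tss_label, tes_label) tuples, one per
    full-length mRNA read (hypothetical input format). The expected count
    assumes that start and end sites are combined at random.
    """
    reads = list(reads)
    total = len(reads)
    pair_counts = Counter(reads)
    tss_counts = Counter(tss for tss, _ in reads)
    tes_counts = Counter(tes for _, tes in reads)

    enrichment = {}
    for (tss, tes), observed in pair_counts.items():
        # Expected number of reads if TSS and TES were chosen independently.
        expected = tss_counts[tss] * tes_counts[tes] / total
        enrichment[(tss, tes)] = observed / expected
    return enrichment


# Toy data (made up): TSS1 mostly pairs with TES1, TSS2 mostly with TES2.
toy_reads = (
    [("TSS1", "TES1")] * 40
    + [("TSS1", "TES2")] * 10
    + [("TSS2", "TES1")] * 10
    + [("TSS2", "TES2")] * 40
)

ratios = tss_tes_enrichment(toy_reads)
for pair in sorted(ratios):
    print(pair, round(ratios[pair], 2))
```

A ratio well above 1 means a start site and an end site appear together more often than chance would predict; a ratio well below 1 means the combination is avoided. With randomly combined sites, all ratios would be close to 1.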
Promoter dominance drives RNA diversity, gene function and tissue identity
One phenomenon stood out in particular. “Certain TSSs show unexpected dominance behavior. They overrule conventional signals to end transcription, outcompete other TSSs, and cause the selection of distinct TESs. Accordingly, we named them ‘dominant promoters’,” says Carlos Alfonso-Gonzalez. Furthermore, the team found that interactions between these dominant promoters and their associated gene ends were guided by distinct epigenetic signatures. Importantly, the results in Drosophila brain cells could be replicated in the human brain organoids, showing that promoter dominance is a conserved, perhaps universal, mechanism for regulating the production of functional proteins and the cells’ functionality.
What could be the physiological relevance of this novel mechanism? Through an in-depth sequence conservation analysis, the Freiburg researchers discovered that TSSs and TESs exhibit co-evolution: over millions of years of evolution between species, individual nucleotide changes in the gene start at dominant promoters were accompanied by changes at the corresponding gene end. “We interpret this observation as a “push” through evolution, to sustain the interaction between both extremities of the gene, which implies significant importance of these couplings for animal fitness,” says Valérie Hilgers.
Teaser image info: An artist’s representation of different start sites that open doors to entirely different worlds. Image generated by the MPI-IE PR office using the AI tool Midjourney // Creative Commons Attribution-NonCommercial 4.0 International