Congratulations to the team in the Lander and Guttman labs for three of our major papers (all published in the last few weeks):
This paper was a phenomenal collaboration with a graduate student in the lab, Charlie Fulco. We sought to address a fundamental challenge in modern biology: to understand the regulatory wiring that connects noncoding regulatory elements to specific target genes. These connections are typically studied on a one-by-one basis — by knocking out individual sequence elements and determining their effects on gene expression — but there are potentially millions of regulatory elements and we lack a unifying framework to predict their functions.
We developed an approach based on CRISPR interference that can simultaneously assess megabases of sequence in a single experiment — the scale needed to comprehensively define all of the elements that regulate a gene of interest in a given cell type. Applying this technique revealed complex networks connecting multiple enhancers with multiple target genes. These data allowed us to derive a model that could accurately predict gene-enhancer connections in the MYC locus based on chromatin state alone. This method will be a key tool for interpreting disease-associated human genetic variation and manipulating gene expression for therapeutic purposes.
This paper is the culmination of my PhD thesis with Eric Lander, and has important implications for understanding the variety of different mechanisms that can contribute to the regulatory wiring described above. Specifically, we found that many sequence signals involved in gene regulation are hiding in unexpected places: many gene promoters act as DNA elements to regulate a neighboring gene, and sequences involved in transcription and RNA processing (e.g., 5′ splice sites) of one gene can also regulate a neighboring gene.
These findings indicate that many genes have dual functions: they produce an RNA (e.g., a lncRNA or mRNA) and, in doing so, they can regulate the expression of a neighboring gene. Thus, the expression of one gene is often directly tied to the expression of its neighbor. This observation has important implications for the functions of lncRNAs; more (opinionated) commentary to come!
This review with Mitch Guttman and Noah Ollikainen focuses on the functions and mechanisms of a subset of lncRNAs that function as regulatory RNAs in the nucleus. Such lncRNAs have a unique capability that distinguishes them from protein regulators or DNA regulatory elements: they can spatially amplify regulatory information encoded by DNA. Unlike proteins, lncRNAs can act in close proximity to their site of transcription; and unlike DNA regulatory elements, lncRNAs can amplify DNA-encoded regulatory signals to different extents according to their expression levels. Furthermore, lncRNAs are not necessarily restricted by topological constraints of the chromatin fibre, allowing them to diffuse to or mediate contacts at spatially proximal sites that might even reside on different chromosomes. These properties may explain the molecular functions of lncRNAs like XIST, FIRRE, and others.
Our bioRxiv pre-print for a new story is available here.
This story emerged from our efforts to understand whether there are general principles that might explain the functions of many lncRNAs. Through genetic dissection of 12 genomic loci that produce lncRNAs, we found that many regulate the expression of a neighboring gene. Surprisingly, these functions did not require the specific lncRNA transcripts themselves and instead involved mechanisms associated with lncRNA transcription and production. These mechanisms appear to be general properties of genes, both coding and noncoding: loci encoding mRNAs also frequently regulate a neighboring gene. These mechanisms may explain the function and evolution of many genomic loci that encode lncRNAs, and further suggest that the connectivity of local gene regulatory networks is more complex than we anticipated.
In a Perspective in Science, Peter Fraser wrote a very nice summary of our work. However, I thought it might be helpful to back up even further and introduce the problem we tackled.
Introduction: I am currently a PhD student at MIT in Health Sciences and Technology. My goal, shared by much of the biomedical community, is to understand how our bodies work on a molecular level so that we can rationally design drugs to improve human health. I work on a very early step of this process, which is to understand the molecular components of our cells and how they interact with each other – the wiring of the cell, if you will.
This wiring, it turns out, is extraordinarily complex – so complex, in fact, that we don’t even have a list of all of the wires. I study a class of molecules in the cell called large (or long) non-coding RNAs (lncRNAs, pronounced link-RNAs). LncRNAs are large (>200 RNA bases), 5′ capped, and often spliced; in many respects, they look like the messenger RNAs (mRNAs) that transmit genetic information from DNA to protein (see above). However, lncRNAs do not code for protein and are thought to play functional or structural roles in the cell as RNA molecules. Although we’ve known about a handful of lncRNAs for more than twenty years, we discovered in the years following the Human Genome Project that there are in fact thousands of different lncRNAs encoded in our genomes. Subsequent functional studies determined that many of these lncRNAs play crucial roles in the cellular circuitry, similar to protein-coding genes. In other words, we know lncRNAs are important because when you delete one, the mouse dies.
When I started my PhD, we had pinpointed thousands of new lncRNAs in our genomes and realized that many of them may be critical to human biology and medicine. The next question that we set out to tackle was: How do lncRNAs work?
Methods: To answer this question, we needed a method to examine lncRNAs in their native cellular environment. If lncRNAs are in fact controlling cellular processes, they must be interacting with other components in the cell, such as other RNA molecules, proteins, or even specific regions of a chromosome. If we could identify these interacting components, we could use this information to figure out how lncRNAs achieve their various functions. To accomplish this, we would need to purify a lncRNA from cells and identify the other cellular components that co-purified with our target lncRNA.
This experiment would be straightforward if we were trying to discover the molecular components that interact with an uncharacterized protein: we would raise antibodies against the protein of interest and use them to bind and capture the protein from cell extracts. However, this approach does not work for lncRNAs because antibodies cannot recognize them.
To solve this problem, we developed a method called RNA Antisense Purification (RAP). RAP takes advantage of the fact that we know the sequence of our target lncRNAs; we simply design other nucleic acid molecules with a complementary sequence that will hybridize to and capture our target lncRNA. Although conceptually simple, developing this protocol was very challenging! I won’t go into the details here; if you want to learn more, visit our RAP web page.
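To make the probe-design idea concrete, here is a minimal Python sketch of tiling antisense probes along a target RNA. It only illustrates the complementarity principle and is not our actual RAP probe-design pipeline: the function names, the toy target sequence, and the probe length and step size are all made up for the example.

```python
# Map each RNA base to the DNA base that pairs with it.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "U": "A"}


def antisense(rna):
    """Return the DNA reverse complement (written 5'->3') of an RNA sequence."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def design_probes(rna, probe_len, step):
    """Tile antisense probes along the target RNA.

    probe_len and step are illustrative parameters, not protocol values.
    Returns a list of (start position in the target, probe sequence).
    """
    probes = []
    for start in range(0, len(rna) - probe_len + 1, step):
        window = rna[start:start + probe_len]
        probes.append((start, antisense(window)))
    return probes


# Hypothetical toy target; a real lncRNA is thousands of bases long.
target = "AUGGCUACGUUAGC"
for start, probe in design_probes(target, probe_len=6, step=4):
    print(start, probe)
```

Each probe is the reverse complement of one window of the target, so it can base-pair with that stretch of the RNA. Real probe design also has to deal with issues this sketch ignores, such as avoiding probes that would hybridize to other RNAs or to repetitive sequences.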
Results: As an initial foray into studying the molecular mechanisms of lncRNAs, we decided to test our method on the canonical lncRNA Xist (pronounced “exist”), which is one of the few lncRNAs that has been studied for a long time. Xist was discovered over twenty years ago because it is responsible for silencing one of the two X-chromosomes in female mammals in a process called “dosage compensation”; females silence one of their two X-chromosomes in order to balance the expression of X-chromosome genes with male mammals, who have one X-chromosome and one Y-chromosome. The gene encoding the Xist RNA is located on the X-chromosome and is activated randomly on one X-chromosome early during embryonic development. The Xist RNA spreads out from its gene locus to cover the X-chromosome, turns off gene expression, and packages up the X-chromosome DNA into an inactive bundle visible under a light microscope (the “Barr body”).
Despite this knowledge about Xist, we did not know exactly where on the X-chromosome Xist actually bound, and we did not understand how Xist spread across and accessed the entire chromosome. We used RAP to purify the Xist RNA and its associated cellular components, and we sequenced the DNA that was bound in a complex with the Xist RNA. Unexpectedly, we found that Xist bound broadly across the entire X-chromosome, which differed dramatically from the pattern of binding of a similar lncRNA (roX2) in fruit flies. We also discovered that Xist spreading takes advantage of the three-dimensional conformation of the X-chromosome; rather than binding to specific high-affinity sites across the X-chromosome, Xist simply reaches out in three dimensions from its gene locus and spreads to these nearby sites (which may not be close in linear sequence).
This last point is worth considering in more detail, because it in fact takes advantage of a special property of a lncRNA: a lncRNA can function at its site of production in the nucleus, allowing it to use its specific position in the genome to target nearby sites in three dimensions. This function is unique to lncRNAs as opposed to proteins: proteins are produced in the cytoplasm from messenger RNAs and thus have no memory of where their gene is located in the nucleus.
Whew! To summarize, we demonstrated an important ability of Xist that might apply to many other lncRNAs. We are now in the process of studying many of the hundreds of other lncRNAs encoded in our genomes to identify additional mechanisms by which lncRNAs control cellular processes. In the future we will be able to use this information to develop drugs that target specific lncRNAs or interfere with their interactions with other cellular components.
Edit: For a more whimsical summary of our paper, visit Sick Papes: http://sickpapes.tumblr.com/day/2013/07/09
My talk at the Biology of Genomes meeting at Cold Spring Harbor (May 2013) was recently featured in Science. The talk was titled “Large noncoding RNAs can localize to regulatory DNA targets by exploiting the three-dimensional architecture of the genome”, and featured work done in collaboration with Mitch Guttman, Eric Lander, and the lncRNA team at the Broad Institute, as well as with Amy Pandya-Jones and Kathrin Plath at UCLA. Read more about RNA Antisense Purification (RAP).