Emerging similarities between regulatory elements

Mammalian transcription is controlled by a complex interplay of regulatory events. Together, these events determine the correct spatio-temporal initiation and rate of RNA polymerase II (RNAPII) gene transcription.

Several decades of research on transcriptional regulation have identified different modes of regulation that have been ascribed to distinct regulatory elements – stretches of DNA with the specific information and genomic locations needed for their tasks. Core promoters are located at transcription initiation sites and carry binding sites for general transcription factors that help recruit and assemble the RNAPII machinery. The activation, rate of initiation and elongation are further influenced by promoter-distal regulatory elements called enhancers – elements that are generally believed to be the main determinants of cell-type-specific and precise developmental gene expression. Silencers – elements that bind repressors that block transcription – and insulators – boundary elements that block the interplay between promoters and enhancers or silencers – further add to the complex landscape of transcriptional regulation.

Incorrect spatio-temporal regulation of gene expression may lead to disease. This is one of the main motivations behind increased efforts to characterize the locations and cell-type-specific activities of regulatory elements in mammalian genomes. While the genomic locations of gene promoters may be inferred using various RNA sequencing techniques, other kinds of regulatory elements have been harder to localize. Major consortia such as the Encyclopedia of DNA Elements (ENCODE) and the NIH Roadmap Epigenomics Mapping Consortium have therefore invested considerable time and resources to delineate rules on how to localize and infer the activities of regulatory elements.

However, recent studies have generated data that question the fundamental separation of regulatory elements into distinct entities. Transcription initiation occurs at enhancers (see also related post). This property not only enables precise localization of enhancers but is also a good predictor of their regulatory activities. Hence, the activity of an enhancer may be inferred from a property that is generally ascribed to promoters. Adding to their similarities, gene promoters can have enhancer activities, and a considerable fraction of DNA sequences with enhancer potential overlap core promoters in Drosophila.

I discuss the similarities between enhancers and promoters in a recent opinion paper [1]. Together with Albin Sandelin and Charles Danko, I take this one step further and suggest a unified architecture of transcriptional regulatory elements [2]. I am happy to see that Tae-Kyung Kim and Ramin Shiekhattar also recognize the similarities between enhancers and promoters and the problems with their distinctions [3].

Similarities between regulatory elements have also been discussed elsewhere in the literature. Jesse Raab and Rohinton Kamakaka discuss the apparent similarities between insulators and promoters [4]. Alexander Feuerborn and Peter Cook suggest a unifying view of regulatory elements in which regulatory function is determined by the three-dimensional structure of chromatin and selective tethering of transcription units to transcription factories [5].

Taken together, recent observations call for a reconsideration of current discriminatory rules of regulatory elements. In light of recent data, regulatory elements should not generally be considered distinct entities but rather elements with varying functions. The functions of regulatory elements seem to be context dependent and determined by the physical proximity of other regulatory elements and their bound factors.

  1. Andersson R. Promoter or enhancer, what’s the difference? Deconstruction of established distinctions and presentation of a unifying model. Bioessays. 2015 Mar;37(3):314-23.
  2. Andersson R, Sandelin A, Danko CG. A unified architecture of transcriptional regulatory elements. Trends Genet. 2015 Aug;31(8):426-33.
  3. Kim T-K, Shiekhattar R. Architectural and functional commonalities between enhancers and promoters. Cell. 2015;162(5):948-59.
  4. Raab JR, Kamakaka RT. Insulators and promoters: closer than we think. Nat Rev Genet. 2010 Jun;11(6):439-46.
  5. Feuerborn A, Cook PR. Why the activity of a gene depends on its neighbors. Trends Genet. 2015 Aug 7.

Identification of bidirectionally transcribed loci and prediction of enhancers [repost]

[ The following is a repost from anderssonr.wordpress.com ]

FANTOM (Functional Annotation of the Mammalian Genome) is an international research consortium established by Dr. Hayashizaki and his colleagues at RIKEN in Tokyo, Japan. Founded in 2000 to functionally annotate the mouse DNA sequence with advanced sequencing techniques, FANTOM has since expanded to encompass the regulation of genes, networks of genes and their impact on disease. The FANTOM project includes over 500 scientists from more than 20 countries around the world.

In FANTOM5, the fifth phase of the project, we have used Cap Analysis of Gene Expression (CAGE) to map the sets of transcripts, transcription factors, promoters and enhancers active in the majority of mammalian primary cell types. We have also complemented this with profiles from cancer cell lines and tissues. The results are published in two articles in Nature describing the promoterome (FANTOM Consortium et al. 2014) and enhancerome (Andersson et al. 2014) of mammalian cells, along with several more focused papers in various journals.

In this post, I will explain the computational strategy I used in the enhancer paper (Andersson et al. 2014) for predicting the locations of transcriptional enhancers and quantifying their usage across 808 human FANTOM CAGE libraries.

Bidirectional (divergent) transcription at enhancers

We observed that enhancers, as defined from chromatin features (H3K4me1 and H3K27ac, see e.g. Bulger and Groudine 2011 for an overview of these features), were bidirectionally transcribed, producing capped RNAs emanating (divergently) outwards from the central nucleosome-deficient region (NDR) (Figure 1A). The observation of bidirectional transcription at enhancers is not new. Tae-Kyung Kim and colleagues (Kim et al. 2010) observed bidirectional transcription at active enhancers in mouse cortical neurons and coined the term eRNAs (enhancer RNAs) for the products.

Figure 1: Enhancer transcription as a marker of regulatory activity. A, Enhancers identified by chromatin marks were overlaid with CAGE data, revealing a bidirectional transcription pattern. B, Density plot illustrating the difference in directionality of transcription at transcription start sites of protein-coding genes and center positions of chromatin-defined enhancers. C, Success rates of in vitro enhancer assays in HeLa cells. Vertical axis shows the fraction of active enhancers (success defined by Student’s t-test, P<0.05, vs. random regions). Numbers of successful assays are shown on the respective bar. Figures are modified, with permission, from Andersson et al. © (2014) Macmillan Publishers Ltd. All rights reserved.

Although functional roles have been suggested for enhancer RNAs (see Lam et al. 2014 for an extensive review), such attribution remains debatable. Nevertheless, the production of eRNAs does provide insight into functional regulatory elements.

We found that the characteristics of enhancer transcription, detected using CAGE, are sufficiently distinct from those of gene promoters to permit accurate genome-wide inference of enhancers from eRNAs – while transcription is mainly unidirectional at mRNA promoters, enhancers initiate bidirectional transcription (Figure 1B). Importantly, by in vitro assays, we showed that enhancer transcription is a much better predictor of enhancer activity than chromatin characteristics (3-fold increase in validation rate) (Figure 1C). These observations constitute the fundamental basis of my approach to infer the genomic locations of putative enhancers genome-wide.

Identification of bidirectionally transcribed loci

The computational strategy to predict enhancer locations is available on GitHub. Below, I describe the procedure.

Figure 2: Identification of bidirectionally transcribed loci

Bidirectionally transcribed loci were defined from a set of 1,714,047 forward and 1,597,186 reverse strand CAGE tag clusters (TCs) supported by at least two CAGE tags in at least one sample (TCs defined in FANTOM Consortium et al. 2014). Only TCs not overlapping antisense TCs were used. The identification of bidirectional loci involves the following steps:

  1. We identified 1,261,036 divergent (reverse-forward) TC pairs separated by at most 400 bp (step 1 in Figure 2).
  2. We merged all such pairs containing the same TC, while at the same time avoiding overlapping forward and reverse strand transcribed regions (prioritization by expression ranking), which resulted in 200,171 bidirectional loci (step 2 in Figure 2). A center position was defined for each bidirectional locus as the mid position between the rightmost reverse strand TC and leftmost forward strand TC included in the merged bidirectional pair.
  3. Each bidirectional locus was further associated with two 200 bp regions immediately flanking the center position, one (left) for reverse strand transcription and one (right) for forward strand transcription, in a divergent manner. The merged bidirectional pairs were further required to be bidirectionally transcribed (CAGE tags supporting both windows flanking the center) in at least one individual sample, and to have a greater aggregate of reverse CAGE tags (over all FANTOM5 samples) than forward CAGE tags in the 200 bp region associated with reverse strand transcription, and vice versa. These filtering steps resulted in 78,555 bidirectionally transcribed loci.
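Steps 1 and 3 above can be sketched roughly as follows. This is an illustration only, not the code from the GitHub repository. The `TagCluster` type, the function names, the 0-based half-open coordinates and the way the "separation" between two TCs is measured (end of the reverse TC to start of the forward TC) are all assumptions made for this example. The merging and expression-based prioritization of step 2 are left out.

```python
from typing import List, NamedTuple, Tuple


class TagCluster(NamedTuple):
    """A CAGE tag cluster (hypothetical representation for this sketch)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" (forward) or "-" (reverse)


def divergent_pairs(tcs: List[TagCluster],
                    max_gap: int = 400) -> List[Tuple[TagCluster, TagCluster]]:
    """Pair each reverse-strand TC with forward-strand TCs located to its
    right on the same chromosome, at most max_gap bp away.

    Assumption: the gap is measured from the end of the reverse TC to the
    start of the forward TC. A naive all-against-all scan is used for
    clarity; real data would call for sorted intervals.
    """
    reverse = [t for t in tcs if t.strand == "-"]
    forward = [t for t in tcs if t.strand == "+"]
    pairs = []
    for r in reverse:
        for f in forward:
            if r.chrom == f.chrom and 0 <= f.start - r.end <= max_gap:
                pairs.append((r, f))
    return pairs


def center_position(reverse_tcs: List[TagCluster],
                    forward_tcs: List[TagCluster]) -> int:
    """Mid position between the rightmost reverse TC and the leftmost
    forward TC of a merged locus (assumed here to mean the end of the
    former and the start of the latter)."""
    rightmost_reverse_end = max(t.end for t in reverse_tcs)
    leftmost_forward_start = min(t.start for t in forward_tcs)
    return (rightmost_reverse_end + leftmost_forward_start) // 2


def flanking_windows(center: int,
                     size: int = 200) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Left (reverse strand) and right (forward strand) windows that
    immediately flank the center position."""
    return (max(0, center - size), center), (center, center + size)
```

For a reverse TC at 100-120 and a forward TC at 300-320 on the same chromosome, this sketch pairs the two, places the center at position 210 and returns the windows 10-210 (reverse strand) and 210-410 (forward strand).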

We quantified the expression of bidirectional loci for each strand and 200 bp flanking window in each of the 808 FANTOM libraries separately by counting the CAGE tags whose 5′ ends were located within these windows. The expression values of both flanking windows were normalized by converting tag counts to tags per million mapped reads (TPM). The normalized expression values from both windows were used to calculate a sample-set wide directionality score, D, for each bidirectional locus from the normalized reverse, R, and forward, F, strand expression values aggregated across all samples (Figure 2):

D = (F-R) / (F+R).

D ranges between -1 and 1 and specifies the strand bias in expression: negative values indicate a bias towards the reverse strand and positive values a bias towards the forward strand (D=0 means 50% reverse and 50% forward strand expression, while abs(D) close to 1 indicates unidirectional transcription). Each bidirectional locus was assigned one expression value for each sample by summing the normalized expression of the two flanking windows.
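As a small worked example of the quantification, the sketch below counts tag 5′ ends in a window, converts counts to TPM and computes D from the aggregated values. The function names and the toy numbers are made up for illustration, and windows are assumed to be 0-based and half-open.

```python
from typing import Iterable, Tuple


def count_tags(five_prime_ends: Iterable[int], window: Tuple[int, int]) -> int:
    """Count CAGE tags whose 5' end falls inside a half-open window."""
    start, end = window
    return sum(1 for pos in five_prime_ends if start <= pos < end)


def tpm(tag_count: int, mapped_tags: int) -> float:
    """Convert a raw tag count to tags per million mapped reads."""
    return tag_count * 1_000_000 / mapped_tags


def directionality(forward: float, reverse: float) -> float:
    """D = (F - R) / (F + R); undefined for loci without any expression."""
    total = forward + reverse
    if total == 0:
        raise ValueError("locus has no expression on either strand")
    return (forward - reverse) / total


# Hypothetical input: (forward window count, reverse window count,
# total mapped tags in the library) for two samples.
samples = [(30, 10, 1_000_000), (10, 10, 2_000_000)]

forward_tpm = [tpm(f, n) for f, _, n in samples]   # 30.0 and 5.0
reverse_tpm = [tpm(r, n) for _, r, n in samples]   # 10.0 and 5.0

# Sample-set wide score from the aggregated strand-wise expression.
d_score = directionality(sum(forward_tpm), sum(reverse_tpm))

# One expression value per sample: the sum of both flanking windows.
locus_expression = [f + r for f, r in zip(forward_tpm, reverse_tpm)]
```

Here F = 35 and R = 15, so D = (35 - 15) / (35 + 15) = 0.4, a moderate forward strand bias that is well within what would still be called bidirectional.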

Prediction of enhancers from bidirectionally transcribed loci

Bidirectional loci were filtered to have low, non-promoter-like, directionality scores (abs(D) < 0.8, Figure 1B) and to be located distal to TSSs and exons of protein-coding and non-coding genes. This resulted in a final set of 43,011 putative enhancers.
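The filter can be sketched as below. The abs(D) < 0.8 cutoff is the one stated above. How far a locus must be from TSSs and exons is not spelled out in this post, so the sketch simply takes a list of already prepared masked intervals (for example TSSs and exons extended by whatever margin is appropriate) and rejects any locus overlapping them. The function name and the interval representation are assumptions.

```python
from typing import Iterable, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def is_putative_enhancer(d_score: float,
                         locus: Interval,
                         masked: Iterable[Interval],
                         max_abs_d: float = 0.8) -> bool:
    """Keep loci with a low directionality score that do not overlap any
    masked (TSS- or exon-proximal) interval supplied by the caller."""
    if abs(d_score) >= max_abs_d:
        return False  # promoter-like, mainly unidirectional transcription
    chrom, start, end = locus
    for m_chrom, m_start, m_end in masked:
        if m_chrom == chrom and m_start < end and start < m_end:
            return False  # too close to an annotated TSS or exon
    return True


# Hypothetical masked region around an annotated gene.
masked_regions = [("chr1", 5_000, 9_000)]

keep = is_putative_enhancer(0.4, ("chr1", 20_000, 20_400), masked_regions)
drop_directional = is_putative_enhancer(0.95, ("chr1", 20_000, 20_400), masked_regions)
drop_proximal = is_putative_enhancer(0.1, ("chr1", 8_900, 9_300), masked_regions)
```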

The predicted enhancers and the expression of enhancers across FANTOM5 libraries are available at http://enhancer.binf.ku.dk (direct link to BED file: here).

Each predicted enhancer is described in BED12 format with two blocks denoting the merged regions of transcription initiation on the minus and plus strands. The thickStart and thickEnd columns denote the inferred mid position (Figure 2). The score column gives the maximum pooled expression of TCs used to construct each bidirectional locus.
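For readers who want to work with the file programmatically, a minimal parser for one BED12 line could look like this. It relies only on the standard BED12 column order. The example line is invented for illustration and is not taken from the actual enhancer file, and the `Bed12Enhancer` type is a hypothetical container.

```python
from typing import List, NamedTuple, Tuple


class Bed12Enhancer(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: float
    mid: int                       # thickStart, the inferred mid position
    blocks: List[Tuple[int, int]]  # absolute (start, end) of each block


def parse_bed12_line(line: str) -> Bed12Enhancer:
    """Parse a tab-separated BED12 line into absolute block coordinates."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected 12 tab-separated BED columns")
    start = int(fields[1])
    # blockSizes and blockStarts are comma-separated and may end in a comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    offsets = [int(x) for x in fields[11].split(",") if x]
    blocks = [(start + o, start + o + s) for o, s in zip(offsets, sizes)]
    return Bed12Enhancer(
        chrom=fields[0],
        start=start,
        end=int(fields[2]),
        name=fields[3],
        score=float(fields[4]),
        mid=int(fields[6]),
        blocks=blocks,
    )


# Invented example line (not from the real data set).
example = "chr1\t1000\t1400\tchr1:1000-1400\t25\t.\t1200\t1201\t0,0,0\t2\t150,180\t0,220"
enhancer = parse_bed12_line(example)
minus_block, plus_block = enhancer.blocks
```

In this example the minus strand block spans 1000-1150, the plus strand block spans 1220-1400 and the mid position is 1200.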