Quantitative analysis of transcription initiation data

PRIME is an R package for quantitative analysis of transcription initiation data from CAGE and related TSS assays. It extends CAGEfightR with tools for regulatory element characterization, including transcription initiation complexity, signal saturation, background noise, divergent transcription, and core promoter architecture. PRIME also provides an interface to a LightGBM model for data-driven regulatory element scoring (via PRIMEmodel).

PRIME is designed to work alongside CAGEfightR and uses the same Bioconductor data structures (SummarizedExperiment / RangedSummarizedExperiment and GRanges). Its quantitative and modeling utilities therefore remain compatible with the wider Bioconductor genomic infrastructure.

Core functionality

TSS-level quantification & processing

  • TSS-level quantification (via CAGEfightR::quantifyCTSSs())
  • expression summary, normalization, subsampling
  • transcription initiation metrics
    • complexity (e.g., dispersion- or entropy-style summaries, depending on the analysis)
    • saturation / downsampling-based analyses
    • background noise estimation
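
To make the normalization, subsampling, and complexity items above more concrete, here is a minimal, language-agnostic sketch in plain Python. It is not PRIME's implementation or API: the function names (`to_tpm`, `subsample`, `shannon_entropy`), the binomial-thinning scheme, and the use of Shannon entropy as the complexity summary are all assumptions chosen for illustration.

```python
import math
import random

def to_tpm(counts):
    """Scale one library's raw tag counts so they sum to one million."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c * 1e6 / total for c in counts]

def subsample(counts, fraction, rng):
    """Binomial thinning: keep each tag independently with probability `fraction`."""
    return [sum(1 for _ in range(c) if rng.random() < fraction) for c in counts]

def shannon_entropy(counts):
    """Shannon entropy (bits) of how tags are spread over positions."""
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical tag counts at five TSS positions (illustrative data only)
counts = [40, 40, 10, 5, 5]
tpm = to_tpm(counts)
thinned = subsample(counts, 0.5, random.Random(1))
entropy = shannon_entropy(counts)
```

Under this sketch, a region whose tags all fall on a single position has entropy 0, while four equally used positions give exactly 2 bits. Repeating `subsample` over a range of fractions and recording how many positions remain detected is one simple way to approximate a saturation curve.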

Regulatory element characterization

  • divergent transcription detection and quantification
  • tag cluster analysis utilities (decomposition and downstream quantification)
  • strand balance and other profile-derived summaries (depending on workflow)
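
As an illustration of what divergent-transcription and strand-balance summaries measure, the sketch below sums minus-strand signal upstream and plus-strand signal downstream of a midpoint, then computes a simple directionality score. This is a conceptual example in plain Python under stated assumptions, not PRIME's method: the function names, the `flank` parameter, and the score `(plus - minus) / (plus + minus)` are hypothetical choices.

```python
def directionality(plus, minus):
    """(plus - minus) / (plus + minus): +1 plus-only, -1 minus-only, 0 balanced."""
    total = plus + minus
    if total == 0:
        return None  # no signal, score undefined
    return (plus - minus) / total

def divergent_signal(plus_by_pos, minus_by_pos, midpoint, flank):
    """Return (plus signal downstream, minus signal upstream) of the midpoint."""
    upstream_minus = sum(
        v for p, v in minus_by_pos.items() if midpoint - flank <= p < midpoint
    )
    downstream_plus = sum(
        v for p, v in plus_by_pos.items() if midpoint < p <= midpoint + flank
    )
    return downstream_plus, upstream_minus

# Hypothetical per-position tag counts on each strand (illustrative data only)
plus_by_pos = {90: 1, 105: 8, 120: 4}
minus_by_pos = {80: 3, 95: 6, 110: 2}

plus_sum, minus_sum = divergent_signal(plus_by_pos, minus_by_pos, midpoint=100, flank=25)
score = directionality(plus_sum, minus_sum)
```

With these toy counts, only signal pointing away from the midpoint is counted (12 plus-strand tags downstream, 9 minus-strand tags upstream), so the locus is close to balanced.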

Core promoter analysis

  • promoter decomposition
  • positional dispersion summaries (e.g., width-style measures of how initiation is spread across a promoter)
  • initiator sequence patterns (INR-like classifications)
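
Two of the ideas above can be sketched in a few lines of plain Python. The first function computes an interquantile width, one common width-style dispersion measure; the second labels the dinucleotide at positions -1/+1 using the widely described pyrimidine-purine (YR) initiator rule. Both are illustrative assumptions: the function names, the 10%-90% quantile defaults, and the two-class YR/other scheme are not taken from PRIME.

```python
def interquantile_width(positions, counts, lower=0.1, upper=0.9):
    """Width (bp) of the span between the `lower` and `upper` tag quantiles."""
    total = sum(counts)
    if total == 0:
        return None
    cumulative = 0
    start = end = None
    for pos, c in sorted(zip(positions, counts)):
        cumulative += c
        frac = cumulative / total
        if start is None and frac >= lower:
            start = pos  # first position reaching the lower quantile
        if end is None and frac >= upper:
            end = pos  # first position reaching the upper quantile
    return end - start + 1

def initiator_class(minus1, plus1):
    """Label the -1/+1 dinucleotide as 'YR' (pyrimidine then purine) or 'other'."""
    pyrimidines, purines = {"C", "T"}, {"A", "G"}
    if minus1.upper() in pyrimidines and plus1.upper() in purines:
        return "YR"
    return "other"

# Hypothetical tag counts over five adjacent positions (illustrative data only)
positions = [100, 101, 102, 103, 104]
counts = [1, 2, 10, 6, 1]
width = interquantile_width(positions, counts)
label = initiator_class("C", "A")
```

For the toy cluster, the 10% quantile is reached at position 101 and the 90% quantile at position 103, giving a width of 3 bp even though the cluster spans 5 bp. Trimming the tails this way makes the measure less sensitive to a few stray tags.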

Profile-based analysis

  • signal aggregation around genomic features
  • window-based summarization and heatmap-style matrices
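
The window-based summarization described above can be pictured as building a features-by-offsets matrix and then averaging its columns. The following is a minimal sketch in plain Python under simplifying assumptions (a single strand, signal stored as a position-to-count dictionary, and hypothetical function names); it does not reflect PRIME's actual functions or data structures.

```python
def window_matrix(signal, centers, flank):
    """One row per feature: signal at offsets -flank..+flank around its center."""
    return [
        [signal.get(center + offset, 0) for offset in range(-flank, flank + 1)]
        for center in centers
    ]

def column_means(matrix):
    """Average profile across features (one value per offset)."""
    n = len(matrix)
    return [sum(col) / n for col in zip(*matrix)]

# Hypothetical per-position signal and feature centers (illustrative data only)
signal = {98: 1, 100: 5, 101: 2, 200: 4, 202: 3}
centers = [100, 200]

matrix = window_matrix(signal, centers, flank=2)
profile = column_means(matrix)
```

Here `matrix` is the heatmap-style representation (rows are features, columns are offsets from -2 to +2), and `profile` is the aggregated signal across features. A strand-aware version would flip the offsets for minus-strand features before aggregating.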

Machine learning interface

  • interfaces to score candidate regulatory elements using the PRIME LightGBM model (distributed via PRIMEmodel)

PRIME toolkit

The PRIME toolkit consists of three interconnected tools for the analysis of transcription initiation data (e.g., CAGE):

| Tool | Type | Purpose |
|---|---|---|
| PRIMEprep | Bash pipeline | Raw FASTQ → QC → trimming → mapping → BigWig |
| PRIME | R package | TSS quantification, divergent loci, promoter decomposition, normalization, noise estimation |
| PRIMEmodel | R package + Python | Genome-wide prediction / scoring of regulatory elements |

Input data

PRIME works with standard Bioconductor genomic data structures:

  • TSS-level data: typically a RangedSummarizedExperiment produced by CAGEfightR (row ranges are TSS positions; assays contain counts/TPM).
  • Tag clusters / loci / regions: typically GRanges (or RangedSummarizedExperiment objects where rowRanges() are regions).
  • PRIME functions are generally compatible with SummarizedExperiment + GRanges workflows and integrate with the Bioconductor ecosystem.

Getting started

PRIME is a toolbox rather than a single rigid pipeline. The fastest way to get started is to follow the vignettes on the PRIME website.

Relationship to CAGEfightR

PRIME is designed to work alongside CAGEfightR. It uses the same data structures (SummarizedExperiment and GRanges) and extends CAGEfightR with additional quantitative and modeling utilities for regulatory element analysis.

CAGEfightR repository: https://github.com/MalteThodberg/CAGEfightR

PRIME model (PRIMEmodel)

  • PRIME (this package) is the analysis toolkit (TSS QC, quantification helpers, complexity/noise utilities, tag cluster and promoter analysis).
  • PRIMEmodel distributes the trained LightGBM model and provides genome-wide (or focal) scoring of candidate regulatory elements from TSS signal profiles.

PRIMEmodel repository: https://github.com/anderssonlab/PRIMEmodel
PRIMEmodel website: https://anderssonlab.org/PRIMEmodel/

Example applications

  • Compare transcription initiation complexity across conditions
  • Identify and characterize divergently transcribed loci
  • Analyze core promoter architecture via decomposition and dispersion measures
  • Perform profile-based analyses around genomic features (heatmap-style and window summaries)
  • Score candidate regulatory elements using the PRIME model (via PRIMEmodel)

Installation

PRIME depends on a mix of CRAN, Bioconductor, and GitHub packages. For detailed instructions, see: