Overview
PRIMEprep is a shell-based pipeline that takes raw CAGE sequencing data (FASTQ) and produces strand-specific BigWig files ready for downstream analysis with CAGEfightR and PRIME.
The pipeline consists of 10 steps:
- FastQC — quality control of raw reads
- fastp — adapter trimming and quality filtering
- rRNAdust — ribosomal RNA filtering
- FastQC — quality control of trimmed reads
- STAR — genome mapping
- VCF-aware mapping (optional) — variant-aware re-mapping
- samtools index — BAM indexing
- preseq — library complexity estimation
- samtools stats — mapping statistics
- G-correction — removal of non-templated G additions at the 5′ end
The key output is strand-specific BigWig files in bw_files/, which serve as direct input to PRIME.
Installation
Clone the PRIMEprep repository and make sure that all of the external tools listed below are installed.
Required tools (recommended versions):
| Tool | Version |
|---|---|
| STAR | v2.7.3a |
| fastp | v0.23.4 |
| samtools | v1.20.0 |
| rRNAdust | v1.02 |
| bedGraphToBigWig | v4.0 |
| preseq | v2.0 |
| FastQC | v0.12.1 |
| bedtools | v2.31.0 |
| Perl | v5.38.0 |
| openjdk | v20.0.0 |
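Before running the pipeline, it can help to confirm that each tool in the table is reachable from the shell. The following is a minimal sketch, not part of PRIMEprep itself. The executable names (for example `fastqc` for FastQC and `java` for openjdk) are assumptions and may differ on your system, and the check does not compare installed versions against the recommended ones.

```shell
#!/bin/sh
# Executable names are assumptions; adjust them to match your installation.
required_tools="STAR fastp samtools rRNAdust bedGraphToBigWig preseq fastqc bedtools perl java"

# Print the name of every given tool that cannot be found on PATH, one per line.
find_missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Word splitting on $required_tools is intentional here.
missing=$(find_missing_tools $required_tools)
if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing" >&2
else
    echo "All required tools were found on PATH."
fi
```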
Parameters
| Flag | Description |
|---|---|
| -f | Path to input FASTQ file(s) |
| -g | Path to reference genome FASTA |
| -b | Path to STAR genome index directory |
| -t | Number of threads |
| -o | Output directory |
| -d | Path to rRNAdust database |
| -a | Sequencing adapter sequence |
| -v | Path to VCF file for variant-aware mapping (optional) |
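To illustrate how these flags fit together, the sketch below parses the same eight options with the POSIX `getopts` builtin. It is a hypothetical illustration, not PRIMEprep's own argument handling: the variable names, the default values (one thread, current directory as output), and the file names in the example call are all assumptions.

```shell
#!/bin/sh
# Hypothetical sketch of parsing the flags in the table; not PRIMEprep's own code.
parse_args() {
    # Defaults are assumptions made for this example only.
    fastq="" genome="" star_index="" threads=1 outdir="." rrna_db="" adapter="" vcf=""
    OPTIND=1
    while getopts "f:g:b:t:o:d:a:v:" opt; do
        case "$opt" in
            f) fastq=$OPTARG ;;       # input FASTQ file(s)
            g) genome=$OPTARG ;;      # reference genome FASTA
            b) star_index=$OPTARG ;;  # STAR genome index directory
            t) threads=$OPTARG ;;     # number of threads
            o) outdir=$OPTARG ;;      # output directory
            d) rrna_db=$OPTARG ;;     # rRNAdust database
            a) adapter=$OPTARG ;;     # sequencing adapter sequence
            v) vcf=$OPTARG ;;         # optional VCF for variant-aware mapping
            *) echo "Unknown option" >&2; return 1 ;;
        esac
    done
}

# Placeholder paths and adapter sequence, for illustration only.
parse_args -f sample.fastq.gz -g genome.fa -b star_index/ -t 8 \
           -o results/ -d rRNA.fa -a AGATCGGAAGAGC
echo "threads=$threads vcf=${vcf:-none}"
```

Because `-v` is optional, the sketch leaves `vcf` empty when the flag is absent. A wrapper could use that to decide whether to run the variant-aware mapping step.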
Output directories
After a successful run, the output directory contains:
| Directory | Contents |
|---|---|
| QC/ | FastQC reports (pre- and post-trimming) |
| bam_files/ | STAR-aligned, indexed BAM files |
| bed_files/ | Genomic coverage BED files |
| bw_files/ | Strand-specific BigWig files (key output for PRIME) |
The bw_files/ directory contains two BigWig files per sample:
- <sample>.plus.bw — 5′ coverage on the plus strand
- <sample>.minus.bw — 5′ coverage on the minus strand
These files are used directly as input to PRIME via CAGEfightR::quantifyCTSSs(). See vignette("03-ctss-processing") for the next step.
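Since each sample is expected to have both a plus-strand and a minus-strand file, a quick pairing check before moving on to PRIME can catch incomplete runs. The helper below is a minimal sketch and not part of PRIMEprep. The function name `check_bw_pairs` is hypothetical, and it assumes the `<sample>.plus.bw` / `<sample>.minus.bw` naming described above.

```shell
#!/bin/sh
# Report samples in a bw_files/ directory that lack one of the two strand files.
# Returns 0 if every sample has both files, 1 otherwise.
check_bw_pairs() {
    bw_dir=$1
    status=0
    for plus in "$bw_dir"/*.plus.bw; do
        [ -e "$plus" ] || continue            # glob matched nothing
        sample=$(basename "$plus" .plus.bw)
        if [ ! -e "$bw_dir/$sample.minus.bw" ]; then
            echo "missing minus strand: $sample"
            status=1
        fi
    done
    for minus in "$bw_dir"/*.minus.bw; do
        [ -e "$minus" ] || continue           # glob matched nothing
        sample=$(basename "$minus" .minus.bw)
        if [ ! -e "$bw_dir/$sample.plus.bw" ]; then
            echo "missing plus strand: $sample"
            status=1
        fi
    done
    return "$status"
}

# Example call (hypothetical output directory):
# check_bw_pairs results/bw_files && echo "all samples have both strands"
```

The check only looks at file names. It does not validate that the files are well-formed BigWig files.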
See also
- vignette("01-getting-started") — installation and overview
- vignette("03-ctss-processing") — loading BigWig files into PRIME
- vignette("08-end-to-end-workflow") — complete pipeline walkthrough
- PRIMEprep repository