I-SAGE Documentation

Welcome to the documentation for I-SAGE, a reproducible Nextflow-based pipeline for analyzing iBLESS sequencing data and performing genome-wide differential DNA double-strand break (DSB) analysis.

This documentation is intended for:
- Computational biologists running iBLESS analyses
- Developers extending or maintaining the pipeline
- Reviewers seeking clarity on statistical methods and assumptions

The goal is to make every component of the pipeline transparent, reproducible, and interpretable.

What is I-SAGE?

I-SAGE is an end-to-end analysis framework that transforms raw iBLESS FASTQ files into:
- Strand-aware break tracks
- Normalized genome-wide signals
- Statistically validated differential break calls
- Robustness and sensitivity assessments

It is designed to support replication stress experiments, including treatments such as hydroxyurea (HU) and aphidicolin, and to scale from single contrasts to replicate-aware, multi-parameter analyses.

Design Principles

I-SAGE is built around the following principles:

Reproducibility First
All steps are parameterized, logged, and traceable via Nextflow.

Explicit Statistics
Statistical tests, assumptions, and thresholds are documented and configurable.

Modularity
Each pipeline stage is implemented as an independent module.

Scientific Defensibility
Validation steps (downsampling, spike-ins, bin-size sensitivity) are part of the pipeline, not an afterthought.
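
To illustrate what a downsampling check involves, the sketch below thins per-bin break counts to a target fraction so that downstream calls can be re-tested at reduced depth. It is a minimal illustration, not the pipeline's implementation: the function name `downsample_counts`, the example counts, and the choice to thin counts (rather than subsample reads before alignment) are all assumptions made for this example.

```python
import random


def downsample_counts(counts, fraction, seed=0):
    """Binomially thin each count, keeping every event with probability `fraction`.

    Hypothetical helper for illustration only; the pipeline may instead
    subsample reads upstream of break calling.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps the thinning reproducible
    return [
        sum(1 for _ in range(n) if rng.random() < fraction)
        for n in counts
    ]


# Made-up per-bin break counts, thinned to roughly half depth.
counts = [12, 0, 7, 30]
thinned = downsample_counts(counts, fraction=0.5, seed=42)
```

Re-running the differential statistics on `thinned` and comparing the resulting calls with those from `counts` gives a rough sense of how depth-dependent a result is.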

Pipeline Overview

FASTQ
  ↓
Alignment & Deduplication
  ↓
Break Calling (per-base)
  ↓
Visualization Tracks (binned bedGraph)
  ↓
Normalization
  ↓
Differential Break Statistics
  ↓
Validation & Sensitivity Analyses

Each stage is described in detail in the following sections.
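
As a rough picture of the step from per-base break calls to binned bedGraph tracks, the sketch below aggregates strand-aware break counts into fixed-width bins and formats one strand as bedGraph lines. The record layout `(chrom, pos, strand, count)`, the function names, and the 1000 bp bin size are assumptions made for this example and do not describe the pipeline's actual file formats or defaults.

```python
from collections import Counter


def bin_breaks(breaks, bin_size):
    """Sum per-base break counts into fixed-width, strand-specific bins.

    `breaks` is an iterable of (chrom, pos, strand, count) with 0-based pos.
    Returns a sorted list of (chrom, start, end, strand, count).
    The last bin on a chromosome may extend past its true end; a real
    implementation would clip it to the chromosome length.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    totals = Counter()
    for chrom, pos, strand, count in breaks:
        start = (pos // bin_size) * bin_size  # left edge of the bin containing pos
        totals[(chrom, start, strand)] += count
    return [
        (chrom, start, start + bin_size, strand, n)
        for (chrom, start, strand), n in sorted(totals.items())
    ]


def to_bedgraph(binned, strand):
    """Format the bins of one strand as tab-separated bedGraph lines."""
    return [
        f"{chrom}\t{start}\t{end}\t{n}"
        for chrom, start, end, s, n in binned
        if s == strand
    ]


# Made-up per-base calls: (chrom, 0-based position, strand, break count).
example = [
    ("chr1", 10, "+", 2),
    ("chr1", 950, "+", 1),
    ("chr1", 1200, "-", 4),
    ("chr1", 1999, "+", 3),
]
binned = bin_breaks(example, bin_size=1000)
plus_track = to_bedgraph(binned, "+")
```

Keeping strand in the bin key is what makes the resulting tracks strand-aware; calling `bin_breaks` with several `bin_size` values is also the simplest way to set up a bin-size sensitivity comparison.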

Documentation Structure

This documentation is organized as follows:

Getting Started

Installation and requirements
Quick-start examples
Configuration basics

Pipeline Modules

Alignment and deduplication
Break calling
Visualization and normalization
Differential statistics
Validation and sensitivity analysis

Statistical Methods

Bin-level testing framework
Replicate-aware analysis
Multiple testing correction
EBV annotation and enrichment

Configuration Guide

Full explanation of iblesse.yaml
Parameter interactions
Recommended settings

Outputs & Interpretation

Output directory structure
TSV files and plots
Genome browser tracks
Common pitfalls in interpretation

Developer Guide

Code structure
Adding new modules
Testing expectations
Contribution workflow

Intended Usage

I-SAGE is suitable for:
- Genome-wide DSB profiling under replication stress
- Comparing treated vs control conditions
- Assessing reproducibility across replicates
- Sensitivity analyses across bin sizes or genomic regions

It is not intended to replace low-level exploratory notebooks. Its purpose is to formalize analyses that must be re-run, reviewed, and trusted.

Project Status

The pipeline is under active development. Core functionality is stable; documentation is being expanded.

Major changes are tracked via Git commits and documented in the repository.

Getting Help

For usage questions: consult the relevant documentation sections
For bugs or feature requests: open a GitHub issue
For design discussions: see CONTRIBUTING.md

Citation

If you use I-SAGE in your work, please cite the relevant methodological references and acknowledge the pipeline.

(A formal citation entry will be added upon publication.)

Next: Proceed to Getting Started → Installation & Requirements