I-SAGE Documentation
Welcome to the documentation for I-SAGE, a reproducible Nextflow-based pipeline for analyzing iBLESS sequencing data and performing genome-wide differential DNA double-strand break (DSB) analysis.
This documentation is intended for: - Computational biologists running iBLESS analyses - Developers extending or maintaining the pipeline - Reviewers seeking clarity on statistical methods and assumptions
The goal is to make every component of the pipeline transparent, reproducible, and interpretable.
What is I-SAGE?
I-SAGE is an end-to-end analysis framework that transforms raw iBLESS FASTQ files into: - Strand-aware break tracks - Normalized genome-wide signals - Statistically validated differential break calls - Robustness and sensitivity assessments
It is designed to support replication stress experiments, including treatments such as HU and aphidicolin, and to scale from single contrasts to replicate-aware, multi-parameter analyses.
Design Principles
I-SAGE is built around the following principles:
Reproducibility First
All steps are parameterized, logged, and traceable via Nextflow.
Explicit Statistics
Statistical tests, assumptions, and thresholds are documented and configurable.
Modularity
Each pipeline stage is implemented as an independent module.
Scientific Defensibility
Validation steps (downsampling, spike-ins, bin-size sensitivity) are part of the pipeline, not an afterthought.
Pipeline Overview
FASTQ
↓
Alignment & Deduplication
↓
Break Calling (per-base)
↓
Visualization Tracks (binned bedGraph)
↓
Normalization
↓
Differential Break Statistics
↓
Validation & Sensitivity Analyses
Each stage is described in detail in the following sections.
Documentation Structure
This documentation is organized as follows:
Getting Started
- Installation and requirements
- Quick-start examples
- Configuration basics
Pipeline Modules
- Alignment and deduplication
- Break calling
- Visualization and normalization
- Differential statistics
- Validation and sensitivity analysis
Statistical Methods
- Bin-level testing framework
- Replicate-aware analysis
- Multiple testing correction
- EBV annotation and enrichment
Configuration Guide
- Full explanation of
iblesse.yaml - Parameter interactions
- Recommended settings
Outputs & Interpretation
- Output directory structure
- TSV files and plots
- Genome browser tracks
- Common pitfalls in interpretation
Developer Guide
- Code structure
- Adding new modules
- Testing expectations
- Contribution workflow
Intended Usage
I-SAGE is suitable for: - Genome-wide DSB profiling under replication stress - Comparing treated vs control conditions - Assessing reproducibility across replicates - Sensitivity analyses across bin sizes or genomic regions
It is not intended to replace low-level exploratory notebooks, but to formalize analyses that must be re-run, reviewed, and trusted.
Project Status
The pipeline is under active development. Core functionality is stable; documentation is being expanded.
Major changes are tracked via Git commits and documented in the repository.
Getting Help
- For usage questions: consult the relevant documentation sections
- For bugs or feature requests: open a GitHub issue
- For design discussions: see
CONTRIBUTING.md
Citation
If you use I-SAGE in your work, please cite the relevant methodological references and acknowledge the pipeline.
(A formal citation entry will be added upon publication.)
Next: Proceed to Getting Started → Installation & Requirements