// Synthetic immune repertoires with absolute ground truth
Simulate immune receptor sequences. Know exactly what the engine did.
GenAIRR is a high-performance simulator for adaptive immune receptor repertoires (BCR + TCR). A Rust simulation kernel drives a fluent Python DSL; every record comes back annotated with by-construction truth (V/D/J calls, junction, productive flag, identity, mutation counts) derived from the persistent intermediate representation, not inferred by an aligner.
What it's for. A DSL for AIRR. An engine that knows the answer.
// 01
Benchmark
Score alignment, clustering, annotation, or genotype-inference tools (TIgGER, IgDiscover) against ground truth that the engine emitted by construction, rather than against an oracle aligner that can itself be wrong.
// 02
Null models
Generate biologically grounded null repertoires with controllable productive fraction, SHM rate, V-gene usage, and clonal structure for statistical hypothesis testing.
// 03
Phenomena lab
Switch any biological mechanism on or off (P/N additions, D inversion, receptor revision, targeted SHM) and observe the downstream effect on what aligners report.
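As a rough illustration of the Benchmark use case above, the sketch below scores a tool's V calls against a truth table by joining the two on `sequence_id`. It uses only the standard library and assumes both tables are tab-separated with AIRR-style `sequence_id` and `v_call` columns. The helper names (`call_set`, `v_call_accuracy`), the toy rows, and the scoring rule (a prediction counts as correct when its comma-separated call list overlaps the true call) are assumptions made for this example, not part of GenAIRR.

```python
import csv
import io


def call_set(cell):
    """Split a call cell such as 'IGHV1-2*02,IGHV1-2*04' into a set of alleles."""
    return {c.strip() for c in cell.split(",") if c.strip()}


def v_call_accuracy(truth_tsv, predicted_tsv):
    """Fraction of shared sequence_ids whose predicted v_call overlaps the true v_call."""
    truth = {
        row["sequence_id"]: call_set(row["v_call"])
        for row in csv.DictReader(io.StringIO(truth_tsv), delimiter="\t")
    }
    hits = total = 0
    for row in csv.DictReader(io.StringIO(predicted_tsv), delimiter="\t"):
        true_calls = truth.get(row["sequence_id"])
        if true_calls is None:
            continue  # sequence is absent from the truth table, so it is not scored
        total += 1
        hits += bool(true_calls & call_set(row["v_call"]))
    return hits / total if total else float("nan")


# Toy tables, invented for illustration only (not GenAIRR output).
TRUTH = (
    "sequence_id\tv_call\n"
    "seq1\tIGHV1-2*02\n"
    "seq2\tIGHV3-23*01\n"
    "seq3\tIGHV4-34*01\n"
)
PREDICTED = (
    "sequence_id\tv_call\n"
    "seq1\tIGHV1-2*02,IGHV1-2*04\n"  # ambiguous call that contains the truth
    "seq2\tIGHV3-23*04\n"            # wrong allele
    "seq3\tIGHV4-34*01\n"            # exact match
)

accuracy = v_call_accuracy(TRUTH, PREDICTED)
print(f"V-call accuracy: {accuracy:.2f}")  # prints "V-call accuracy: 0.67"
```

Counting an ambiguous call as correct is only one convention; a stricter benchmark might require an exact single-allele match.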
// Engine Rust kernel · 23 species, 106 bundled configs · Constraint-aware sampling · Byte-stable replay
What GenAIRR does. Seven capabilities. One composable engine.
// V(D)J · 01
Recombine
Sample alleles, trim, fill NP1/NP2, and assemble, using authored, typed empirical models for each cartridge plane.
// Diploid · 02
Genotype
Per-individual diploid germline: phased V(D)J from one chromosome, zygosity, gene deletion, novel alleles, and multi-subject cohorts.
// SHM · 03
Mutate
Uniform or S5F context-aware SHM. Per-segment + per-V-subregion rate targeting (FWR/CDR).
// Lineage · 04
Simulate clones
BCR lineage trees, TCR clone-size repertoires, and flat abundance benchmarks with planted clone IDs.
// Library · 05
Corrupt
Primer trimming, structural indels, PCR errors, N-base injection. Each knob tunable per workload.
// Contracts · 06
Constrain
Productive-only sampling at compile time. The engine never proposes out-of-frame or stop-bearing candidates.
// Replay · 07
Replay
Trace every random draw. Byte-identical replay across runs, platforms, and one-knob-changed counterfactuals.
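One way to check a byte-identical replay claim from the outside is to write the output of two runs to disk and compare file digests. The sketch below does that with the standard library only. It is an external sanity check, not GenAIRR's own trace or replay mechanism, and the function names (`file_digest`, `byte_identical`) and the hand-written stand-in files are hypothetical; in practice the inputs would be two TSV files written by runs that share a seed.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks so large outputs are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def byte_identical(path_a, path_b):
    """True when both files have the same SHA-256 digest."""
    return file_digest(path_a) == file_digest(path_b)


# Stand-in files written by hand for the demo (not simulator output).
with tempfile.TemporaryDirectory() as tmp:
    run_a = Path(tmp) / "run_a.tsv"
    run_b = Path(tmp) / "run_b.tsv"
    run_c = Path(tmp) / "run_c.tsv"
    run_a.write_bytes(b"sequence_id\tv_call\nseq1\tIGHV1-2*02\n")
    run_b.write_bytes(b"sequence_id\tv_call\nseq1\tIGHV1-2*02\n")  # same bytes as run_a
    run_c.write_bytes(b"sequence_id\tv_call\nseq1\tIGHV1-2*04\n")  # one character differs
    same = byte_identical(run_a, run_b)
    different = byte_identical(run_a, run_c)

print(same, different)  # prints "True False"
```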
import GenAIRR as ga

# Grow real BCR clonal lineage trees - affinity maturation, with ground truth
result = (
    ga.Experiment.on("human_igh")
    .recombine()
    .clonal_lineage(n_clones=50, max_generations=6, n_sample=30,
                    rate=0.01, selection_strength=10.0)
    .sequencing_errors(rate=0.001)
    .run_records(seed=42)
)

result.to_tsv("repertoire.tsv")               # per-cell AIRR records (clone_id, lineage_*)
newick = result.lineage_trees[0].to_newick()  # ground-truth lineage tree per clone
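Once the records are on disk, the planted clone labels can be tallied with the standard library alone. The sketch below is a minimal example, assuming the TSV is tab-separated and has the `clone_id` column named in the comment above. The `clone_sizes` helper, the `sequence_id` column, and the sample rows are assumptions made for illustration; real clone identifiers may look different.

```python
import csv
import io
from collections import Counter


def clone_sizes(tsv_text):
    """Count records per clone_id in tab-separated AIRR-style text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["clone_id"] for row in reader)


# Hand-written sample rows for illustration only (not simulator output).
SAMPLE = (
    "sequence_id\tclone_id\n"
    "cell1\tc0\n"
    "cell2\tc0\n"
    "cell3\tc1\n"
)

sizes = clone_sizes(SAMPLE)
for clone, n_cells in sizes.most_common():
    print(clone, n_cells)
```

To run it on the file written above, pass the file contents instead of `SAMPLE`, for example `clone_sizes(open("repertoire.tsv", newline="").read())`.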
Install. One command. No compiler.
pip install GenAIRR
Pre-built wheels for Linux (x86_64, aarch64), macOS (Intel + Apple Silicon), and Windows (x64). Python 3.9+. No Rust toolchain needed.
See the full quick start → Start the getting-started track →
Choose your path
These are a few of the most common starting points. The Choose your path page has the full set (eight paths), each expanded into a short ordered reading list.
| If you want to ... | Start here |
|---|---|
| Simulate sequences | Quick start → The Experiment builder → API reference |
| Simulate per-individual genotypes | Genotypes → Sampling & population priors → Cohorts |
| Simulate clonal repertoires | Clonal overview → Lineage trees → Repertoires |
| Build a reference cartridge | Reference cartridge concept → Build a reference cartridge |
| Benchmark tools against ground truth | Quick start → Benchmarking genotype inference |
| Get validated / reproducible output | Validate AIRR records → Trace, replay, reproducibility |
// Citation
If GenAIRR helps your research, please cite: Konstantinovsky T, Peres A, Polak P, Yaari G. An unbiased comparison of immunoglobulin sequence aligners. Briefings in Bioinformatics. 2024;25(6):bbae556. doi:10.1093/bib/bbae556