
// Synthetic immune repertoires with absolute ground truth

Simulate immune receptor sequences. Know exactly what the engine did.

GenAIRR is a high-performance simulator for adaptive immune receptor repertoires (BCR and TCR). A Rust simulation kernel powers a fluent Python DSL. Every record comes back annotated with by-construction truth (V/D/J calls, junction, productive flag, identity, mutation counts). That truth is derived from the persistent intermediate representation, not inferred by an aligner.

What it's for. A DSL for AIRR. An engine that knows the answer.

// 01

Benchmark

Score alignment, clustering, annotation, or genotype-inference tools (e.g., TIgGER, IgDiscover) against ground truth that the engine emits by construction, not against an oracle aligner that can itself be wrong.
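A benchmark ultimately reduces to comparing a tool's calls with the simulated truth, record by record. The sketch below assumes both have been loaded into dicts keyed by `sequence_id` (the AIRR Rearrangement identifier) with allele calls such as `v_call` values. The function names and the "lenient" rule, which accepts a comma-separated list of tied calls when the true allele is among them, are hypothetical illustrations, not part of GenAIRR.

```python
from typing import Dict, Optional


def is_match(true_call: str, predicted_call: Optional[str], lenient: bool = False) -> bool:
    """Compare one predicted allele call with the simulated truth.

    Strict mode requires an exact string match. Lenient mode accepts a
    comma-separated list of tied calls if the true allele is among them.
    """
    if predicted_call is None:
        return False  # the tool reported nothing for this sequence
    if not lenient:
        return predicted_call == true_call
    return true_call in {c.strip() for c in predicted_call.split(",")}


def allele_call_accuracy(truth: Dict[str, str], predicted: Dict[str, str],
                         lenient: bool = False) -> float:
    """Fraction of ground-truth sequences whose predicted call matches.

    Both dicts map sequence_id -> allele call (e.g. a v_call value).
    Sequences missing from `predicted` count as misses.
    """
    if not truth:
        raise ValueError("truth must contain at least one sequence")
    hits = sum(is_match(call, predicted.get(seq_id), lenient)
               for seq_id, call in truth.items())
    return hits / len(truth)
```

With truth `{"s1": "IGHV1-2*02", "s2": "IGHV3-23*01", "s3": "IGHV4-34*01"}` and a tool that returns `"IGHV3-23*04,IGHV3-23*01"` for `s2` and nothing for `s3`, strict scoring gives 1/3 and lenient scoring gives 2/3. Reporting both makes the cost of ambiguous calls visible.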

// 02

Null models

Generate biologically grounded null repertoires with controllable productive fraction, SHM rate, V-gene usage, and clonal structure for statistical hypothesis testing.
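One common way to use simulated null repertoires is a Monte Carlo test. Simulate many null repertoires (for example, one per seed), compute the same statistic on each, and ask how often the null is at least as extreme as the observed value. The sketch below assumes the null statistics have already been computed; the statistic itself and the function name are hypothetical. It applies the standard +1 correction so the estimated p-value is never exactly zero.

```python
from typing import Sequence


def empirical_p_value(observed: float, null_stats: Sequence[float]) -> float:
    """One-sided Monte Carlo p-value for `observed` against simulated null statistics.

    Counts null statistics at least as large as the observed one and applies
    the usual +1 correction, so the estimate is never exactly zero.
    """
    if len(null_stats) == 0:
        raise ValueError("need at least one null statistic")
    extreme = sum(1 for s in null_stats if s >= observed)
    return (extreme + 1) / (len(null_stats) + 1)
```

For example, with null statistics `[0.1, 0.2, 0.3, 0.4]` and an observed value of `0.35`, one null value is at least as extreme, giving (1 + 1) / (4 + 1) = 0.4.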

// 03

Phenomena lab

Switch any biological mechanism on or off (P/N additions, D inversion, receptor revision, targeted SHM) and observe the downstream effect on what aligners report.

// Engine Rust kernel · 23 species, 106 bundled configs · Constraint-aware sampling · Byte-stable replay

What GenAIRR does. Seven capabilities. One composable engine.

// V(D)J · 01

Recombine

Sample alleles, trim, fill NP1/NP2, and assemble, using authored, typed empirical models for each cartridge plane.
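To make the four steps concrete, here is a deliberately simplified sketch of V(D)J assembly. Trimming and NP lengths are drawn uniformly, and P-nucleotides are omitted. GenAIRR's empirical, per-cartridge models are not reproduced here, and the function and field names are hypothetical. The point it illustrates is that every random choice is returned with the sequence, so the "truth" is known by construction rather than inferred afterwards.

```python
import random
from typing import Dict, Union


def recombine(v: str, d: str, j: str, rng: random.Random,
              max_trim: int = 4, max_np: int = 6) -> Dict[str, Union[str, int]]:
    """Toy V(D)J assembly: trim segment ends, insert random NP bases, concatenate."""
    if len(d) < 2 * max_trim or len(v) < max_trim or len(j) < max_trim:
        raise ValueError("segments too short for the requested max_trim")

    def np_region() -> str:
        # Non-templated bases drawn uniformly (real N additions are TdT-biased).
        return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_np)))

    v_3_trim = rng.randint(0, max_trim)   # bases removed from the V 3' end
    d_5_trim = rng.randint(0, max_trim)   # bases removed from the D 5' end
    d_3_trim = rng.randint(0, max_trim)   # bases removed from the D 3' end
    j_5_trim = rng.randint(0, max_trim)   # bases removed from the J 5' end
    np1, np2 = np_region(), np_region()

    sequence = (v[: len(v) - v_3_trim] + np1
                + d[d_5_trim: len(d) - d_3_trim] + np2
                + j[j_5_trim:])
    # Every choice is returned, so the "truth" is known by construction.
    return {"sequence": sequence, "v_3_trim": v_3_trim, "d_5_trim": d_5_trim,
            "d_3_trim": d_3_trim, "j_5_trim": j_5_trim, "np1": np1, "np2": np2}
```

Calling `recombine("ACGTACGTACGT", "GGGCCCAA", "TTTTAAAACCCC", random.Random(1))` twice with the same seed returns identical records. The sequence can always be rebuilt exactly from the returned trim lengths and NP strings.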

// Diploid · 02

Genotype

Per-individual diploid germline: phased V(D)J from one chromosome, zygosity, gene deletion, novel alleles, and multi-subject cohorts.
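"Phased V(D)J from one chromosome" means a rearrangement draws all of its segments from a single haplotype. The sketch below assumes a genotype stored as `gene -> (allele on chromosome 0, allele on chromosome 1)`, with `None` marking a deletion on that chromosome. This data layout and both function names are hypothetical illustrations, not GenAIRR's genotype API.

```python
import random
from typing import Dict, List, Optional, Tuple

# gene -> (allele on chromosome 0, allele on chromosome 1); None marks a deletion.
Genotype = Dict[str, Tuple[Optional[str], Optional[str]]]


def zygosity(genotype: Genotype, gene: str) -> str:
    """Classify one gene as deleted, hemizygous, homozygous, or heterozygous."""
    a, b = genotype[gene]
    if a is None and b is None:
        return "deleted"
    if a is None or b is None:
        return "hemizygous"
    return "homozygous" if a == b else "heterozygous"


def sample_phased(genotype: Genotype, v_genes: List[str], d_genes: List[str],
                  j_genes: List[str], rng: random.Random) -> Tuple[int, str, str, str]:
    """Choose one chromosome, then draw V, D and J alleles only from it."""
    chrom = rng.randrange(2)

    def pick(genes: List[str]) -> str:
        # Only alleles actually present on the chosen chromosome are eligible.
        present = [genotype[g][chrom] for g in genes if genotype[g][chrom] is not None]
        if not present:
            raise ValueError(f"no usable allele on chromosome {chrom}")
        return rng.choice(present)

    return chrom, pick(v_genes), pick(d_genes), pick(j_genes)
```

In a genotype where `IGHV3-23` is deleted on chromosome 1, every rearrangement sampled from chromosome 1 must use another V gene. Aggregate V usage therefore reflects both zygosity and deletions.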

// SHM · 03

Mutate

Uniform or S5F context-aware SHM. Per-segment + per-V-subregion rate targeting (FWR/CDR).
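The uniform model with region-specific rates can be sketched in a few lines. Each position gets its own mutation probability, so raising the rate inside a CDR span targets it. This sketch does not implement S5F, which uses 5-mer context mutability and substitution tables. The region coordinates you would pass are hypothetical, and the function names are illustrations, not GenAIRR calls.

```python
import random
from typing import Dict, List, Tuple


def position_rates(length: int, regions: Dict[Tuple[int, int], float],
                   default: float) -> List[float]:
    """Per-position mutation probabilities; `regions` maps half-open (start, end) spans to a rate."""
    rates = [default] * length
    for (start, end), rate in regions.items():
        for i in range(max(start, 0), min(end, length)):
            rates[i] = rate
    return rates


def mutate_uniform(seq: str, rates: List[float], rng: random.Random) -> Tuple[str, List[int]]:
    """Context-free SHM: position i mutates with probability rates[i] to a different base."""
    if len(rates) != len(seq):
        raise ValueError("need one rate per position")
    out, mutated = [], []
    for i, (base, rate) in enumerate(zip(seq, rates)):
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
            mutated.append(i)  # ground-truth mutation position
        else:
            out.append(base)
    return "".join(out), mutated
```

Because the mutated positions are recorded as they are drawn, the mutation count is exact rather than estimated by realigning to germline.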

// Lineage · 04

Simulate clones

BCR lineage trees, TCR clone-size repertoires, and flat abundance benchmarks with planted clone IDs.
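A lineage tree is ultimately a rooted tree that can be serialized as Newick. The sketch below grows a tree with a plain random branching process (0 to `max_children` offspring per node per generation) and writes it out. It has no selection and no sequences attached, so it omits what `clonal_lineage(..., selection_strength=...)` models. The `Node` class, naming scheme, and functions are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def to_newick(root: Node) -> str:
    """Serialize a tree as Newick, labelling internal nodes as well as leaves."""
    def rec(n: Node) -> str:
        if not n.children:
            return n.name
        return "(" + ",".join(rec(c) for c in n.children) + ")" + n.name
    return rec(root) + ";"


def grow_lineage(max_generations: int, rng: random.Random, max_children: int = 2) -> Node:
    """Neutral branching process: each node in the frontier gets 0..max_children offspring."""
    counter = 0

    def new_node() -> Node:
        nonlocal counter
        counter += 1
        return Node(f"n{counter}")

    root = new_node()
    frontier = [root]
    for _ in range(max_generations):
        next_frontier = []
        for parent in frontier:
            for _ in range(rng.randint(0, max_children)):
                child = new_node()
                parent.children.append(child)
                next_frontier.append(child)
        if not next_frontier:
            break  # the clone went extinct
        frontier = next_frontier
    return root
```

For a hand-built tree, `to_newick(Node("root", [Node("a"), Node("b", [Node("c")])]))` returns `"(a,(c)b)root;"`.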

// Library · 05

Corrupt

Primer trimming, structural indels, PCR errors, N-base injection. Each knob tunable per workload.
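Three of these corruptions fit in one small function: end trimming (as a primer-trimming stand-in), N-base injection, and substitution errors (as a PCR/sequencing-error stand-in). Structural indels are not modeled. Parameter names are hypothetical and do not correspond to GenAIRR's knobs. A single uniform draw per base decides between "N", a substitution, or no change, which keeps the two rates independent to tune.

```python
import random


def corrupt(seq: str, rng: random.Random, trim_5: int = 0, trim_3: int = 0,
            n_rate: float = 0.0, error_rate: float = 0.0) -> str:
    """Toy library corruption: primer-style end trimming, N injection, substitution errors."""
    if trim_5 < 0 or trim_3 < 0 or trim_5 + trim_3 > len(seq):
        raise ValueError("invalid trim lengths")
    if n_rate + error_rate > 1.0:
        raise ValueError("n_rate + error_rate must not exceed 1")
    trimmed = seq[trim_5: len(seq) - trim_3]
    out = []
    for base in trimmed:
        r = rng.random()  # one draw decides N, substitution, or no change
        if r < n_rate:
            out.append("N")
        elif r < n_rate + error_rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

With no error rates, `corrupt("ACGTACGT", rng, trim_5=2, trim_3=1)` only trims and returns `"GTACG"`.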

// Contracts · 06

Constrain

Productive-only sampling at compile time. The engine never proposes out-of-frame or stop-bearing candidates.
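For contrast, the naive approach generates freely and then filters with a post-hoc check like the one sketched below. That is rejection sampling, which wastes draws and can bias the accepted distribution. The page states that GenAIRR instead enforces productivity at compile time, so this check is not its mechanism. The rule here is simplified and hypothetical: the junction length must be a multiple of 3 and must start in frame, with the reading frame assumed to begin at `sequence[0]`, and there must be no in-frame stop codon. Real productivity calls also check conserved anchor residues.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_productive(sequence: str, junction_start: int, junction_end: int) -> bool:
    """Simplified productivity check.

    Assumes the reading frame starts at sequence[0] and the junction is the
    half-open span [junction_start, junction_end).
    """
    junction_len = junction_end - junction_start
    if junction_len <= 0 or junction_len % 3 != 0 or junction_start % 3 != 0:
        return False  # junction out of frame
    # Walk complete codons only; a trailing partial codon is ignored.
    codons = (sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3))
    return not any(c.upper() in STOP_CODONS for c in codons)
```

`is_productive("ATGGCCTGGGGC", 3, 9)` is `True`. Shortening the junction to `(3, 8)` breaks the frame, and a `TAA` codon anywhere in frame makes the record non-productive.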

// Replay · 07

Replay

Trace every random draw. Byte-identical replay across runs, platforms, and one-knob-changed counterfactuals.
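The idea behind draw tracing can be illustrated in pure Python. Wrap a seeded generator, log every value it hands out, and compare logs between runs. This only demonstrates the concept; GenAIRR's byte-identical replay lives in its Rust kernel, and `TracedRNG` and `toy_simulation` are hypothetical names. The sketch uses only `random.Random.random()`, whose output sequence for a given seed the Python documentation guarantees to stay stable across versions.

```python
import random
from typing import List, Tuple


class TracedRNG:
    """Wraps random.Random and records every draw so a run can be audited and compared."""

    def __init__(self, seed: int) -> None:
        self._rng = random.Random(seed)
        self.trace: List[Tuple[str, float]] = []

    def random(self) -> float:
        x = self._rng.random()
        self.trace.append(("random", x))
        return x


def toy_simulation(rng: TracedRNG, n: int) -> str:
    # Each base is derived from exactly one traced draw.
    return "".join("ACGT"[int(rng.random() * 4)] for _ in range(n))
```

Two `TracedRNG(42)` instances yield the same sequence and identical traces. Diffing the traces of a baseline run and a one-knob-changed run shows exactly where the two diverge.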

```python
import GenAIRR as ga

# Grow BCR clonal lineage trees (affinity maturation) with ground truth
result = (
    ga.Experiment.on("human_igh")
      .recombine()
      .clonal_lineage(n_clones=50, max_generations=6, n_sample=30,
                      rate=0.01, selection_strength=10.0)
      .sequencing_errors(rate=0.001)
      .run_records(seed=42)
)

result.to_tsv("repertoire.tsv")                 # per-cell AIRR records (clone_id, lineage_*)
newick = result.lineage_trees[0].to_newick()    # ground-truth lineage tree of the first clone
```
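Downstream, the exported TSV can be read with the standard library alone. The sketch below groups sequence IDs by their planted clone. It assumes the file has a header row with `sequence_id` and `clone_id` columns (both standard AIRR Rearrangement fields, and the comment above says `clone_id` is present). The function name is hypothetical.

```python
import csv
from collections import defaultdict
from typing import Dict, List, TextIO


def sequences_by_clone(handle: TextIO) -> Dict[str, List[str]]:
    """Group sequence_id values by clone_id from a tab-separated AIRR file."""
    reader = csv.DictReader(handle, delimiter="\t")
    clones: Dict[str, List[str]] = defaultdict(list)
    for row in reader:
        clones[row["clone_id"]].append(row["sequence_id"])
    return dict(clones)
```

For the file written above, open it with `open("repertoire.tsv", newline="")` and pass the handle in. The resulting clone sizes can be compared directly with a clustering tool's output.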

Install. One command. No compiler.

```bash
pip install GenAIRR
```

Pre-built wheels for Linux (x86_64, aarch64), macOS (Intel and Apple Silicon), and Windows (x64). Python 3.9+. No Rust toolchain needed.

See the full quick start → Start the getting-started track →


Choose your path

A few of the most common starting points are listed below. The Choose your path page has the full set of eight paths, each expanded into a short, ordered reading list.

| If you want to ... | Start here |
|---|---|
| Simulate sequences | Quick start · The Experiment builder · API reference |
| Simulate per-individual genotypes | Genotypes · Sampling & population priors · Cohorts |
| Simulate clonal repertoires | Clonal overview · Lineage trees · Repertoires |
| Build a reference cartridge | Reference cartridge concept · Build a reference cartridge |
| Benchmark tools against ground truth | Quick start · Benchmarking genotype inference |
| Get validated / reproducible output | Validate AIRR records · Trace, replay, reproducibility |

// Citation

If GenAIRR helps your research, please cite: Konstantinovsky T, Peres A, Polak P, Yaari G. An unbiased comparison of immunoglobulin sequence aligners. Briefings in Bioinformatics. 2024;25(6):bbae556. doi:10.1093/bib/bbae556