Choose your path
Eight paths through the docs, organised by what you came here to do. Each path is a short curriculum - read in order and you'll have a working mental model by the end. Skip to any starting point that matches your current task.
I want to simulate sequences
The fastest path from pip install to a productive heavy-chain batch.
- Quick start - 5 minutes, 100 productive heavy chains, AIRR records returned.
- Your first AIRR record - field-by-field tour of what the engine emits.
- The Experiment builder - the full DSL surface: recombination, mutation, constraints, clonal expansion, corruption, paired-end, compile reuse.
- Export the results - TSV / CSV / FASTA / FASTQ / DataFrame.
If you only ever read one page after the quick start, make it the Experiment builder.
I want to simulate per-individual genotypes
Phased, diploid germline per subject - the basis for genotype/haplotype-inference benchmarks.
- Genotypes (overview) - what a genotype is, how phased recombination samples from it, and how to build one (zygosity, deletion, duplication, novel alleles).
- Sampling & population priors - draw a genotype with Genotype.sample, or author/estimate a donor-population prior on the cartridge.
- Genotype cohorts - simulate many subjects in one run_cohort call, each with its own genotype and per-subject provenance.
With no genotype attached the engine is byte-for-byte unchanged; attaching one is the only thing that switches recombination onto the phased path.
I want to simulate clonal repertoires
Shared-ancestor structure: BCR affinity-maturation trees and TCR / abundance repertoires with planted clone IDs.
- Clonal simulation overview - choose clonal_lineage for BCR trees, clonal_repertoire for TCR / abundance repertoires, or legacy expand_clones.
- Clonal lineage trees - BCR SHM trees, selection, final-cell sampling, lineage metadata, and tree exports (Newick).
- Clonal repertoires - TCR and flat-BCR clone sizes, duplicate_count, and AIRR clone-caller export.
I want to benchmark tools against ground truth
GenAIRR's core use case: simulate a known answer, run a tool, score it - with no real-data uncertainty about what's correct.
- Quick start - produce a repertoire with by-construction truth columns (expose_provenance=True).
- Validate AIRR records - confirm every reported field is internally consistent before you score anything.
- Benchmarking genotype inference - the end-to-end recipe with a worked TIgGER / IgDiscover example recovering a planted genotype; the same pattern applies to aligner / annotation benchmarks.
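The simulate / run / score loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not GenAIRR's API: the records are hand-written dicts that borrow the AIRR field names sequence_id and v_call, and score_calls is a hypothetical helper. In a real benchmark the truth rows would come from the simulator and the predictions from the tool under test.

```python
def score_calls(truth, predicted, field="v_call"):
    """Return the fraction of truth records whose `field` the tool recovered.

    Records are matched on sequence_id; a record the tool dropped
    counts as a miss.
    """
    predicted_by_id = {rec["sequence_id"]: rec for rec in predicted}
    hits = 0
    for rec in truth:
        guess = predicted_by_id.get(rec["sequence_id"])
        if guess is not None and guess.get(field) == rec[field]:
            hits += 1
    return hits / len(truth) if truth else 0.0


# Hypothetical ground truth, standing in for simulator output.
truth = [
    {"sequence_id": "s1", "v_call": "IGHV1-2*02"},
    {"sequence_id": "s2", "v_call": "IGHV3-23*01"},
    {"sequence_id": "s3", "v_call": "IGHV4-34*01"},
    {"sequence_id": "s4", "v_call": "IGHV1-69*01"},
]

# Hypothetical tool output: s2 has the wrong allele, s4 was dropped.
predicted = [
    {"sequence_id": "s1", "v_call": "IGHV1-2*02"},
    {"sequence_id": "s2", "v_call": "IGHV3-23*04"},
    {"sequence_id": "s3", "v_call": "IGHV4-34*01"},
]

accuracy = score_calls(truth, predicted)
print(f"v_call accuracy: {accuracy:.2f}")  # prints "v_call accuracy: 0.50"
```

Counting dropped records as misses is a design choice made for this sketch; a benchmark may prefer to report recall and drop rate separately.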
I want to build a reference cartridge
Custom alleles, custom empirical distributions, custom biology.
- Reference cartridge concept - the four-plane model (identity, catalogue, rules, empirical models). Read this before any builder work.
- Build a reference cartridge - the practical builder workflow from FASTA to build().
- Estimate models from data - turn an AIRR-like rearrangement table into empirical models on the cartridge.
- Inspect manifest + build report - audit what's in a cartridge before you ship it or pin it in CI.
The bundled cartridges (HUMAN_IGH_OGRDB, etc.) work for most projects; reach for the builder when you have a non-standard catalogue or want estimated biology to match a specific dataset.
I want reproducible / validated output
Bit-stable runs, replayable traces, output you can defend.
- Validation hub - the overall reproducibility + validation story.
- validate_records - the per-record AIRR-output correctness gate.
- Trace, replay, reproducibility - what a trace contains, how to replay it, every failure mode, strict-mode behaviour.
For publication-grade reproducibility, the canonical pattern is: seed for fast runs, trace for the records you'll defend.
I want to understand biology mechanisms
The engine's biological surface - what's modelled, what's not, where the v1 boundary sits.
- Recombination + junction biology - V(D)J join, trim-and-fill, productivity contract.
- Junction N/P additions - N-base composition models, P-nucleotide lengths, layout diagrams.
- D inversion + receptor revision - the two recombination-stage editing mechanisms.
- SHM and mutation targeting - uniform vs S5F, per-segment and per-V-subregion rates, counter partitions.
- Corruption + sequencing artefacts - the observation-stage mechanisms (PCR, sequencing errors, indels, end-loss, N corruption, strand).
Each guide opens with the biology, names the v1 boundary decisions, and links back to the audit doc that defined them. Clonal structure has its own path above (simulate clonal repertoires).
I'm contributing to GenAIRR
The contributor doorway - engine architecture, audit-first workflow, mechanism additions.
- Architecture (Contributor) - the engine mental model, the audit-first workflow, the "before you add a new mechanism" checklist, deep links into the audit corpus.
- audit-docs/engine_architecture.md - the seven engine invariants. Required reading before any kernel work.
- audit-docs/adding_a_pass.md - the pass-author playbook with the minimal pass template and the three required test patterns.
- audit-docs/validation_matrix.md - the navigable map: every guarantee → audit doc → test file → kernel invariant.
GenAIRR's release process is audit-first: every mechanism gets specified, pinned, and validated before implementation. The architecture landing page explains how that discipline shapes contribution.
The paths above cover ~95% of real reader intents. If yours isn't here, the API Reference is the authoritative public-surface catalogue.