Short, task-oriented recipes. Each one starts with an outcome and ends with a runnable snippet — no theory, no detours. If you already know which knob you're looking for, this is the section to skim.
Construct a simulation that produces the sequences you actually need.
Point the engine at a custom germline reference and a custom anchor map. Works for any V(D)J locus — human, mouse, zebrafish, or your own annotation.
Implement the Pass trait, emit your events into the trace, and slot your pass anywhere in the pipeline. Includes the boilerplate and three worked examples.
Two modes: filter (retry until productive) and verify (let unproductive draws through, mark them). When to use each, with timing data.
Replace the built-in S5F model with your own substitution distribution: per-base, per-motif, or fully position-aware. Plug it in via the MutationModel port.
Use ground truth the way it was meant to be used — to score things and find drift.
The flagship recipe. Simulate 50k sequences, run IgBLAST / MiXCR / your tool, compute v_call accuracy by mutation level, and plot the curve.
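As a rough illustration of the "accuracy by mutation level" step, the sketch below bins records by mutation count and reports the fraction of correct V calls per bin. It is plain Python and does not use the GenAIRR API. The field names `truth_v_call` and `n_mutations`, and the rule that a comma-separated ambiguous call counts as correct when it contains the true allele, are assumptions made for this example; the records are made up.

```python
from collections import defaultdict

def accuracy_by_mutation_bin(records, bin_width=5):
    """Return {bin lower edge: fraction of correct V calls} per mutation-count bin."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        # Integer division keeps bin edges exact (0-4, 5-9, 10-14, ...).
        edge = (rec["n_mutations"] // bin_width) * bin_width
        totals[edge] += 1
        # Assumption: an ambiguous call such as "A*01,A*02" is scored as
        # correct when the true allele is one of the listed candidates.
        predicted = {call.strip() for call in rec["v_call"].split(",")}
        if rec["truth_v_call"] in predicted:
            hits[edge] += 1
    return {edge: hits[edge] / totals[edge] for edge in sorted(totals)}

# Made-up records; "truth_v_call" and "n_mutations" are hypothetical field names.
records = [
    {"v_call": "IGHV1-2*02", "truth_v_call": "IGHV1-2*02", "n_mutations": 0},
    {"v_call": "IGHV1-2*02,IGHV1-2*04", "truth_v_call": "IGHV1-2*04", "n_mutations": 3},
    {"v_call": "IGHV3-23*01", "truth_v_call": "IGHV3-23*04", "n_mutations": 12},
    {"v_call": "IGHV3-23*01", "truth_v_call": "IGHV3-23*01", "n_mutations": 14},
]

curve = accuracy_by_mutation_bin(records, bin_width=5)
for edge, acc in curve.items():
    print(f"{edge:>3}+ mutations: {acc:.2f}")
```

The resulting dictionary maps each bin's lower edge to an accuracy value, which is the shape most plotting libraries expect for a curve.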
S5F vs. uniform vs. your custom model, with the same recombination seed so that only the mutation pass varies. Builds a side-by-side density plot of v_identity.
Marginal checks: V-usage distribution, CDR3 length histogram, junction codon usage, and productive rate. Compares your sim to an OAS reference panel.
Two-minute sanity loop: generate 1k sequences, diff key marginals against the previous run, fail loudly if a contract regresses. Drops into CI as a pytest fixture.
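The "diff key marginals, fail loudly" idea can be sketched in plain Python as below. This is a minimal illustration, not the recipe's actual fixture: the helper names `diff_marginals` and `assert_no_regression`, the 0.02 tolerance, and the example numbers are all assumptions chosen for the example.

```python
def diff_marginals(previous, current, tolerance=0.02):
    """Return {name: (old, new)} for every marginal that moved more than `tolerance`."""
    drifted = {}
    for name in sorted(set(previous) | set(current)):
        # A marginal missing from one run is treated as 0.0 in that run.
        old = previous.get(name, 0.0)
        new = current.get(name, 0.0)
        if abs(new - old) > tolerance:
            drifted[name] = (old, new)
    return drifted

def assert_no_regression(previous, current, tolerance=0.02):
    """Raise AssertionError listing every marginal that drifted."""
    drifted = diff_marginals(previous, current, tolerance)
    if drifted:
        raise AssertionError(f"marginals drifted by more than {tolerance}: {drifted}")

# Made-up marginals from two hypothetical runs.
previous = {"productive_rate": 0.74, "IGHV3-23": 0.11, "IGHV1-69": 0.06}
current = {"productive_rate": 0.61, "IGHV3-23": 0.115, "IGHV1-69": 0.06}

print(diff_marginals(previous, current))
```

Inside a pytest test, calling `assert_no_regression(previous, current)` would fail the run with a message naming each drifted marginal.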
Shape the simulation toward your specific sequencer, panel, or empirical reference.
Map an empirical Q-score profile to NCorruptionPass and PCRErrorPass rates. Covers MiSeq, NovaSeq, and PacBio HiFi presets.
Reweight the allele sampler so your output matches a target frequency table — your panel, an OAS subset, or any AIRR TSV.
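One way to think about matching a target frequency table is to turn observed allele counts into normalized sampling weights and draw alleles with those weights. The sketch below shows that idea with the standard library only. It does not use the GenAIRR sampler; the function name `weights_from_counts`, the optional pseudocount, and the counts themselves are assumptions for illustration.

```python
import random
from collections import Counter

def weights_from_counts(target_counts, alleles, pseudocount=0.0):
    """Normalize target counts into sampling weights, in the order of `alleles`."""
    # Alleles absent from the table get only the pseudocount (0 by default).
    raw = [target_counts.get(allele, 0) + pseudocount for allele in alleles]
    total = sum(raw)
    if total <= 0:
        raise ValueError("target table has no mass on any allele in the reference")
    return [value / total for value in raw]

# Made-up allele list and target counts (for example, tallied from a v_call column).
alleles = ["IGHV1-2*02", "IGHV1-69*01", "IGHV3-23*01", "IGHV4-34*01"]
target_counts = {"IGHV3-23*01": 60, "IGHV1-69*01": 30, "IGHV1-2*02": 10}

weights = weights_from_counts(target_counts, alleles)

# Draw with a pinned seed and compare the empirical usage to the target.
rng = random.Random(0)
draws = rng.choices(alleles, weights=weights, k=10_000)
usage = Counter(draws)
for allele, weight in zip(alleles, weights):
    print(f"{allele}: target {weight:.2f}, sampled {usage[allele] / len(draws):.2f}")
```

A small positive pseudocount keeps alleles that are missing from the table reachable, which may or may not be what you want for a given panel.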
Pin a seed, fork the IR at any phase, swap a single pass, re-run from that point. Built on the persistent-IR layers — no full re-execution needed.
Run it at scale, get the output out of GenAIRR, and feed it into whatever comes next.
Each writer in one line. When to pick which. How to keep the truth_* fields versus stripping them for a blinded benchmark.
Use the iterator API. Chunked writes. Memory profile of a 10M-sequence run. Worked timing on 8 / 16 / 32 cores.
Walk-through of recreating the figure-2 dataset from the GenAIRR paper: every seed, every config, every plot. No GPU required.
Every guide here started as a question on an issue thread. If you tried to accomplish something with GenAIRR and the docs didn't get you there, open a discussion — most turn into a recipe within a week.