Field manual · 14 recipes

Guides.

Short, task-oriented recipes. Each one starts with an outcome and ends with a runnable snippet — no theory, no detours. If you already know which knob you're looking for, this is the section to skim.

A

Build.

Construct a simulation that produces the sequences you actually need.

A·01

Build a config for your locus or species.

Point the engine at a custom germline reference and a custom anchor map. Works for any V(D)J locus — human, mouse, zebrafish, or your own annotation.

~15 min · ●●○ intermediate · config · germline
A·02

Compose a custom Pass.

Implement the Pass trait, emit your events into the trace, and slot it anywhere in the pipeline. Includes the boilerplate and three worked examples.

~20 min · ●●● advanced · pipeline · python
A·03

Sample only productive sequences.

Two modes: filter (retry until productive) and verify (let unproductive draws through, mark them). When to use each, with timing data.

~5 min · ●○○ beginner · contracts · productive
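
The difference between the two modes can be sketched in a few lines. This is a toy illustration, not GenAIRR's API: `draw`, `is_productive`, `sample_filter`, and `sample_verify` are hypothetical names, and "productive" is reduced here to "in frame with no stop codon".

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}

def is_productive(seq: str) -> bool:
    """Toy check (assumption): length is a multiple of 3 and no in-frame stop."""
    if len(seq) % 3 != 0:
        return False
    return all(seq[i:i + 3] not in STOP_CODONS for i in range(0, len(seq), 3))

def draw(rng: random.Random) -> str:
    """Hypothetical stand-in for a single simulator draw."""
    length = rng.randint(28, 32)
    return "".join(rng.choice("ACGT") for _ in range(length))

def sample_filter(rng: random.Random, max_retries: int = 1000) -> dict:
    """Filter mode: redraw until a productive sequence appears."""
    for attempt in range(1, max_retries + 1):
        seq = draw(rng)
        if is_productive(seq):
            return {"sequence": seq, "productive": True, "attempts": attempt}
    raise RuntimeError("no productive draw within max_retries")

def sample_verify(rng: random.Random) -> dict:
    """Verify mode: keep every draw and only mark whether it is productive."""
    seq = draw(rng)
    return {"sequence": seq, "productive": is_productive(seq), "attempts": 1}
```

Filter mode pays for its guarantee in retries (the `attempts` field), while verify mode costs one draw per record and leaves the filtering decision to downstream code.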
A·04

Wire a custom SHM model.

Replace the built-in S5F with your own substitution distribution — per-base, per-motif, or fully position-aware. Plug it in via the MutationModel port.

~25 min · ●●● advanced · mutation · model
B

Validate.

Use ground truth the way it was meant to be used — to score things and find drift.

B·01

Benchmark an aligner against truth.

The flagship recipe. Simulate 50k sequences, run IgBLAST / MiXCR / your tool, compute v_call accuracy by mutation level, plot the curve.

~30 min · ●●○ intermediate · benchmark · aligner
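
The scoring step, accuracy of `v_call` binned by mutation level, might look like the sketch below. It works on plain dictionaries and uses no GenAIRR or aligner API. The field names `truth_v_call` and `mutation_rate` are assumptions made for illustration; substitute whatever your truth columns are called.

```python
from collections import defaultdict

def accuracy_by_mutation_bin(records, bin_width=0.05):
    """Return {bin_start: fraction of records whose v_call matches the truth}.

    Each record is a dict with 'v_call', 'truth_v_call', and 'mutation_rate'
    (the last two names are assumptions for this sketch).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        # Values exactly on a bin edge may fall on either side (float division).
        idx = int(rec["mutation_rate"] / bin_width)
        totals[idx] += 1
        # Some aligners report ties as a comma-separated list; count a hit
        # when the true allele is among the reported calls.
        calls = {call.strip() for call in rec["v_call"].split(",")}
        if rec["truth_v_call"] in calls:
            hits[idx] += 1
    return {
        round(idx * bin_width, 10): hits[idx] / totals[idx]
        for idx in sorted(totals)
    }
```

Plotting the returned mapping with bin start on the x-axis gives the accuracy curve the recipe describes.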
B·02

Compare two SHM models.

S5F vs. uniform vs. your custom model: same recombination seed, only the mutation pass varies. Builds a side-by-side density plot of v_identity.

~15 min · ●●○ intermediate · mutation · analysis
B·03

Audit a simulation for biological realism.

Marginal checks: V-usage distribution, CDR3 length histogram, junction codon usage, productive-rate. Compares your sim to an OAS reference panel.

~20 min · ●●○ intermediate · validation · qc
B·04

Smoke-test a config change.

Two-minute sanity loop: generate 1k sequences, diff key marginals against the previous run, fail loudly if a contract regresses. Drops into CI as a pytest fixture.

~10 min · ●○○ beginner · ci · testing
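
The "diff key marginals, fail loudly" step can be sketched independently of the simulator. The helper names and the 0.02 tolerance below are assumptions for illustration; the marginals are plain `{name: frequency}` dictionaries computed from the previous and current runs.

```python
def diff_marginals(previous, current, tolerance=0.02):
    """Return {key: (old, new)} for every marginal that moved more than tolerance."""
    drifted = {}
    for key in sorted(set(previous) | set(current)):
        old = previous.get(key, 0.0)   # a missing key counts as frequency 0
        new = current.get(key, 0.0)
        if abs(new - old) > tolerance:
            drifted[key] = (old, new)
    return drifted

def assert_no_regression(previous, current, tolerance=0.02):
    """Raise AssertionError naming every drifted marginal, so CI fails loudly."""
    drifted = diff_marginals(previous, current, tolerance)
    if drifted:
        raise AssertionError(f"marginals drifted beyond {tolerance}: {drifted}")
```

Inside a pytest test, calling `assert_no_regression(baseline, fresh)` is enough: the raised `AssertionError` is reported as a failure with the offending keys in the message.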
C

Customize.

Shape the simulation toward your specific sequencer, panel, or empirical reference.

C·01

Tune corruption rates to match your sequencer.

Map an empirical Q-score profile to NCorruptionPass + PCRErrorPass rates. Covers MiSeq, NovaSeq, PacBio HiFi presets.

~20 min · ●●○ intermediate · corruption · platform
C·02

Match an empirical V-gene usage distribution.

Reweight the allele sampler so your output matches a target frequency table — your panel, an OAS subset, or any AIRR TSV.

~15 min · ●●○ intermediate · sampling · priors
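
The idea behind reweighting is ordinary weighted sampling against a normalized target table. The sketch below uses only the standard library and is not GenAIRR's allele sampler; `normalize` and `sample_alleles` are hypothetical helpers, and the target table may hold raw counts or frequencies.

```python
import random

def normalize(freqs):
    """Scale a {allele: weight} table so the weights sum to 1."""
    total = sum(freqs.values())
    if total <= 0:
        raise ValueError("target table must have positive total weight")
    return {allele: weight / total for allele, weight in freqs.items()}

def sample_alleles(target, n, seed=0):
    """Draw n allele names with probability proportional to the target table."""
    weights = normalize(target)
    names = list(weights)
    rng = random.Random(seed)  # pinned seed keeps the draw reproducible
    return rng.choices(names, weights=[weights[a] for a in names], k=n)
```

With enough draws, the empirical usage of the output converges on the target frequencies, which is the property the recipe checks.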
C·03

Replay a simulation with one knob changed.

Pin a seed, fork the IR at any phase, swap a single pass, re-run from that point. Built on the persistent-IR layers — no full re-execution needed.

~10 min · ●●● advanced · ir · determinism
D

Operate.

Run it at scale, then get it out of GenAIRR and into whatever comes next.

D·01

Export to AIRR TSV, Parquet, or FASTA.

Each writer in one line. When to pick which. How to keep the truth_* fields versus stripping them for a blinded benchmark.

~5 min · ●○○ beginner · io · airr
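
Stripping the truth_* fields for a blinded benchmark can also be done after export, on the TSV itself. The sketch below uses only Python's `csv` module rather than any GenAIRR writer; `strip_truth_columns` is a hypothetical helper, and it assumes the truth columns share the `truth_` prefix.

```python
import csv

def strip_truth_columns(src, dst, prefix="truth_"):
    """Copy a tab-separated table from src to dst, dropping prefixed columns.

    src and dst are open text handles. Returns the list of kept column names.
    """
    reader = csv.DictReader(src, delimiter="\t")
    kept = [name for name in (reader.fieldnames or []) if not name.startswith(prefix)]
    writer = csv.DictWriter(
        dst,
        fieldnames=kept,
        delimiter="\t",
        extrasaction="ignore",   # silently skip the dropped truth columns
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return kept
```

Keep the unstripped file alongside the blinded one: the truth columns are what you score against once the tool under test has produced its calls.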
D·02

Stream millions of sequences without OOM.

Use the iterator API. Chunked writes. Memory profile of a 10M-sequence run. Worked timing on 8 / 16 / 32 cores.

~15 min · ●●○ intermediate · performance · streaming
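
Chunked writing over an iterator follows a generic pattern, sketched here without reference to GenAIRR's iterator API. `chunked` and `write_in_chunks` are hypothetical helpers; `records` can be any iterable or generator of records, so only one chunk is held in memory at a time.

```python
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def write_in_chunks(records, handle, chunk_size=10_000):
    """Write one record per line, one chunk at a time. Returns the record count."""
    written = 0
    for chunk in chunked(records, chunk_size):
        handle.writelines(f"{rec}\n" for rec in chunk)
        written += len(chunk)
    return written
```

Peak memory then scales with `chunk_size` rather than with the total number of sequences, provided the source is a lazy iterator and not a prebuilt list.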
D·03

Reproduce a published dataset.

Walk-through of recreating the figure-2 dataset from the GenAIRR paper: every seed, every config, every plot. No GPU required.

~45 min · ●●○ intermediate · reproducibility · paper
Can't find your recipe?

Tell us what you tried to do.

Every guide here started as a question on an issue thread. If you tried to accomplish something with GenAIRR and the docs didn't get you there, open a discussion — most turn into a recipe within a week.