The ground-truth payoff
Same molecule. Two annotation paths. One guesses. One knows. The whole reason GenAIRR exists is on the right.
The intuition
When you benchmark an aligner against real sequencing data, you're benchmarking it against another aligner — the one that produced your "ground truth." That's a circular reference. GenAIRR breaks the loop: simulate, corrupt, then ask any aligner what it thinks. The truth is right there, by construction.
Side by side
Below: the same simulated heavy-chain sequence (post-corruption: 5' loss = 12, indels, N-bases). On the left, what an off-the-shelf aligner reports. On the right, what GenAIRR knows for certain.
| Aligner reports (left) | GenAIRR ground truth (right) |
| --- | --- |
| v_call: IGHV3-23*01 (or *04?) | truth_v_call: IGHV3-23*04 |
| d_call: IGHD3-10 / IGHD2-21 (ambiguous) | truth_d_call: IGHD3-10*01 |
| j_call: IGHJ4*02 | truth_j_call: IGHJ4*02 |
| v_sequence_start: ~0 (lost leader, can't tell) | v_sequence_start: 0 |
| v_sequence_end: 38 | v_sequence_end: 38 |
| np1_length: unknown (N's hide it) | np1_length: 5 |
| d_sequence_start: ~43 | d_sequence_start: 43 |
| junction_length: 39 | junction_length: 39 |
| n_mutations: ~5 (vs N's, vs indels; flagged) | n_mutations: 7 (exact; N's distinguished) |
| productive: unsure (frame uncertain) | productive: True |
Two call fields, by design
Look closely at the record. There are actually two V-call columns: v_call is what GenAIRR's own evidence-based caller would report from the corrupted sequence (and may carry a comma-separated tie-set when the surviving bases don't distinguish close paralogs). truth_v_call is the allele that was sampled at recombination — exact, unambiguous, captured before any corruption could touch it. The same split exists for D and J. When you benchmark an external aligner, you compare it against truth_v_call: that's the only field that's guaranteed correct regardless of how heavy the corruption is.
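Because v_call can be a comma-separated tie-set, scoring it with plain string equality undercounts: a tie that still contains the true allele arguably got the right answer from the surviving bases. Below is a minimal sketch of tie-set-aware scoring; the helper function and the toy records are illustrative assumptions (not GenAIRR API), using the column names described above.

import pandas as pd

def call_contains_truth(call: str, truth: str) -> bool:
    # True if the exact truth allele appears in a possibly comma-separated call
    return truth in {c.strip() for c in call.split(",")}

# Toy records mirroring the v_call / truth_v_call split described above
df = pd.DataFrame({
    "v_call":       ["IGHV3-23*01,IGHV3-23*04", "IGHV1-69*01"],
    "truth_v_call": ["IGHV3-23*04",             "IGHV1-69*06"],
})

tie_aware_hits = df.apply(
    lambda r: call_contains_truth(r.v_call, r.truth_v_call), axis=1
)
print(tie_aware_hits.mean())  # 0.5: the first tie-set contains the truth, the second call misses

The closing code example below uses the strict comparison against truth_v_call; the tie-set-aware version above is a softer variant for calls that report ties.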
Why this matters for benchmarking
Run any aligner on the GenAIRR sequence. Compare its output to the truth columns. The diff is the error rate — there's no other reference frame to argue about. This makes GenAIRR the right substrate for:
- Aligner accuracy benchmarks — V/D/J call concordance, junction localization, mutation recall (a minimal scoring sketch follows this list).
- Training data for ML — every record carries 47 labels with no annotation noise.
- Method development — design new corruption scenarios and stress-test how each tool degrades.
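As a sketch of the first item in this list: assuming the aligner emits an AIRR-style table keyed by sequence_id with v_call, d_call, j_call, and junction_length columns (these names are assumptions; adapt them to your tool's output), concordance against the truth columns can be computed like this.

import pandas as pd

def benchmark(truth: pd.DataFrame, pred: pd.DataFrame) -> dict:
    # Join aligner output to the simulation truth by sequence_id;
    # columns present in both frames get _true / _pred suffixes.
    m = truth.merge(pred, on="sequence_id", suffixes=("_true", "_pred"))
    return {
        # allele-level call concordance against the uncorrupted truth calls
        "v_concordance": (m["v_call_pred"] == m["truth_v_call"]).mean(),
        "d_concordance": (m["d_call_pred"] == m["truth_d_call"]).mean(),
        "j_concordance": (m["j_call_pred"] == m["truth_j_call"]).mean(),
        # junction localization: exact agreement on junction length
        "junction_exact": (m["junction_length_pred"] == m["junction_length_true"]).mean(),
    }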
What you've built
You now have intuition for the five phases, the S5F model, the corruption pipeline, and what makes GenAIRR's metadata trustworthy. The rest of the docs are recipes — go pick one.
import pandas as pd
import GenAIRR as ga

experiment = (
    ga.Experiment.on("human_igh")
    .recombine()
    .mutate(count=(5, 25))
    .corrupt_5prime_loss(length=(0, 20))
    .corrupt_indels(count=(0, 2))
)

# expose_provenance=True adds truth_v/d/j_call columns to every record
result = experiment.run_records(n=10_000, seed=42, expose_provenance=True)
result.to_fasta("corrupted.fasta")  # feed to aligner X
truth = result.to_dataframe()       # keep the ground-truth labels

# Run aligner X on corrupted.fasta, then join its output back on sequence_id.
# Assumes the aligner writes a TSV with sequence_id and v_call columns.
predictions = pd.read_csv("predictions.tsv", sep="\t")
merged = truth[["sequence_id", "truth_v_call"]].merge(
    predictions[["sequence_id", "v_call"]], on="sequence_id"
)
v_recall = (merged.truth_v_call == merged.v_call).mean()