Validation & reproducibility¶
GenAIRR's outputs are rich — 50+ AIRR fields per record, full event ledgers per outcome, recoverable traces per seed. Validation answers one question across all of it: does what the record reports agree with what the engine actually did? This page is the hub for the three user-facing validation layers, the replay surface that makes runs durable, and the recommended workflows for development and CI.
Your learning path
You're at the hub of the "I want reproducible / validated
output" path. Two focused deep dives plug in here:
validate_records is the per-record
output-correctness gate, and
Trace, replay, reproducibility
is the durable-replay surface. Start with this page for
orientation, then pick the deep dive that matches your task.
Why validation exists¶
The engine drives every AIRR field from internal state — the persistent IR, the event ledger, and the per-draw trace. The projection that maps that state into the record runs outside the engine, so a bug in the projection (or in any code path that feeds the projection — live-call caching, V-subregion attribution, counter aggregation) could silently produce a record whose fields don't match the outcome that produced them. Validation closes that loop: independently re-derive every reported field from the upstream source, and surface any divergence as a structured report.
If you ever ask "does this record's v_call match the allele the
engine actually sampled?", "do n_v + n_d + n_j + n_np add up to
n_mutations?", "is productive consistent with the actual
junction translation?" — validation is the call that answers,
without you writing the cross-check yourself.
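Each of those questions reduces to recomputing a reported value from independent inputs and flagging any disagreement. As a rough illustration only, here is a hand-rolled version of the counter-partition check on a plain dict. The flat record layout and the helper name `check_mutation_partition` are assumptions made for this sketch; the real validator re-derives values from engine state rather than from the record itself.

```python
# Illustrative only: a hand-rolled version of one cross-check.
# The record layout (a flat dict with these keys) is an assumption.

SEGMENT_COUNTERS = ("n_v", "n_d", "n_j", "n_np")


def check_mutation_partition(record: dict) -> list[str]:
    """Return a list of human-readable issues (empty when consistent)."""
    partition_sum = sum(record[key] for key in SEGMENT_COUNTERS)
    if partition_sum != record["n_mutations"]:
        return [
            f"n_mutations={record['n_mutations']} but per-segment "
            f"counters sum to {partition_sum}"
        ]
    return []


consistent = {"n_v": 3, "n_d": 1, "n_j": 0, "n_np": 2, "n_mutations": 6}
divergent = {"n_v": 3, "n_d": 1, "n_j": 0, "n_np": 2, "n_mutations": 7}

print(check_mutation_partition(consistent))  # []
print(check_mutation_partition(divergent))
```

Returning a list of issues rather than raising keeps the check composable: many such checks can run over a batch and be collected into one report.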
The three validation layers¶
GenAIRR exposes three user-facing validation surfaces. All of them are opt-in — production loops that have qualified their pipeline don't pay the per-record overhead by default.
Record validation¶
The point-wise validator. Re-derives every field on every record
from the upstream source and reports per-record divergences in a
ValidationReport. This is the gate every release-tier CI run
should carry.
→ See the dedicated guide for the issue
catalogue, the ValidationReport API, and the canonical CI
one-liner.
Family validation¶
For workloads that stamp clone_id (clonal_lineage(...),
clonal_repertoire(...), or legacy expand_clones(...)), family
validation checks clone-level record consistency:
family_report = result.validate_families()
assert family_report, family_report.summary()
# Stronger gate — runs per-record validation on every parent first.
full_report = result.validate_families_with_parents(refdata)
assert full_report, full_report.summary()
validate_families is records-only. It checks that a clonal batch is
not mixed with non-clonal records and, when truth_v_call,
truth_d_call, and truth_j_call are present, that those
recombination-time truth calls are invariant within each clone_id
group. No refdata required.
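The within-clone invariance rule can be sketched on plain dicts. This is not GenAIRR's implementation: the helper name `find_family_divergences`, the dict-shaped records, and the allele names in the sample data are placeholders for illustration, and the sketch covers only the truth-call invariance check, not the mixed clonal/non-clonal check.

```python
from collections import defaultdict

TRUTH_COLUMNS = ("truth_v_call", "truth_d_call", "truth_j_call")


def find_family_divergences(records):
    """Group records by clone_id and report truth calls that vary in a clone.

    Returns a list of (clone_id, column, sorted distinct values) tuples.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for record in records:
        for column in TRUTH_COLUMNS:
            if column in record:  # truth columns are optional
                seen[record["clone_id"]][column].add(record[column])

    return [
        (clone_id, column, sorted(values))
        for clone_id, columns in seen.items()
        for column, values in columns.items()
        if len(values) > 1
    ]


# Placeholder data: clone 1 carries two different truth V calls.
records = [
    {"clone_id": 0, "truth_v_call": "IGHV1-2*02", "truth_j_call": "IGHJ4*02"},
    {"clone_id": 0, "truth_v_call": "IGHV1-2*02", "truth_j_call": "IGHJ4*02"},
    {"clone_id": 1, "truth_v_call": "IGHV3-23*01", "truth_j_call": "IGHJ6*02"},
    {"clone_id": 1, "truth_v_call": "IGHV3-30*01", "truth_j_call": "IGHJ6*02"},
]

for clone_id, column, values in find_family_divergences(records):
    print(f"clone {clone_id}: {column} varies: {values}")
```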
validate_families_with_parents(refdata) is the stronger
legacy-star diagnostic for expand_clones(...) outputs, where
result.parents exists. It compares descendant records against the
actual parent Outcome. Modern clonal_repertoire and
clonal_lineage do not expose result.parents; validate their
records with validate_records, use validate_families for
clone_id grouping, and validate lineage trees with tree.validate().
Runtime opt-in¶
For workloads where you want the validator to run on every batch without an explicit second call:
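A minimal sketch of the opt-in, assuming it is spelled as a `validate_records=True` keyword on `run_records(...)` (the keyword this page names as part of the call surface). The helper `run_with_inline_validation` is hypothetical and exists only to make the call shape explicit.

```python
def run_with_inline_validation(exp, n=100, seed=0):
    """Hypothetical helper: run a batch with the validator enabled inline.

    Assumes `exp` is a configured experiment and that `run_records(...)`
    accepts `validate_records=True`; check the Experiment builder guide
    for the exact spelling.
    """
    return exp.run_records(n=n, seed=seed, validate_records=True)
```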
The validator runs inline after each batch; the result is the same
as calling result.validate_records(refdata) explicitly. Off by
default — opt in during development; leave off in production
hot loops.
What each layer catches¶
| Issue category | Caught by |
|---|---|
| Sequence + coordinate consistency (sequence matches pool; *_start/end well-ordered) | validate_records |
| CIGAR ops (canonical M/I/D/S/N/P/X/=) + query span | validate_records |
| Counter provenance (n_mutations, per-segment partition, n_pcr_errors, indel counts) | validate_records |
| Allele calls (v_call / d_call / j_call matches an independent walker) | validate_records |
| Junction + productive triad (junction content, vj_in_frame, stop_codon, productive predicate identification) | validate_records |
| Paired-end geometry (R1/R2 windows, R2 reverse-complement, insert_size) | validate_records (when paired_end() is in the pipeline) |
| Clonal batch is consistently stamped with clone_id; truth V/D/J calls are invariant within clone when truth columns are present | validate_families |
| Legacy expand_clones descendants agree with their parent Outcome | validate_families_with_parents |
The three layers are independent: validate_records doesn't
require a clonal structure, and validate_families doesn't require
refdata. They compose for the strongest possible gate (shown here
for legacy expand_clones(...) output, which exposes result.parents):
report = result.validate_records(refdata)
assert report, report.summary()
families = result.validate_families_with_parents(refdata)
assert families, families.summary()
Trace and replay¶
Full guide
See Trace, replay, and reproducibility for the deep dive: seed vs trace, replay vs rerun, every failure mode, strict-mode interactions, and recommended workflows for debugging, regression tests, and reproducible examples. The summary below covers the essentials.
GenAIRR's reproducibility model rests on two facts:
- Same seed + same plan + same cartridge → byte-identical output.
  Across runs, machines, and platforms. The seed= argument on
  run_records(...) is the canonical surface; n records use seeds
  [seed, seed+1, …, seed+n-1], so batches stitch together if you
  offset the starting seed.
- Every random draw is recorded on the outcome's trace. Each
  draw lives at a stable hierarchical address (e.g.
  "sample_allele.v", "np.np1.length", "np.np1.bases[3]"). You can
  inspect the trace, dump it to disk, and replay it later.
For most users, the seed argument is the whole reproducibility story. The trace+replay surface matters when:
- You want a durable replay artifact that captures the cartridge identity, engine version, and DSL signature — not just the seed.
- You want to verify, weeks later, that a recorded run still reproduces against the current code + cartridge.
- You're filing a bug and want a self-contained artifact a maintainer can reproduce against.
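The seed-offset stitching property mentioned above follows from per-record seeding alone. A toy illustration, using Python's `random.Random` as a stand-in for the engine; `simulate_record` and `simulate_batch` are hypothetical helpers, not GenAIRR APIs.

```python
import random


def simulate_record(seed: int) -> str:
    """Stand-in for one simulated record: fully determined by its seed."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(12))


def simulate_batch(n: int, seed: int) -> list[str]:
    """Record i uses seed + i, mirroring the [seed, ..., seed+n-1] scheme."""
    return [simulate_record(seed + i) for i in range(n)]


one_shot = simulate_batch(n=10, seed=0)
# Two smaller batches, the second offset by the size of the first.
stitched = simulate_batch(n=4, seed=0) + simulate_batch(n=6, seed=4)

print(one_shot == stitched)  # True
```

Because each record depends only on its own seed, a large run can be split across workers or resumed part-way without changing any record.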
Saving a trace file¶
A TraceFile bundles a single outcome's recorded trace together
with the plan signature, refdata signature, refdata content
hash, engine version, and producing seed:
compiled = exp.compile()
outcome = compiled.simulator.run(seed=42)
trace_file = compiled.simulator.trace_file_from(outcome, seed=42)
trace_file.write_to("run-42.trace.json")
Read it back later with:
from GenAIRR._engine import TraceFile
trace_file = TraceFile.read_from("run-42.trace.json")
trace_file.to_json() # round-trip the JSON
trace_file.seed # 42
trace_file.engine_version # "X.Y.Z"
trace_file.schema_version # int
Replaying a trace file¶
Two complementary replay paths on the simulator:
- replay_from_trace_file(trace_file): consumes the recorded values verbatim at every sampling slot. The trace becomes the source of randomness, not the RNG. This is the strongest reproducibility gate: byte-identical output even if the seed's RNG sequence drifts in a future version.
- rerun_from_trace_file(trace_file): re-runs the sampler from the trace's recorded seed. The trace acts as a signature bundle (plan + refdata gates) rather than as a value source. Useful when you want a fresh draw against the same configured pipeline.
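The difference between the two paths is where each sampled value comes from. A toy model of that idea, with everything in it hypothetical (`RecordingSource`, `ReplaySource`, and `toy_pipeline` are not GenAIRR classes; only the address strings are borrowed from the examples on this page):

```python
import random


class RecordingSource:
    """Draws from an RNG and records every value under its address."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.trace = {}

    def draw(self, address, choices):
        value = self.rng.choice(choices)
        self.trace[address] = value
        return value


class ReplaySource:
    """Ignores the RNG entirely: returns the recorded value per address."""

    def __init__(self, trace):
        self.trace = trace

    def draw(self, address, choices):
        return self.trace[address]


def toy_pipeline(source):
    """Hypothetical two-step sampler using trace-style addresses."""
    length = source.draw("np.np1.length", [2, 3, 4])
    bases = [source.draw(f"np.np1.bases[{i}]", "ACGT") for i in range(length)]
    return "".join(bases)


recorder = RecordingSource(seed=42)
original = toy_pipeline(recorder)

replayed = toy_pipeline(ReplaySource(recorder.trace))  # values from the trace
rerun = toy_pipeline(RecordingSource(seed=42))         # values from the seed

print(original == replayed, original == rerun)  # True True
```

In this toy, the replayed result would stay identical even if `random.Random` changed its algorithm, whereas the rerun result depends on the RNG remaining stable.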
Mismatch errors¶
Replay is gated on three signatures, each of which fires a
ValueError if it disagrees with the trace:
| Gate | When it fires | Message prefix |
|---|---|---|
| Plan signature | DSL chain changed (different passes / different rates / different kwargs that fold into the signature) | "pass plan signature mismatch" |
| Refdata signature | Cartridge structure changed (different catalogue / rules) | "refdata signature mismatch" |
| Refdata content hash | Cartridge bytes changed (curation, V-subregion annotation, allele content) | "refdata content hash mismatch" (replay only) |
These fail loudly, before any choices are consumed. If your
replay errors with "refdata content hash mismatch", the trace
was produced against a different cartridge (rules / identity /
curation may differ); load the original cartridge to replay.
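The fail-before-consuming behaviour can be sketched with ordinary digests. This is an assumption-laden toy, not how GenAIRR computes its signatures: the `signature` and `check_replay_gates` helpers, the header keys, and the choice of SHA-256 over JSON are all hypothetical. Only the three message prefixes come from the table above.

```python
import hashlib
import json


def signature(obj) -> str:
    """Stable digest of a JSON-serialisable description."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def check_replay_gates(header: dict, plan: dict, refdata: dict) -> None:
    """Raise ValueError on the first gate that disagrees with the header."""
    gates = [
        ("pass plan signature mismatch",
         header["plan_signature"], signature(plan)),
        # Toy "structure" signature: allele names only, not their content.
        ("refdata signature mismatch",
         header["refdata_signature"], signature(sorted(refdata))),
        ("refdata content hash mismatch",
         header["refdata_content_hash"], signature(refdata)),
    ]
    for message, recorded, current in gates:
        if recorded != current:
            raise ValueError(
                f"{message}: recorded {recorded[:8]}, current {current[:8]}"
            )


plan = {"passes": ["recombine", "mutate"], "rate": 0.01}
refdata = {"V1*01": "ACGT", "J1*01": "TTGA"}
header = {
    "plan_signature": signature(plan),
    "refdata_signature": signature(sorted(refdata)),
    "refdata_content_hash": signature(refdata),
}

check_replay_gates(header, plan, refdata)  # passes silently

edited = {**refdata, "V1*01": "ACGA"}  # same alleles, different bytes
try:
    check_replay_gates(header, plan, edited)
except ValueError as err:
    print(str(err).split(":")[0])  # refdata content hash mismatch
```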
Strict vs permissive¶
Every run_records (and run, stream, stream_records) takes a
strict=False keyword that controls what happens when a sampler
runs out of admissible candidates at sample time. This is rare,
but it's the canonical failure mode of an unsatisfiable plan
(e.g. a productive constraint that admits no junction under your
NP-length distribution).
# Default — permissive
result = exp.run_records(n=100, seed=0)
# Strict — fail loud on empty admissible support
result = exp.run_records(n=100, seed=0, strict=True)
The two modes diverge only when admissible support is empty:
| Mode | When admissible support is empty | What you see |
|---|---|---|
| strict=False (default) | Falls back to a documented sentinel value: indel site -1, NP length 0, NP base N, trim 0; SHM substitution skips the slot. | Execution continues; the record may end up non-productive at that site. |
| strict=True | Raises StrictSamplingError immediately. | The call fails with (pass_name, address, reason) args naming the failing site. |
StrictSamplingError is not a ValueError subclass — except
ValueError will not catch it. Catch it by name from the top-level
import:
import GenAIRR as ga
try:
    result = exp.run_records(n=10, seed=42, strict=True)
except ga.StrictSamplingError as e:
    # e.args carries the failing site as (pass_name, address, reason).
    pass_name, address, reason = e.args
    print(f"{pass_name} couldn't satisfy the contract at {address}: {reason}")
The reason field is a stable machine-readable code, lowercase apart
from embedded allele or contract identifiers. Common values:
"empty_admissible_support", "support_unavailable",
"missing_allele.V.42", "contract_violation.NoStopCodonInJunction".
The recommended posture: leave strict off in production, where the sentinel fallback keeps your batch flowing; turn strict on during cartridge / DSL development, where you want the unsatisfiable plan to surface immediately rather than silently.
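The fallback-versus-raise split can be illustrated with a toy sampler. Nothing here is GenAIRR code: `ToyStrictSamplingError`, `sample_np_length`, and the pass name `"np_sampler"` are hypothetical. The sentinel (NP length 0), the three-part args, and the reason code are taken from the description above.

```python
import random


class ToyStrictSamplingError(Exception):
    """Stand-in for the engine's error; deliberately not a ValueError."""


def sample_np_length(rng, admissible, *, strict=False, sentinel=0):
    """Pick an NP length from `admissible`, handling empty support."""
    if not admissible:
        if strict:
            raise ToyStrictSamplingError(
                "np_sampler", "np.np1.length", "empty_admissible_support"
            )
        return sentinel  # permissive: documented fallback, run continues
    return rng.choice(admissible)


rng = random.Random(0)
print(sample_np_length(rng, [], strict=False))  # 0

try:
    sample_np_length(rng, [], strict=True)
except ToyStrictSamplingError as err:
    pass_name, address, reason = err.args
    print(pass_name, address, reason)
```

With non-empty support both modes behave identically, which is why turning strict on costs nothing for a satisfiable plan.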
Note that strict mode applies only to fresh sampling — replay
consumes recorded values verbatim, so it never re-evaluates
contract admissibility. To force strict-fresh semantics on a
recorded trace, call simulator.run(seed=<original_seed>,
strict=True) instead of replay_from_trace_file.
Recommended workflows¶
The three layers + the two reproducibility surfaces compose into three workflow patterns covering the common cases.
Development¶
Catch problems immediately. Runtime validation surfaces drift between your DSL changes and the engine's expectations; strict mode surfaces unsatisfiable plans:
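A minimal sketch of that development posture, assuming both switches are keywords on `run_records(...)`: `strict=True` as documented above, and `validate_records=True` as named in the call-surface pointer on this page. The wrapper `development_run` is hypothetical.

```python
def development_run(exp, n=100, seed=0):
    """Hypothetical helper: strict sampling plus inline validation.

    Assumes `exp` is a configured experiment and that `run_records(...)`
    accepts both keywords; an unsatisfiable plan would then raise
    instead of falling back to sentinels.
    """
    return exp.run_records(n=n, seed=seed, validate_records=True, strict=True)
```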
If the run succeeds and the inline validation passes, the pipeline is qualified.
Continuous integration¶
Run the validator as an explicit gate, surfacing a structured report you can keep as a build artifact:
result = exp.run_records(n=1000, seed=0)
report = result.validate_records(refdata)
assert report, report.summary()
A failing report dumps the per-record issues to the test runner;
the failures list is JSON-serialisable for downstream tooling.
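Keeping the failures as a build artifact can be as simple as writing them to a JSON file. A sketch under stated assumptions: the helper `write_failures_artifact` is hypothetical, the shape of each failure entry below is invented for the demo, and the attribute through which a ValidationReport exposes its failures list should be taken from the validate_records guide.

```python
import json
import tempfile
from pathlib import Path


def write_failures_artifact(failures, path):
    """Write a JSON-serialisable failures list to `path`; return the count."""
    Path(path).write_text(json.dumps(failures, indent=2, sort_keys=True))
    return len(failures)


# Invented entry shape, for demonstration only.
example_failures = [
    {"record_index": 3, "field": "v_call", "message": "call disagrees"},
]

with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "validation-failures.json"
    count = write_failures_artifact(example_failures, artifact)
    round_tripped = json.loads(artifact.read_text())

print(count, round_tripped == example_failures)  # 1 True
```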
Clonal output¶
Stack record + family validation when the pipeline ships clonal records:
result = (
    exp
    .recombine()
    .clonal_repertoire(n_clones=50, max_size=100)
    .sequencing_errors(rate=0.001)
    .run_records(seed=42, expose_provenance=True)
)
assert result.validate_records(refdata), "AIRR record divergence"
assert result.validate_families(), "Family invariant divergence"
For legacy expand_clones(...), add
result.validate_families_with_parents(refdata). For
clonal_lineage(...), also call tree.validate() on each
result.lineage_trees entry when you need topology checks.
When validation is not enough¶
GenAIRR's validation answers "is this record internally consistent with how the engine claims it was produced?", not "is this output biologically realistic?". A pipeline can validate clean and still produce simulations that don't match the biology you're targeting. Three places the validators don't help:
- Are my cartridge's empirical distributions right? The cartridge's cartridge_manifest() block is the canonical provenance source for cartridge content. The build report attached to a ReferenceCartridgeBuilder-produced cartridge captures every estimator's inputs and inferred distributions.
- Are my output distributions calibrated to a reference dataset? Use the distribution-invariant test suite or compare your simulated marginals against the dataset you're benchmarking against. The audit-realism workflow pattern is the natural starting point.
- Does my simulation match a specific aligner's expectations? GenAIRR's calls come from an independent walker; an aligner with different scoring rules will produce different calls on the same sequence. That's not a validator problem; it's an aligner-comparison problem.
For input-model provenance specifically:
manifest = cfg.cartridge_manifest()
print(manifest["models"]["allele_usage"]) # what's authored on the cartridge
print(manifest["hashes"]["data_config_checksum"]) # canonical content hash
if cfg.build_report is not None:
    for stage in cfg.build_report.stages:
        print(stage["stage"], stage["inputs"])  # what every estimator saw
Where to go next¶
- validate_records: the full guide to the per-record validator (API, the five issue categories, reading a ValidationReport).
- Your first AIRR record: the field catalogue the validator checks against.
- Reference cartridge: the cartridge model the validator gates against and the manifest that documents input-model provenance.
- The Experiment builder: how the pipeline composes and where validate_records=True and strict=True sit in the call surface.