GenAIRR Demo
Generate annotated AIRR records with ground truth, replayable traces, and clonal families from one fluent Python DSL.
From DSL to AIRR record: the part other simulators usually hide
[Figure: pipeline from DSL to AIRR record]
DSL: Experiment.on(...).recombine().productive_only().mutate()
compile(): typed, ordered, signed pass plan (contract set, plan signature, live-call hooks)
trace: address + value for each draw
events: committed IR changes
hooks: committed V/D/J live calls
AIRR record:
    observed: v_call, junction_aa
    truth: truth_v_call, productive
    counters: n_mutations, n_v_mutations
1. Annotated AIRR records
One fluent chain creates records, truth fields, mutation counters, and a pandas-ready table.
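Keeping the truth fields next to the observed calls makes downstream checks a one-liner. The sketch below illustrates that idea on three hand-built rows rather than on real GenAIRR output: the field names (`v_call`, `truth_v_call`, `productive`, `n_mutations`) follow the demo row in this section, while the allele names, the values, and the `call_accuracy` helper are placeholders invented for illustration.

```python
# Hand-built stand-in records: field names follow the demo row,
# allele names and values are placeholders, not GenAIRR output.
records = [
    {"v_call": "V_A*01", "truth_v_call": "V_A*01", "productive": True, "n_mutations": 10},
    {"v_call": "V_B*01", "truth_v_call": "V_B*01", "productive": True, "n_mutations": 0},
    {"v_call": "V_A*01", "truth_v_call": "V_C*01", "productive": False, "n_mutations": 4},
]


def call_accuracy(rows, observed_key, truth_key):
    """Fraction of rows whose observed call equals the committed truth."""
    matches = sum(1 for r in rows if r[observed_key] == r[truth_key])
    return matches / len(rows)


# Observed vs. truth comparison needs no join: both live in the same row.
v_accuracy = call_accuracy(records, "v_call", "truth_v_call")

# Counters can be summarised over any subset selected by another field.
productive_rows = [r for r in records if r["productive"]]
mean_mutations = sum(r["n_mutations"] for r in productive_rows) / len(productive_rows)

print(f"V call accuracy: {v_accuracy:.1%}")
print(f"Mean mutations (productive rows): {mean_mutations:.1f}")
```

The same comparisons should carry over to a DataFrame with these columns, assuming the table exposes them under the names shown.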
Code
import GenAIRR as ga

result = (
    ga.Experiment.on("HUMAN_IGH_OGRDB")
    .recombine()
    .productive_only()
    .mutate(model="s5f", rate=0.03)
    .run_records(n=1000, seed=42, expose_provenance=True)
)
df = result.to_dataframe()
Output
What one generated AIRR row contains
observed call, committed truth, and counters travel together
row 0
sequence        CVKDDGNRGYCSGDSCYGHCCALDYWYFDLW
v_call          IGHVF10-G38*04
d_call          IGHD2-15*01
j_call          IGHJ2*01
truth_v_call    IGHVF10-G38*04
truth_d_call    IGHD2-15*01
truth_j_call    IGHJ2*01
productive      True
n_mutations     10
columns         102
2. Productive by construction
productive_only() changes the sampling support before each draw.
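To make "changes the sampling support" concrete, here is a toy sketch of the general idea, not GenAIRR's implementation. It assumes a single made-up productivity rule (the length left after trimming must be a multiple of 3) and compares drawing from the full set of trim values with drawing from a set that was filtered before the draw. All names in it (`is_in_frame`, `draw_constrained`, `BASE_LENGTH`, `TRIMS`) are hypothetical.

```python
import random

BASE_LENGTH = 50            # toy pre-trim length (assumption)
TRIMS = list(range(0, 9))   # toy full support of trim values


def is_in_frame(length):
    """Toy productivity rule: length must be a multiple of 3."""
    return length % 3 == 0


def draw_constrained(rng, base_length, trims):
    """Filter the support to admissible values first, then draw once."""
    admissible = [t for t in trims if is_in_frame(base_length - t)]
    if not admissible:
        raise ValueError("no admissible trim value in the support")
    return rng.choice(admissible)


rng = random.Random(42)

# Unconstrained: every trim value can be drawn, in frame or not.
unconstrained = [rng.choice(TRIMS) for _ in range(1000)]
# Constrained: inadmissible values are removed before each draw.
constrained = [draw_constrained(rng, BASE_LENGTH, TRIMS) for _ in range(1000)]

unconstrained_rate = sum(is_in_frame(BASE_LENGTH - t) for t in unconstrained) / 1000
constrained_rate = sum(is_in_frame(BASE_LENGTH - t) for t in constrained) / 1000

print(f"Unconstrained in-frame rate: {unconstrained_rate:.1%}")
print(f"Constrained in-frame rate:   {constrained_rate:.1%}")
```

Because the filter runs before the draw, every constrained sample satisfies the rule and nothing is generated only to be thrown away.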
Code
unconstrained = (
    ga.Experiment.on("HUMAN_IGH_OGRDB")
    .recombine()
    .run_records(n=1000, seed=42)
)
constrained = (
    ga.Experiment.on("HUMAN_IGH_OGRDB")
    .recombine()
    .productive_only()
    .run_records(n=1000, seed=42)
)

unconstrained_productive = sum(
    1 for r in unconstrained if r["productive"]
) / len(unconstrained)
constrained_productive = sum(
    1 for r in constrained if r["productive"]
) / len(constrained)

print(f"Without productive_only(): {unconstrained_productive:.1%}")
print(f"With productive_only(): {constrained_productive:.1%}")
Output
3. Auditable replay
One trace file replays the same record, but refuses a changed pipeline.
Replay is gated, not just seeded
[Figure: same trace (seed=42), two outcomes from replay_from_trace_file() depending on signatures]
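The difference between seeded and gated replay can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not GenAIRR's trace format: the plan is a plain dict, its signature is a SHA-256 hash of its canonical JSON, and the trace stores an address and a value for each draw. The names `plan_signature`, `run`, and `replay` are hypothetical.

```python
import hashlib
import json
import random


def plan_signature(plan):
    """Hash a canonical JSON form of the plan (toy signature scheme)."""
    canonical = json.dumps(plan, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run(plan, seed):
    """Execute the toy plan and record one addressed draw per pass."""
    rng = random.Random(seed)
    draws = [{"address": name, "value": rng.random()} for name in plan["passes"]]
    return {"signature": plan_signature(plan), "seed": seed, "draws": draws}


def replay(plan, trace):
    """Return the recorded draws, but only if the plan is unchanged."""
    if trace["signature"] != plan_signature(plan):
        raise ValueError("trace was recorded under a different plan signature")
    return [dict(d) for d in trace["draws"]]


plan = {"passes": ["recombine", "productive_only", "mutate"], "rate": 0.03}
trace = run(plan, seed=42)

# Same plan: the recorded draws come back unchanged.
replayed = replay(plan, trace)

# Changed plan (rate 0.03 -> 0.05): replay is refused instead of silently diverging.
modified_plan = dict(plan, rate=0.05)
try:
    replay(modified_plan, trace)
    refused = False
except ValueError:
    refused = True

print("replay matches:", replayed == trace["draws"])
print("modified plan refused:", refused)
```

A seed alone would let the modified plan run and quietly produce a different record; checking the signature first turns that silent divergence into an explicit failure.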
Code
exp = (
    ga.Experiment.on("HUMAN_IGH_OGRDB")
    .recombine().productive_only()
    .mutate(model="s5f", rate=0.03)
)
compiled = exp.compile()

# Run once and write the trace for that run to disk.
outcome = compiled.simulator.run(seed=42)
trace_file = compiled.simulator.trace_file_from(outcome, seed=42)
trace_file.write_to("demo.trace.json")

# Read the trace back and replay it against the same compiled pipeline.
from GenAIRR._engine import TraceFile
tf = TraceFile.read_from("demo.trace.json")
replayed = compiled.simulator.replay_from_trace_file(tf, strict=False)
Output
Changed code
# Same pipeline with a different mutation rate (0.05 instead of 0.03).
exp_modified = (
    ga.Experiment.on("HUMAN_IGH_OGRDB")
    .recombine().productive_only()
    .mutate(model="s5f", rate=0.05)
)
compiled_modified = exp_modified.compile()
compiled_modified.simulator.replay_from_trace_file(tf, strict=False)
Output
4. Legacy fixed-size clonal families
One parent recombination can fork into many independently mutated descendants. This demo uses legacy expand_clones for the simple fixed-size star shape; for new clone benchmarks, see clonal_lineage for BCR trees and clonal_repertoire for TCR / abundance repertoires.
Code
result = (
    ga.Experiment.on("HUMAN_IGH_OGRDB")
    .recombine().productive_only()
    .expand_clones(n_clones=10, per_clone=50)
    .mutate(model="s5f", rate=0.02)
    .run_records(seed=42, expose_provenance=True)
)
Output
What the clonal result looks like
10 parent outcomes, 500 descendant AIRR records
result.parents[0] holds the pre-fork IR + trace.
clone0_desc0    4 SHM
clone0_desc1    14 SHM
clone0_desc2    8 SHM
clone0_desc3    7 SHM
clone0_desc4    5 SHM
+45 more descendants
Every row: clone_id + parent_id
Within clone: shared V/D/J truth
After fork: independent SHM
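The fixed-size star shape described in this section can be mimicked with a small standalone sketch. It is a toy model and does not call GenAIRR: parents are random nucleotide strings, mutation is a uniform per-base substitution rather than S5F, and the helpers `mutate` and `expand_star` as well as the `sequence_id` field are hypothetical names. It only illustrates the structure: one parent per clone, a fixed number of descendants, and independent mutation after the fork.

```python
import random

BASES = "ACGT"


def mutate(rng, sequence, rate):
    """Substitute each base independently with probability `rate`."""
    mutated, n_mutations = [], 0
    for base in sequence:
        if rng.random() < rate:
            mutated.append(rng.choice([b for b in BASES if b != base]))
            n_mutations += 1
        else:
            mutated.append(base)
    return "".join(mutated), n_mutations


def expand_star(n_clones, per_clone, rate, seed, length=60):
    """Build n_clones parents, each forked into per_clone mutated descendants."""
    rng = random.Random(seed)
    parents, records = [], []
    for c in range(n_clones):
        parent = {
            "parent_id": f"parent{c}",
            "clone_id": f"clone{c}",
            "sequence": "".join(rng.choice(BASES) for _ in range(length)),
        }
        parents.append(parent)
        for d in range(per_clone):
            # Each descendant is mutated from the parent, never from a sibling.
            sequence, n_mutations = mutate(rng, parent["sequence"], rate)
            records.append({
                "sequence_id": f"clone{c}_desc{d}",
                "clone_id": parent["clone_id"],
                "parent_id": parent["parent_id"],
                "sequence": sequence,
                "n_mutations": n_mutations,
            })
    return parents, records


parents, records = expand_star(n_clones=10, per_clone=50, rate=0.02, seed=42)
print(len(parents), "parents,", len(records), "descendant records")
```

With `n_clones=10` and `per_clone=50` the sketch yields 10 parents and 500 records, matching the counts in the demo output, and every record carries both `clone_id` and `parent_id`.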