A 90-second CI loop: generate 1k sequences, diff their key marginals against a saved baseline, fail loudly if a contract regresses or a distribution shifts beyond tolerance. Drops into a pytest fixture so every PR that touches the config gets its biology rechecked before merge.
A smoke test is not a science experiment. It's a guard against subtle regressions — someone edits a config, you want to know the engine still produces in-frame sequences with roughly the right V-usage shape. Tolerances are loose; the goal is to catch obvious breakage in under two minutes.
One file in your test suite. Same seed every run, so outputs are bit-identical: any change in the marginals comes from a code change, not RNG drift.
import json, pytest
import GenAIRR as ga
from pathlib import Path

BASELINE = json.loads(Path("tests/baseline.json").read_text())

@pytest.fixture(scope="module")
def panel():
    # Module scope: generate once and share the frame across every test,
    # keeping the whole suite inside the two-minute budget.
    return (
        ga.Experiment.on("human_igh")
        .recombine()
        .mutate(model="s5f", count=(0, 25))
        .run_records(
            n=1000,
            seed=42,                    # fixed seed: reruns are bit-identical
            respect=ga.productive(),    # contract: only productive records
            expose_provenance=True,
        ).to_dataframe()
    )
def test_no_contract_violations(panel):
    # Every record must honor the productive() contract, no exceptions.
    assert panel["productive"].all()

def test_v_usage_stable(panel):
    # Compare the five most-used V genes (allele suffix stripped) to baseline.
    top5 = set(panel["truth_v_call"]
               .str.split("*").str[0]
               .value_counts().head(5).index)
    assert top5 == set(BASELINE["v_top5"])

def test_cdr3_length_stable(panel):
    # Loose tolerance: mean CDR3 length within one residue of baseline.
    mean = panel["cdr3_aa"].str.len().mean()
    assert abs(mean - BASELINE["cdr3_mean"]) < 1.0
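The top-five set check catches reshuffles at the head of the distribution; to bound drift across the whole V-usage marginal, a total-variation tolerance works too. A minimal sketch, assuming a hypothetical v_freqs entry (gene to relative frequency) is added to baseline.json by the capture script; neither the key nor the 0.05 tolerance is part of the original baseline:

def test_v_usage_within_tolerance(panel):
    # Assumed extension: BASELINE["v_freqs"] maps V gene -> relative
    # frequency, captured from the same known-good run as the other marginals.
    observed = (panel["truth_v_call"]
                .str.split("*").str[0]
                .value_counts(normalize=True))
    expected = BASELINE["v_freqs"]
    genes = set(observed.index) | set(expected)
    # Total variation distance: half the summed absolute frequency gaps.
    tvd = 0.5 * sum(abs(observed.get(g, 0.0) - expected.get(g, 0.0))
                    for g in genes)
    assert tvd < 0.05, f"V-usage marginal shifted: TVD={tvd:.3f}"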
First-time setup: run the panel once on a known-good build, save the summary marginals, and commit them. The tests then assert that every subsequent run reproduces them.
# capture_baseline.py: run once on a known-good build, commit the output.
import json
import GenAIRR as ga
from pathlib import Path

# Must match the pytest fixture's call exactly, or the baseline
# stops measuring what the tests consume.
df = (
    ga.Experiment.on("human_igh")
    .recombine()
    .mutate(model="s5f", count=(0, 25))
    .run_records(n=1000, seed=42,
                 respect=ga.productive(),
                 expose_provenance=True)
    .to_dataframe()
)

baseline = {
    "v_top5": list(df["truth_v_call"].str.split("*").str[0]
                   .value_counts().head(5).index),
    "cdr3_mean": float(df["cdr3_aa"].str.len().mean()),
}
Path("tests/baseline.json").write_text(json.dumps(baseline, indent=2))
A failure means one of three things — and the first thing to do is figure out which.
1. Intentional config change. You edited the config and the new marginals are the ones you want. Run capture_baseline.py again and commit the new baseline.json alongside the config change; the tests pass on the next run.

2. Unintended nondeterminism. Same seed should produce same output. If the tests fail on a change that shouldn't have touched sampling, you've probably altered the order of rng.draw() calls. Confirm with a rerun check (see the sketch below), then bisect to find the culprit.

3. Contract violation. If test_no_contract_violations trips, that's a real biology bug: a non-productive record snuck past ga.productive(). Investigate immediately before merging anything else.
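Before bisecting in case 2, it's worth confirming that determinism itself is what broke. A quick sketch: generate the panel twice in-process and require frame equality, using the hypothetical make_panel helper sketched earlier and pandas' assert_frame_equal:

import pandas.testing as pdt

def test_reruns_bit_identical():
    # Two runs with the same seed must match exactly; a failure here means
    # something changed the order or count of RNG draws, not the biology.
    a = make_panel(n=100, seed=42)
    b = make_panel(n=100, seed=42)
    pdt.assert_frame_equal(a, b)

The smaller n keeps this check cheap enough to leave on by default, well inside the two-minute budget.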