Recipe A · 03 · ~5 min · beginner

Sample only productive sequences.

Add one keyword argument and GenAIRR emits functional receptors: in frame, no stop codons, anchors intact. There is no reject-and-retry loop; the engine prunes the candidate distribution at sample time, so productive sequences fall out by construction. In the default permissive mode a rare over-constrained draw can still come out non-productive (see Part 02).

01 Pass respect= · the ga.productive() bundle
02 Pick a mode · permissive or strict
03 Verify the rate · 99%+ with the bundle
PART 01

The one-line change.

The contract bundle ga.productive() combines four predicates: V anchor preserved, J anchor preserved, junction length divisible by three, no stop codons in the junction. Pass it as respect= and the engine refuses to pick any draw that would break those invariants.

Drop the bundle into run()
import GenAIRR as ga

result = (
    ga.Experiment.on("human_igh")
       .recombine()
       .run(n=1000, seed=42,
            respect=ga.productive())
)

# every record is productive by construction
# productive = True · vj_in_frame = True · stop_codon = False
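
To make the four predicates concrete, here is a standalone sketch of the checks in plain Python. It is not GenAIRR's implementation: the helper name looks_productive is hypothetical, and it assumes the junction is given as a nucleotide string running from the conserved V cysteine codon through the conserved J tryptophan or phenylalanine codon.

```python
# Illustrative only: a plain-Python picture of the four checks, not GenAIRR code.
STOP_CODONS = {"TAA", "TAG", "TGA"}
V_ANCHOR_CODONS = {"TGT", "TGC"}          # cysteine
J_ANCHOR_CODONS = {"TGG", "TTT", "TTC"}   # tryptophan or phenylalanine


def looks_productive(junction: str) -> bool:
    """Return True if a junction passes all four checks (hypothetical helper)."""
    junction = junction.upper()
    # Predicate: junction length divisible by three (and long enough for two anchors).
    if len(junction) < 6 or len(junction) % 3 != 0:
        return False
    codons = [junction[i:i + 3] for i in range(0, len(junction), 3)]
    return (
        codons[0] in V_ANCHOR_CODONS                     # V anchor preserved
        and codons[-1] in J_ANCHOR_CODONS                # J anchor preserved
        and not any(c in STOP_CODONS for c in codons)    # no in-frame stop codon
    )


print(looks_productive("TGTGCGAGAGATTACTGG"))  # in frame, anchors intact, no stop
print(looks_productive("TGTGCGAGAGATTACTG"))   # 17 nt: out of frame
print(looks_productive("TGTGCGTGAGATTACTGG"))  # in-frame TGA stop codon
```

The difference in GenAIRR, per the description above, is that these conditions are enforced while sampling rather than checked afterwards.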
PART 02

Strict vs permissive — pick a failure mode.

For most allele pairs the admissible support is plenty wide. But some configurations are over-constrained: a particular V-J combination may simply have no NP length that keeps the junction in frame. The default is to relax the constraint and continue, so you still get a record, possibly a non-productive one. Pass strict=True if you'd rather catch the failure loudly.

Permissive (default)
  • strict=False: if the admissible support is empty, fall back to the unconstrained distribution
  • trace shows: a fallback marker on the affected draw, so you can audit what happened
  • use when: you want a steady stream of records and the occasional non-productive sequence is fine
Strict
  • strict=True: raise StrictSamplingError carrying pass name + trace address + reason
  • recovery: catch the exception, log it, surface the failing constraint to the caller
  • use when: 100% contract compliance is required, e.g. formal benchmarks and validation suites
Catching a strict failure
try:
    ga.Experiment.on("human_igh").recombine().run(
        n=100, seed=42,
        respect=ga.productive(),
        strict=True,
    )
except ga.StrictSamplingError as e:
    pass_name, address, reason = e.args
    # pass_name = "generate_np.np1"
    # address   = "np.np1.length"
    # reason    = "empty_admissible_support"
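
The two modes can be pictured with a toy sampler. This is a minimal sketch under stated assumptions, not the engine's code: sample_np_length and EmptySupportError are hypothetical names, and the only constraint modelled is that a fixed number of nucleotides plus the NP length must be divisible by three.

```python
import random


class EmptySupportError(Exception):
    """Hypothetical stand-in for a strict-mode failure; not a GenAIRR class."""


def sample_np_length(weights, fixed_len, strict=False, rng=None):
    """Pick an NP length n so that fixed_len + n is divisible by three.

    weights maps candidate NP lengths to unnormalised probabilities.
    Returns (length, fell_back).
    """
    rng = rng or random.Random()
    # Prune the distribution to the admissible support before drawing.
    admissible = {n: w for n, w in weights.items()
                  if w > 0 and (fixed_len + n) % 3 == 0}
    if admissible:
        pool, fell_back = admissible, False
    elif strict:
        raise EmptySupportError("empty_admissible_support")
    else:
        # Permissive: relax to the unconstrained distribution and flag it.
        pool, fell_back = weights, True
    lengths = list(pool)
    choice = rng.choices(lengths, weights=[pool[n] for n in lengths], k=1)[0]
    return choice, fell_back


rng = random.Random(42)
# Only n = 2 keeps 40 + n in frame, so the draw is forced.
print(sample_np_length({0: 0.2, 1: 0.3, 2: 0.3, 3: 0.2}, fixed_len=40, rng=rng))

# Over-constrained: neither 40 + 0 nor 40 + 3 is divisible by three.
tight = {0: 0.5, 3: 0.5}
print(sample_np_length(tight, fixed_len=40, rng=rng))  # permissive: fell_back is True
try:
    sample_np_length(tight, fixed_len=40, strict=True, rng=rng)
except EmptySupportError as e:
    print("strict failure:", e)
```

Because pruning happens before the draw, no sample is ever rejected; the only question is what to do when nothing is left to draw from.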
PART 03

Verify the rate.

Sanity-check a small batch before committing to a large run. Without contracts, the natural productive rate ranges from ~30% (no constraint) to ~50% (with implicit anchor checks). With ga.productive() in permissive mode you should see 99%+; in strict mode it is 100% by definition, or the run raises an exception.

A quick sanity check
result = ga.Experiment.on("human_igh").recombine().run(
    n=1000, seed=42, respect=ga.productive(),
)
rate = sum(1 for o in result
           if o.final_simulation().productive) / len(result)

assert rate >= 0.99, f"productive rate too low: {rate:.3f}"
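
A rate measured on 1000 records carries sampling noise, so it can help to look at an interval rather than a single number. The sketch below computes a Wilson score interval with the standard library only; wilson_interval is a hypothetical helper, not part of GenAIRR, and z = 1.96 assumes you want roughly 95% coverage.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (hypothetical helper)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


# Example: 995 productive records out of 1000.
low, high = wilson_interval(995, 1000)
print(f"observed 0.995, interval ({low:.4f}, {high:.4f})")
```

With 995 of 1000 the lower bound lands just under 0.99, so if you need to show the true rate clears 99% rather than eyeball it, check a larger batch.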
PART 04

When you should NOT use it.

The productive bundle is a tool for sampling, not a default to leave on. Some studies need the full distribution of natural outcomes — productive and not — and turning the bundle on silently distorts your input.

Training an aligner / annotation model

Use it. You want clean, biology-shaped data so the model learns the signal, not the artifact distribution of broken receptors.

Benchmarking a productivity classifier

Drop it. You want both productive and non-productive sequences so the classifier sees the full discrimination problem.

Studying natural repertoire composition

Drop it. The natural productive : non-productive ratio is itself a biological signal you may want to measure or perturb.

Related recipes

Where to next.

Concept · Contracts →

The full architecture: how the four predicates compose and how the runtime prunes distributions at sample time.

B · 01 · Benchmark an aligner against truth →

The flagship recipe — pair the productive bundle with truth columns and score an aligner.