API reference¶

A map of the public surface. The reference is organised by workflow - how you actually compose a simulation - rather than alphabetically. For each symbol, the user guide that explains how to use it is linked; the autogenerated per-symbol signatures live on the dedicated pages in the API Reference nav section (Experiment, SimulationResult, ReferenceCartridgeBuilder, reference models).

How to read this reference¶

The page below is workflow-first, not signature-first. Each section starts with the user-facing surface - what to call, in what order - and links to the dedicated guide for context. Use this page to find the right symbol; use the guides to learn when and why to use it.

If you want the precise method signatures or the underlying docstrings, the API Reference nav section has mkdocstrings-generated pages for Experiment, SimulationResult, ReferenceCartridgeBuilder, and the reference models and rules. The generated SimulationResult section below shows what those auto-rendered pages look like.

Top-level imports¶

35 names ship in GenAIRR.__all__. Grouped by what they do:

Simulation entry points¶

Symbol	Purpose	Guide
`Experiment`	Fluent pipeline builder	The Experiment builder
`CompiledExperiment`	Compiled plan (compile once, run many)	The Experiment builder
`SimulationResult`	Records + outcomes + parents wrapper	Quick start
`set_seed`, `get_seed`, `reset_seed`	Thread-local seed management	Validation hub
`__version__`	Resolved via `importlib.metadata`; falls back to `"0.0.0"` for editable installs	-
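
The `__version__` fallback in the last row can be sketched with the standard library alone. This is a minimal illustration of the pattern, not GenAIRR's actual code: `resolve_version` is a hypothetical helper, and only the `"0.0.0"` fallback value comes from the table.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str, fallback: str = "0.0.0") -> str:
    """Return the installed version of dist_name, or fallback if no metadata exists."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # No installed distribution metadata was found under this name.
        return fallback


# A name that is not installed exercises the fallback branch.
print(resolve_version("a-distribution-that-is-not-installed"))
```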

Reference data - cartridges and bridges¶

Symbol	Purpose	Guide
`DataConfig`	The Python-side cartridge dataclass	Reference cartridge
`RefDataConfig`	The engine-side refdata bridge object	Reference cartridge
`dataconfig_to_refdata`	Bridge function - `DataConfig → RefDataConfig`	`validate_records`
`ConfigInfo`	Identity metadata	Reference cartridge
`ChainType`	`BCR_HEAVY` / `BCR_LIGHT_KAPPA` / ...	-
`Species`	`HUMAN` / `MOUSE` / ...	-
`Productivity`	Productive / NonProductive enumeration	-
`list_configs`	Enumerate the 100+ bundled cartridges	-
`DataConfigError`	DataConfig validation failure	-

Bundled cartridges (lazy-loaded)¶

import GenAIRR as ga

cfg = ga.HUMAN_IGH_OGRDB     # pickled cartridge - loaded on first access

Symbol	Locus / chain
`HUMAN_IGH_OGRDB`	Human IGH, OGRDB
`HUMAN_IGH_EXTENDED`	Human IGH, extended catalogue
`HUMAN_IGK_OGRDB`	Human IGK, OGRDB
`HUMAN_IGL_OGRDB`	Human IGL, OGRDB
`HUMAN_TCRB_IMGT`	Human TRB, IMGT

The five top-level bundled cartridges above are the most commonly used. The full 100+-cartridge catalogue stays reachable via ga.list_configs() and GenAIRR.data.<NAME> (lazy attribute access).

Reference cartridge authoring¶

Symbol	Purpose	Guide
`ReferenceCartridgeBuilder`	Fluent builder for custom cartridges from FASTA	Build a reference cartridge
`CartridgeBuildReport`	Auditable build trail dataclass	Build a reference cartridge
`ReferenceRulesSpec`	Anchor + alphabet + severity rules	Reference cartridge
`AnchorRuleSpec`	Per-anchor expected-AA + required flag	Reference cartridge
`ReferenceEmpiricalModels`	Typed empirical-model bundle	Reference cartridge
`EmpiricalDistributionSpec`	`[(value, weight), ...]` shape	Reference cartridge
`NpBaseModelSpec`	NP-base sampling model (uniform / empirical / Markov)	Junction N/P additions
`AlleleUsageSpec`	Per-segment allele-usage weights	Reference cartridge

Validation reports and exceptions¶

Symbol	Purpose	Guide
`ValidationReport`	Per-record AIRR-validator output	`validate_records`
`FamilyValidationReport`	Family-validator output	Clonal simulation overview
`RecordValidationFailedError`	Raised when strict record validation fails	`validate_records`
`FamilyValidationFailedError`	Raised when strict family validation fails	Clonal simulation overview
`StrictSamplingError`	Raised under `strict=True` on empty admissible support	Validation hub
`productive`	Convenience contract bundle for productive sampling	Recombination biology

Experiment¶

The fluent pipeline builder. Every chained method returns the same Experiment extended by one more pass; the pipeline runs when .run_records(...) / .run(...) / .compile().run(...) is called.
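
The deferred-execution behaviour described above (chained calls only record passes; work happens on the terminal run call) can be illustrated with a toy builder. `MiniPipeline`, `add_pass`, and the two pass functions are hypothetical and exist only for this sketch; they are not part of the Experiment API.

```python
class MiniPipeline:
    """Hypothetical stand-in for a fluent builder; not the GenAIRR Experiment class."""

    def __init__(self):
        self._passes = []

    def add_pass(self, name, fn):
        # Record the pass and return the builder so calls can be chained.
        self._passes.append((name, fn))
        return self

    def run(self, value):
        # Passes execute only here, in the order they were added.
        for _name, fn in self._passes:
            value = fn(value)
        return value


calls = []


def double(x):
    calls.append("double")
    return x * 2


def increment(x):
    calls.append("increment")
    return x + 1


pipe = MiniPipeline().add_pass("double", double).add_pass("increment", increment)
calls_before_run = list(calls)  # still empty: building the chain ran nothing
result = pipe.run(5)
```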

Surface	Methods	Notes
Bind to a cartridge	`Experiment.on(cfg_or_name)`	Accepts a string shortcut, a `DataConfig`, or a `RefDataConfig`
Recombination	`.recombine()`, `.invert_d(prob=...)`, `.receptor_revision(prob=...)`	Ancestor-phase mechanisms
Constraints	`.productive_only()`, `.restrict_alleles(v=..., d=..., j=...)`	Constrain the sample space
Biological mutation	`.mutate(model="s5f"|"uniform", rate=..., count=..., segment_rates=..., v_subregion_rates=...)`	The only pass that increments `n_mutations`
Clonal structure	`.clonal_lineage(...)`, `.clonal_repertoire(...)`, legacy `.expand_clones(...)`	BCR trees, TCR / flat-BCR abundance repertoires, or fixed-size star families
Trims override	`.trim(v_3=..., d_5=..., d_3=..., j_5=..., enabled=...)`	Per-experiment trim distribution overrides
Library / sequencer artefacts	`.pcr_amplify(...)`, `.polymerase_indels(...)`, `.ambiguous_base_calls(...)`, `.sequencing_errors(...)`, `.end_loss_5prime(...)`, `.end_loss_3prime(...)`	All descendant-phase
Read layout	`.paired_end(r1_length=..., r2_length=..., insert_size=...)`, `.random_strand_orientation(prob=...)`	Per-read projection
Bookkeeping	`.with_metadata(...)`, `.contaminate(prob=...)`	Stamp metadata; inject background contaminants
Run	`.run_records(n=..., seed=..., validate_records=..., strict=..., expose_provenance=...)`, `.run(...)`, `.stream(...)`, `.stream_records(...)`	Compile + run + project
Compile reuse	`.compile()`	Compile once, reuse across many batches
Compile mode	`.allow_curatable_refdata()`, `.curate_refdata(policy)`	Cartridge-validation mode at compile time

See The Experiment builder for the full pipeline-stage map and ordering rules.

SimulationResult¶

The output of run_records(...) - a list-like wrapper around AIRR record dicts plus the underlying Outcome objects.

Surface	Methods / properties	Notes
List-like access	`len(result)`, `result[i]`, `result[a:b]`, `for rec in result:`	One AIRR dict per element
Underlying state	`.records`, `.outcomes`, `.parents`, `.lineage_trees`	`outcomes` is `None` when built from records only; `parents` exists only for legacy `expand_clones`; `lineage_trees` exists on lineage results
Validation	`.validate_records(refdata)`, `.validate_families()`, `.validate_families_with_parents(refdata)`	See Validation hub
Export	`.to_tsv(path, *, airr_strict=False)`, `.to_csv(path, *, airr_strict=False)`, `.to_fasta(path, *, prefix="seq")`, `.to_fastq(path, *, quality="illumina", **kw)`, `.to_paired_fastq(r1, r2, *, quality="illumina", overwrite=False, **kw)`, `.to_dataframe(*, airr_strict=False)`	See Export the results
Construction	`SimulationResult.from_outcomes(outcomes, refdata, *, id_prefix="seq", expose_provenance=False)`	Build a result from Rust `Outcome` objects + refdata; `expose_provenance=True` injects `truth_*_call` columns

Reports and exceptions¶

`ValidationReport`¶

report = result.validate_records(refdata)
report.ok        # bool - every record passed
report.count     # int - total records validated
report.failures  # list[dict] - per-record failure entries with structured issues
report.summary() # str - histogram of issue kinds
bool(report)     # == report.ok
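
To make the report semantics concrete, here is a minimal stand-in that mirrors the attributes listed above. It is a sketch, not the real ValidationReport: the class name `MiniReport`, the `"kind"` key inside each issue dict, the issue-kind strings, and the exact `summary()` format are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MiniReport:
    """Hypothetical stand-in mirroring ok / count / failures / summary()."""

    count: int
    failures: list = field(default_factory=list)

    @property
    def ok(self) -> bool:
        # The report passes only when no record produced a failure entry.
        return not self.failures

    def __bool__(self) -> bool:
        return self.ok

    def summary(self) -> str:
        # Histogram of issue kinds across every failing record.
        kinds = Counter(
            issue["kind"] for failure in self.failures for issue in failure["issues"]
        )
        lines = [f"{kind}: {n}" for kind, n in sorted(kinds.items())]
        return "\n".join(lines) if lines else "no issues"


report = MiniReport(
    count=3,
    failures=[
        {"record_index": 1, "sequence_id": "seq1",
         "issues": [{"kind": "junction_mismatch"}]},
        {"record_index": 2, "sequence_id": "seq2",
         "issues": [{"kind": "junction_mismatch"}, {"kind": "v_call_mismatch"}]},
    ],
)
if not report:
    print(report.summary())
```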

`FamilyValidationReport`¶

report = result.validate_families()
report.ok                  # bool
report.count               # total records inspected
report.family_count        # int - distinct clone_ids found
report.members_per_family  # dict[int, int] - descendant count per clone
report.failures            # list[dict] - per-family failure entries
report.summary()           # histogram of issue kinds
bool(report)               # == report.ok

Same dataclass shape is returned by validate_families_with_parents(refdata).

`CartridgeBuildReport`¶

report = builder.report()           # or cfg.build_report
report.stages                       # list[dict] - one entry per builder call
report.warnings                     # list[str] - build-finalisation warnings
report.rejected                     # list[dict] - per-allele drops + per-row estimator rejections
report.manifest_snapshot            # dict | None - cartridge_manifest() at build time
report.checksum_at_build_time       # str | None - schema_sha256 stamped on cfg
report.to_dict()                    # JSON-clean dict for CI artifacts

See Build a reference cartridge for the full surface.
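
One way a report like this can produce a JSON-clean dict for CI artifacts is `dataclasses.asdict` followed by `json.dumps`. The sketch below assumes a dataclass with the fields listed above; `MiniBuildReport` and the keys inside the example stage entry are hypothetical, and the real `to_dict()` may be implemented differently.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class MiniBuildReport:
    """Hypothetical stand-in with the same field names as the listing above."""

    stages: list = field(default_factory=list)
    warnings: list = field(default_factory=list)
    rejected: list = field(default_factory=list)
    manifest_snapshot: Optional[dict] = None
    checksum_at_build_time: Optional[str] = None

    def to_dict(self) -> dict:
        # asdict recurses into nested lists and dicts, yielding plain containers.
        return asdict(self)


report = MiniBuildReport(
    stages=[{"call": "example_stage", "n_alleles": 2}],  # illustrative keys
    warnings=["example warning"],
)
payload = json.dumps(report.to_dict(), indent=2, sort_keys=True)
```

Writing `payload` to a file in the CI workspace gives a diffable build trail.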

Exceptions¶

Exception	Raised by	When
`StrictSamplingError`	`run_records(..., strict=True)`	Empty admissible support at a sampling site
`RecordValidationFailedError`	Strict record-validation code paths	A validator gate fired in strict mode
`FamilyValidationFailedError`	Strict family-validation code paths	A family-validator gate fired in strict mode
`DataConfigError`	`DataConfig` validation	Cartridge spec rejected

StrictSamplingError is NOT a ValueError subclass - except ValueError will not catch it. Catch it explicitly:

try:
    result = exp.run_records(n=10, seed=42, strict=True)
except ga.StrictSamplingError as e:
    pass_name, address, reason = e.args
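
The consequence of not subclassing ValueError can be demonstrated without the engine. `DemoStrictSamplingError` and `sample` below are hypothetical stand-ins; the stand-in derives directly from Exception as an assumption, and the three-element `args` tuple follows the unpacking shown above with made-up values.

```python
class DemoStrictSamplingError(Exception):
    """Hypothetical stand-in: derives from Exception, not from ValueError."""


def sample(strict):
    if strict:
        # args mirrors the (pass_name, address, reason) shape; values are invented.
        raise DemoStrictSamplingError("recombine", "v_allele", "empty admissible support")
    return "ok"


caught_by = None
try:
    try:
        sample(strict=True)
    except ValueError:
        # Never reached: the error is not a ValueError subclass.
        caught_by = "ValueError"
except DemoStrictSamplingError as e:
    caught_by = "DemoStrictSamplingError"
    pass_name, address, reason = e.args
```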

Generated `SimulationResult`¶

What an autogenerated section looks like (rendered via mkdocstrings from the docstrings in src/GenAIRR/result.py). The same treatment is on the dedicated pages for Experiment, ReferenceCartridgeBuilder, and the reference-model specs.

`GenAIRR.SimulationResult` ¶

List-like wrapper around a batch of AIRR records.

result[i] returns the i-th record dict; len(result) is the number of records; iteration yields records in order.

The original Outcome objects (with their full trace + revision history) are kept on .outcomes for advanced inspection — most users won't need them.

`records` `property` ¶

The underlying list of record dicts. Mutation through this view propagates back into the result.
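
The list-like behaviour and the "mutation propagates" property can be sketched with a small wrapper. `MiniResult` is a hypothetical stand-in, and returning a plain list for slices is an assumption of this sketch rather than a statement about what SimulationResult returns.

```python
class MiniResult:
    """Hypothetical list-like wrapper around record dicts."""

    def __init__(self, records):
        self._records = list(records)

    @property
    def records(self):
        # Return the underlying list itself (not a copy) so edits propagate.
        return self._records

    def __len__(self):
        return len(self._records)

    def __getitem__(self, index):
        # Integer index gives one record dict; a slice gives a list of dicts.
        return self._records[index]

    def __iter__(self):
        return iter(self._records)


result = MiniResult([{"sequence_id": "seq0"}, {"sequence_id": "seq1"}])
result.records[0]["note"] = "edited"        # mutate through the view
ids = [rec["sequence_id"] for rec in result]
```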

`outcomes` `property` ¶

The underlying list of Outcome objects, or None when this SimulationResult was built from records directly (e.g. loaded from a TSV).

`parents` `property` ¶

Per-clone parent Outcome objects for clonal results; None for non-clonal results and for results built from records directly.

parents[c] is the recombination ancestor of clone c: every descendant record with record["clone_id"] == record["parent_id"] == c was produced by running the post-fork plan from this parent's final_simulation().

The parent Outcome carries the pre-fork addressed-choice .trace(), the pre-fork .events() ledger, the per-revision IR history (.revision(i)), and the final assembled IR (.final_simulation()). Use these for replay, lineage analysis, or building a parent-aware family validator (Slice 3+ scope).

The flat .outcomes list continues to carry only the descendant outcomes (one entry per AIRR record); parents live exclusively here. len(.parents) equals the clonal pipeline's n_clones; len(.outcomes) equals n_clones * per_clone.

`validate_records(refdata)` ¶

Public AIRR output correctness check.

Run the postcondition validator over every record in this result and return a ValidationReport. This is the gate a downstream consumer cares about: "is each projected AIRR record internally consistent with the engine state that produced it?"

Each record is re-derived independently from its original Outcome (trace + event ledger + final Simulation) and compared against the projected dict. A record passes when outcome.validate_record(refdata, sequence_id=...) returns an empty issue list. Failures collect the record_index, sequence_id, and the issue dicts.

Companion check: for engine-side integrity, see Outcome.check_live_call_cache_parity (returns the cached-vs-fresh divergence on the live-call cache that feeds projection).

Troubleshooting rule — if a CI run has both this validator AND the parity harness failing on the same batch, fix the parity divergence FIRST: a stale cache can leak into projection and produce spurious validator failures. Once parity is green, rerun the validator; remaining failures point at a real projection-layer bug.

refdata must be the same RefDataConfig the outcomes were produced against; passing a different refdata will misreport mismatches against an unrelated allele pool.

Raises RuntimeError when this result was built without attached outcomes (e.g. loaded from a TSV); the validator needs the engine state, not just the projected record.

`validate_families(refdata=None)` ¶

Clonal-family consistency check — a strict subset of the audit's §6 family-layer invariants (docs/clonal_family_design.md).

Groups records by clone_id and asserts the recombination-time truth fields agree across every descendant of a clone. refdata is reserved for forward compatibility with the deeper family-layer checks (pre-SHM junction, mutation-distance distribution) the audit's later slices add; this slice's invariants are all dict-only and ignore refdata.

Currently enforced invariants:

truth_v_call constant within each clone_id, when present. Skipped silently for clones whose records were projected without expose_provenance=True.
truth_d_call same.
truth_j_call same.
clone_id is present on every record once any record in the batch carries one (a batch that mixes clonal and non-clonal records raises CloneIdMissing).

Non-clonal results return ok with family_count == 0. This makes result.validate_families() a safe no-op on a flat batch — the call site does not need to branch on the result's clonal-ness.

Records-only results work. Unlike validate_records, this validator does not require the underlying Outcome objects, because every check is on record-dict fields, so a SimulationResult loaded from TSV can still be family-validated.

Not enforced yet (deferred per the audit's §14 out-of-scope list): mutation-distance distribution, pre-SHM junction invariance, parent-trace reconstruction, lineage topology, original_v_call / d_inverted invariance.

Returns a FamilyValidationReport carrying count, family_count, members_per_family, and failures.
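
The currently enforced invariants are all dict-level checks, so they can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the validator's source: `check_families`, the failure-entry layout, and the `*_inconsistent` kind names are hypothetical, and a missing clone_id is recorded here as a failure entry of kind `CloneIdMissing` rather than raised as an exception.

```python
from collections import defaultdict

TRUTH_FIELDS = ("truth_v_call", "truth_d_call", "truth_j_call")


def check_families(records):
    """Return (members_per_family, failures) for a list of record dicts."""
    if not any("clone_id" in rec for rec in records):
        return {}, []  # flat batch: safe no-op, zero families

    failures = []
    families = defaultdict(list)
    for index, rec in enumerate(records):
        if "clone_id" not in rec:
            # Batch mixes clonal and non-clonal records.
            failures.append({"kind": "CloneIdMissing", "record_index": index})
        else:
            families[rec["clone_id"]].append(rec)

    for clone_id, members in families.items():
        for name in TRUTH_FIELDS:
            # Records without the field are skipped silently.
            values = {rec[name] for rec in members if name in rec}
            if len(values) > 1:
                failures.append(
                    {"kind": f"{name}_inconsistent", "clone_id": clone_id,
                     "values": sorted(values)}
                )

    members_per_family = {cid: len(m) for cid, m in families.items()}
    return members_per_family, failures


batch = [
    {"clone_id": 0, "truth_v_call": "IGHV1-2*02"},
    {"clone_id": 0, "truth_v_call": "IGHV1-2*02"},
    {"clone_id": 1, "truth_v_call": "IGHV3-23*01"},
    {"clone_id": 1, "truth_v_call": "IGHV4-34*01"},
]
members, failures = check_families(batch)
```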

`to_dataframe(*, airr_strict=False)` ¶

Return a pandas.DataFrame with one row per record.

airr_strict=True converts every 0-based, half-open *_start coordinate to the AIRR-spec 1-based, inclusive form. The *_end fields are unchanged, because a half-open end and an inclusive 1-based end are numerically equal. Useful when handing the DataFrame off to AIRR-strict downstream tooling.

Raises ImportError if pandas isn't installed (pandas is an optional extra: pip install GenAIRR[all]).
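
The coordinate conversion behind airr_strict=True amounts to adding 1 to each *_start value and leaving each *_end value alone. A minimal sketch on a single record dict follows; `to_airr_strict` is a hypothetical helper, not the library's implementation, and it assumes coordinates are stored as plain integers.

```python
def to_airr_strict(record):
    """Return a copy with integer *_start fields shifted from 0-based to 1-based."""
    converted = dict(record)
    for key, value in record.items():
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if key.endswith("_start") and is_int:
            converted[key] = value + 1  # 0-based start -> 1-based inclusive start
    return converted


# The half-open span [0, 296) covers the same bases as the inclusive span [1, 296].
record = {"sequence_id": "seq0", "v_sequence_start": 0, "v_sequence_end": 296}
strict = to_airr_strict(record)
```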

`to_tsv(path, *, airr_strict=False)` ¶

Write the records as AIRR-style TSV (tab-separated). The header row uses _DEFAULT_COLUMN_ORDER.

airr_strict=True converts *_start coordinate fields to the 1-based, inclusive form required by the AIRR spec.
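
Writing record dicts as tab-separated text with a fixed header order can be done with the standard `csv` module. The sketch below shows that general approach only: `write_tsv` and the four-column `COLUMN_ORDER` are hypothetical, and the actual contents of _DEFAULT_COLUMN_ORDER are not reproduced here.

```python
import csv
import io

# Hypothetical column order for illustration only.
COLUMN_ORDER = ["sequence_id", "v_call", "v_sequence_start", "v_sequence_end"]


def write_tsv(records, handle, columns=COLUMN_ORDER):
    """Write record dicts as TSV with a header row in the given column order."""
    writer = csv.DictWriter(
        handle,
        fieldnames=columns,
        delimiter="\t",
        extrasaction="ignore",  # drop keys that are not in the column list
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)


buffer = io.StringIO()
write_tsv(
    [{"sequence_id": "seq0", "v_call": "IGHV1-2*02",
      "v_sequence_start": 0, "v_sequence_end": 296}],
    buffer,
)
tsv_text = buffer.getvalue()
```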

What's not documented here yet¶

A few surfaces deliberately don't appear on this page:

Deep Rust engine internals. GenAIRR._engine exposes the PyO3 bindings (Outcome, TraceFile, RefDataConfig, Simulation, etc.). These are reachable from Python but are engine-developer surfaces; the user-facing surface above (SimulationResult.outcomes[i].trace(), compiled.simulator.replay_from_trace_file(...)) is what user code should touch.
Experimental / private helpers. Anything under GenAIRR._* is private; signatures and behaviour can change between releases without notice.
Pre-engine documentation. The historical _old_docs/ directory in the repository carries an earlier docs system that predates the Rust engine; it's preserved for reference but not migrated into the current site.

For the full per-symbol docstring inventory, the API Reference nav section carries mkdocstrings-driven pages for Experiment, SimulationResult, ReferenceCartridgeBuilder, and the cartridge-spec dataclasses.
