
API reference

A map of the public surface. The reference is organised by workflow — how you actually compose a simulation — rather than alphabetically. For each symbol, the user guide that explains how to use it is linked; the autogenerated per-symbol signatures are still being expanded.

How to read this reference

The page below is workflow-first, not signature-first. Each section starts with the user-facing surface — what to call, in what order — and links to the dedicated guide for context. Use this page to find the right symbol; use the guides to learn when and why to use it.

If you want the precise method signatures or the underlying docstrings, the generated SimulationResult section below shows what auto-rendered API docs look like (one example class today; the rest land in a follow-up slice).

Top-level imports

35 names ship in GenAIRR.__all__. Grouped by what they do:

Simulation entry points

| Symbol | Purpose | Guide |
| --- | --- | --- |
| Experiment | Fluent pipeline builder | The Experiment builder |
| CompiledExperiment | Compiled plan (compile once, run many) | The Experiment builder |
| SimulationResult | Records + outcomes + parents wrapper | Quick start |
| set_seed, get_seed, reset_seed | Thread-local seed management | Validation hub |
| __version__ | Resolved via importlib.metadata; falls back to "0.0.0" for editable installs | |
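
The `__version__` row describes a common resolution pattern. A minimal sketch of that pattern, assuming the fallback triggers when no installed distribution metadata is found (the helper name `resolve_version` is hypothetical and not part of GenAIRR):

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str, fallback: str = "0.0.0") -> str:
    """Return the installed distribution's version, or `fallback` when
    no metadata for `dist_name` can be found."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # No installed metadata: report the placeholder instead of failing.
        return fallback
```

Calling `resolve_version` with a name that is not installed returns the `"0.0.0"` placeholder rather than raising.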

Reference data — cartridges and bridges

| Symbol | Purpose | Guide |
| --- | --- | --- |
| DataConfig | The Python-side cartridge dataclass | Reference cartridge |
| RefDataConfig | The engine-side refdata bridge object | Reference cartridge |
| dataconfig_to_refdata | Bridge function: DataConfig → RefDataConfig | validate_records |
| ConfigInfo | Identity metadata | Reference cartridge |
| ChainType | BCR_HEAVY / BCR_LIGHT_KAPPA / ... | |
| Species | HUMAN / MOUSE / ... | |
| Productivity | Productive / NonProductive enumeration | |
| list_configs | Enumerate the 100+ bundled cartridges | |
| DataConfigError | DataConfig validation failure | |

Bundled cartridges (lazy-loaded)

import GenAIRR as ga

cfg = ga.HUMAN_IGH_OGRDB     # pickled cartridge, loaded on first access

| Symbol | Locus / chain |
| --- | --- |
| HUMAN_IGH_OGRDB | Human IGH, OGRDB |
| HUMAN_IGH_EXTENDED | Human IGH, extended catalogue |
| HUMAN_IGK_OGRDB | Human IGK, OGRDB |
| HUMAN_IGL_OGRDB | Human IGL, OGRDB |
| HUMAN_TCRB_IMGT | Human TRB, IMGT |

The five top-level bundled cartridges above are the most commonly used. The full 100+-cartridge catalogue stays reachable via ga.list_configs() and GenAIRR.data.<NAME> (lazy attribute access).
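
Lazy attribute access of this kind is often built on a module-level `__getattr__` (PEP 562). The sketch below illustrates the idea on a throwaway module object. It is an assumption about the general technique, not a description of how GenAIRR.data is implemented, and `demo_data`, `_LOADERS`, and `load_calls` are hypothetical names.

```python
import types

# Stand-in loaders; a real package might unpickle a file here instead.
_LOADERS = {"HUMAN_IGH_OGRDB": lambda: {"name": "HUMAN_IGH_OGRDB"}}
load_calls = []  # records each real load so the caching is observable


def _lazy_getattr(name):
    """Called only when normal attribute lookup on the module fails."""
    if name not in _LOADERS:
        raise AttributeError(f"module 'demo_data' has no attribute {name!r}")
    load_calls.append(name)
    value = _LOADERS[name]()
    setattr(demo_data, name, value)  # cache: later lookups skip __getattr__
    return value


demo_data = types.ModuleType("demo_data")
demo_data.__getattr__ = _lazy_getattr

first = demo_data.HUMAN_IGH_OGRDB   # triggers the loader
second = demo_data.HUMAN_IGH_OGRDB  # served from the module's own dict
```

Caching the loaded object on the module means the loader runs once per name, which matches the "loaded on first access" behaviour described above.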

Reference cartridge authoring

| Symbol | Purpose | Guide |
| --- | --- | --- |
| ReferenceCartridgeBuilder | Fluent builder for custom cartridges from FASTA | Build a reference cartridge |
| CartridgeBuildReport | Auditable build trail dataclass | Build a reference cartridge |
| ReferenceRulesSpec | Anchor + alphabet + severity rules | Reference cartridge |
| AnchorRuleSpec | Per-anchor expected-AA + required flag | Reference cartridge |
| ReferenceEmpiricalModels | Typed empirical-model bundle | Reference cartridge |
| EmpiricalDistributionSpec | [(value, weight), ...] shape | Reference cartridge |
| NpBaseModelSpec | NP-base sampling model (uniform / empirical / Markov) | Junction N/P additions |
| AlleleUsageSpec | Per-segment allele-usage weights | Reference cartridge |

Validation reports and exceptions

| Symbol | Purpose | Guide |
| --- | --- | --- |
| ValidationReport | Per-record AIRR-validator output | validate_records |
| FamilyValidationReport | Family-validator output | Clonal simulation overview |
| RecordValidationFailedError | Raised when strict record validation fails | validate_records |
| FamilyValidationFailedError | Raised when strict family validation fails | Clonal simulation overview |
| StrictSamplingError | Raised under strict=True on empty admissible support | Validation hub |
| productive | Convenience contract bundle for productive sampling | Recombination biology |

Experiment

The fluent pipeline builder. Every chained method returns the same Experiment extended by one more pass; the pipeline runs when .run_records(...) / .run(...) / .compile().run(...) is called.

| Surface | Methods | Notes |
| --- | --- | --- |
| Bind to a cartridge | Experiment.on(cfg_or_name) | Accepts a string shortcut, a DataConfig, or a RefDataConfig |
| Recombination | .recombine(), .invert_d(prob=...), .receptor_revision(prob=...) | Ancestor-phase mechanisms |
| Constraints | .productive_only(), .restrict_alleles(v=..., d=..., j=...) | Constrain the sample space |
| Biological mutation | .mutate(model="s5f"\|"uniform", rate=..., count=..., segment_rates=..., v_subregion_rates=...) | The only pass that increments n_mutations |
| Clonal structure | .clonal_lineage(...), .clonal_repertoire(...), legacy .expand_clones(...) | BCR trees, TCR / flat-BCR abundance repertoires, or fixed-size star families |
| Trims override | .trim(v_3=..., d_5=..., d_3=..., j_5=..., enabled=...) | Per-experiment trim distribution overrides |
| Library / sequencer artefacts | .pcr_amplify(...), .polymerase_indels(...), .ambiguous_base_calls(...), .sequencing_errors(...), .end_loss_5prime(...), .end_loss_3prime(...) | All descendant-phase |
| Read layout | .paired_end(r1_length=..., r2_length=..., insert_size=...), .random_strand_orientation(prob=...) | Per-read projection |
| Bookkeeping | .with_metadata(...), .contaminate(prob=...) | Stamp metadata; inject background contaminants |
| Run | .run_records(n=..., seed=..., validate_records=..., strict=..., expose_provenance=...), .run(...), .stream(...), .stream_records(...) | Compile + run + project |
| Compile reuse | .compile() | Compile once, reuse across many batches |
| Compile mode | .allow_curatable_refdata(), .curate_refdata(policy) | Cartridge-validation mode at compile time |

See The Experiment builder for the full pipeline-stage map and ordering rules.
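
The chaining behaviour described at the top of this section (each call extends the same object by one pass, and nothing executes until a run method is called) is the classic fluent-builder pattern. A toy sketch, using a hypothetical `ToyPipeline` class that only mimics the shape and is not GenAIRR's Experiment:

```python
class ToyPipeline:
    """Hypothetical stand-in for a fluent builder with deferred execution."""

    def __init__(self):
        self.passes = []  # queued pass names, in call order

    def _add(self, name):
        self.passes.append(name)  # extend the pipeline by one pass
        return self               # return the same object so calls chain

    def recombine(self):
        return self._add("recombine")

    def mutate(self):
        return self._add("mutate")

    def run(self, n):
        # Nothing has executed before this point. Each toy "record"
        # simply lists the passes that would have been applied to it.
        return [{"index": i, "applied": list(self.passes)} for i in range(n)]


pipeline = ToyPipeline()
chained = pipeline.recombine().mutate()
records = chained.run(n=2)
```

Because every method returns the same object, `chained` and `pipeline` refer to one pipeline, and the queued passes are only consumed inside `run`.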

SimulationResult

The output of run_records(...) — a list-like wrapper around AIRR record dicts plus the underlying Outcome objects.

| Surface | Methods / properties | Notes |
| --- | --- | --- |
| List-like access | len(result), result[i], result[a:b], for rec in result: | One AIRR dict per element |
| Underlying state | .records, .outcomes, .parents, .lineage_trees | outcomes is None when built from records only; parents exists only for legacy expand_clones; lineage_trees exists on lineage results |
| Validation | .validate_records(refdata), .validate_families(), .validate_families_with_parents(refdata) | See Validation hub |
| Export | .to_tsv(path, *, airr_strict=False), .to_csv(path, *, airr_strict=False), .to_fasta(path, *, prefix="seq"), .to_fastq(path, *, quality="illumina", **kw), .to_paired_fastq(r1, r2, *, quality="illumina", overwrite=False, **kw), .to_dataframe(*, airr_strict=False) | See Export the results |
| Construction | SimulationResult.from_outcomes(outcomes, refdata, *, id_prefix="seq", expose_provenance=False) | Build a result from Rust Outcome objects + refdata; expose_provenance=True injects truth_*_call columns |

Reports and exceptions

ValidationReport

report = result.validate_records(refdata)
report.ok        # bool — every record passed
report.count     # int — total records validated
report.failures  # list[dict] — per-record failure entries with structured issues
report.summary() # str — histogram of issue kinds
bool(report)     # == report.ok

FamilyValidationReport

report = result.validate_families()
report.ok                  # bool
report.count               # total records inspected
report.family_count        # int — distinct clone_ids found
report.members_per_family  # dict[int, int] — descendant count per clone
report.failures            # list[dict] — per-family failure entries
report.summary()           # histogram of issue kinds
bool(report)               # == report.ok

The same dataclass shape is returned by validate_families_with_parents(refdata).

CartridgeBuildReport

report = builder.report()           # or cfg.build_report
report.stages                       # list[dict] — one entry per builder call
report.warnings                     # list[str] — build-finalisation warnings
report.rejected                     # list[dict] — per-allele drops + per-row estimator rejections
report.manifest_snapshot            # dict | None — cartridge_manifest() at build time
report.checksum_at_build_time       # str | None — schema_sha256 stamped on cfg
report.to_dict()                    # JSON-clean dict for CI artifacts

See Build a reference cartridge for the full surface.

Exceptions

| Exception | Raised by | When |
| --- | --- | --- |
| StrictSamplingError | run_records(..., strict=True) | Empty admissible support at a sampling site |
| RecordValidationFailedError | Strict record-validation code paths | A validator gate fired in strict mode |
| FamilyValidationFailedError | Strict family-validation code paths | A family-validator gate fired in strict mode |
| DataConfigError | DataConfig validation | Cartridge spec rejected |

StrictSamplingError is NOT a ValueError subclass — except ValueError will not catch it. Catch it explicitly:

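import GenAIRR as ga

# exp is an Experiment built earlier, e.g. ga.Experiment.on(ga.HUMAN_IGH_OGRDB).recombine()
try:
    result = exp.run_records(n=10, seed=42, strict=True)
except ga.StrictSamplingError as e:
    # The exception arguments unpack to the pass name, the address, and the reason.
    pass_name, address, reason = e.args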
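
To see why a broad `except ValueError` misses this error, the following self-contained sketch uses a hypothetical stand-in class. It assumes only what is stated above (the error is not a ValueError subclass and carries three positional arguments); the base class `Exception` and the argument values are illustrative guesses.

```python
class StrictSamplingErrorDemo(Exception):
    """Hypothetical stand-in: derives from Exception, not ValueError."""


def sample(strict):
    if strict:
        # Three positional args, mirroring (pass_name, address, reason).
        raise StrictSamplingErrorDemo(
            "recombine", "v_allele", "empty admissible support"
        )
    return "ok"


def run_guarded():
    try:
        try:
            return sample(strict=True)
        except ValueError:
            # Never reached: the stand-in is not a ValueError subclass.
            return "caught as ValueError"
    except StrictSamplingErrorDemo as e:
        pass_name, address, reason = e.args
        return f"{pass_name} @ {address}: {reason}"


message = run_guarded()
```

The inner handler is skipped and the error propagates to the explicit handler, which is the behaviour the note above warns about.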

Generated SimulationResult

What an autogenerated section looks like (rendered via mkdocstrings from the docstrings in src/GenAIRR/result.py). One class today; the full reference fills in over follow-up slices.

GenAIRR.SimulationResult

List-like wrapper around a batch of AIRR records.

result[i] returns the i-th record dict; len(result) is the number of records; iteration yields records in order.

The original Outcome objects (with their full trace + revision history) are kept on .outcomes for advanced inspection — most users won't need them.

records property

The underlying list of record dicts. Mutation through this view propagates back into the result.
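
A toy illustration of the list-like contract and of what "propagates back" means for the records view. `ToyResult` is a hypothetical stand-in written for this page, assuming the property hands out the live list rather than a copy; it is not the class's actual source.

```python
class ToyResult:
    """Hypothetical list-like wrapper around a batch of record dicts."""

    def __init__(self, records):
        self._records = list(records)

    def __len__(self):
        return len(self._records)

    def __getitem__(self, index):
        # Supports both integer indexing and slicing.
        return self._records[index]

    def __iter__(self):
        return iter(self._records)

    @property
    def records(self):
        # Returns the live list, so edits are visible through the wrapper.
        return self._records


result = ToyResult([{"sequence_id": "seq0"}, {"sequence_id": "seq1"}])
result.records[0]["productive"] = True  # mutate through the view
```

After the mutation, `result[0]` shows the new `productive` key because both paths reach the same dict.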

outcomes property

The underlying list of Outcome objects, or None when this SimulationResult was built from records directly (e.g. loaded from a TSV).

parents property

Per-clone parent Outcome objects for clonal results; None for non-clonal results and for results built from records directly.

parents[c] is the recombination ancestor of clone c: every descendant record with record["clone_id"] == record["parent_id"] == c was produced by running the post-fork plan from this parent's final_simulation().

The parent Outcome carries the pre-fork addressed-choice .trace(), the pre-fork .events() ledger, the per-revision IR history (.revision(i)), and the final assembled IR (.final_simulation()). Use these for replay, lineage analysis, or building a parent-aware family validator (Slice 3+ scope).

The flat .outcomes list continues to carry only the descendant outcomes (one entry per AIRR record); parents live exclusively here. len(.parents) equals the clonal pipeline's n_clones; len(.outcomes) equals n_clones * per_clone.

validate_records(refdata)

Public AIRR output correctness check.

Run the postcondition validator over every record in this result and return a ValidationReport. This is the gate a downstream consumer cares about: "is each projected AIRR record internally consistent with the engine state that produced it?"

Each record is re-derived independently from its original Outcome (trace + event ledger + final Simulation) and compared against the projected dict. A record passes when outcome.validate_record(refdata, sequence_id=...) returns an empty issue list. Failures collect the record_index, sequence_id, and the issue dicts.
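
The per-record loop described above can be sketched in plain Python. This is an illustration under stated assumptions, not the validator's source: `collect_report` and `toy_validator` are hypothetical names, the per-record check is a trivial stand-in for outcome.validate_record(...), and the `kind` key on issue dicts is assumed for the histogram.

```python
from collections import Counter


def collect_report(records, validate_one):
    """Run `validate_one` on each record and gather failures.

    A record passes when `validate_one` returns an empty issue list.
    """
    failures = []
    for index, record in enumerate(records):
        issues = validate_one(record)
        if issues:
            failures.append({
                "record_index": index,
                "sequence_id": record.get("sequence_id"),
                "issues": issues,
            })
    # Histogram of issue kinds, similar in spirit to report.summary().
    histogram = Counter(
        issue["kind"] for failure in failures for issue in failure["issues"]
    )
    return {
        "ok": not failures,
        "count": len(records),
        "failures": failures,
        "histogram": dict(histogram),
    }


def toy_validator(record):
    # Hypothetical check: flag records that lack a v_call.
    return [] if record.get("v_call") else [{"kind": "MissingVCall"}]


report = collect_report(
    [{"sequence_id": "seq0", "v_call": "IGHV1-2*02"}, {"sequence_id": "seq1"}],
    toy_validator,
)
```

The resulting dict mirrors the ok / count / failures shape of the report, with one failure entry for the record that produced issues.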

Companion check: for engine-side integrity, see Outcome.check_live_call_cache_parity (returns the cached-vs-fresh divergence on the live-call cache that feeds projection).

Troubleshooting rule — if a CI run has both this validator AND the parity harness failing on the same batch, fix the parity divergence FIRST: a stale cache can leak into projection and produce spurious validator failures. Once parity is green, rerun the validator; remaining failures point at a real projection-layer bug.

refdata must be the same RefDataConfig the outcomes were produced against; passing a different refdata will misreport mismatches against an unrelated allele pool.

Raises RuntimeError when this result was built without attached outcomes (e.g. loaded from a TSV); the validator needs the engine state, not just the projected record.

validate_families(refdata=None)

Clonal-family consistency check — a strict subset of the audit's §6 family-layer invariants (docs/clonal_family_design.md).

Groups records by clone_id and asserts the recombination-time truth fields agree across every descendant of a clone. refdata is reserved for forward compatibility with the deeper family-layer checks (pre-SHM junction, mutation-distance distribution) the audit's later slices add; this slice's invariants are all dict-only and ignore refdata.

Currently enforced invariants:

  • truth_v_call constant within each clone_id, when present. Skipped silently for clones whose records were projected without expose_provenance=True.
  • truth_d_call constant within each clone_id, under the same rule.
  • truth_j_call constant within each clone_id, under the same rule.
  • clone_id is present on every record once any record in the batch carries one (a batch that mixes clonal and non-clonal records raises CloneIdMissing).

Non-clonal results return ok with family_count == 0. This makes result.validate_families() a safe no-op on a flat batch — the call site does not need to branch on the result's clonal-ness.
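
The invariants listed above operate on record dicts only, so they can be approximated in a few lines. The sketch below is a hedged re-implementation for illustration: `check_families` and the `TruthCallMismatch` label are hypothetical, while `CloneIdMissing` and the field names come from the description above.

```python
from collections import defaultdict

TRUTH_FIELDS = ("truth_v_call", "truth_d_call", "truth_j_call")


def check_families(records):
    """Return (family_count, members_per_family, failures)."""
    has_clone = [r.get("clone_id") is not None for r in records]
    failures = []

    # Once any record carries a clone_id, every record must carry one.
    if any(has_clone) and not all(has_clone):
        for index, present in enumerate(has_clone):
            if not present:
                failures.append({"kind": "CloneIdMissing", "record_index": index})

    families = defaultdict(list)
    for record, present in zip(records, has_clone):
        if present:
            families[record["clone_id"]].append(record)

    # Truth calls must be constant per clone; absent fields are skipped.
    for clone_id, members in families.items():
        for field in TRUTH_FIELDS:
            values = {m[field] for m in members if field in m}
            if len(values) > 1:
                failures.append({
                    "kind": "TruthCallMismatch",  # hypothetical label
                    "clone_id": clone_id,
                    "field": field,
                })

    members_per_family = {cid: len(ms) for cid, ms in families.items()}
    return len(families), members_per_family, failures
```

On a flat batch with no clone_id values this returns zero families and no failures, which is the safe no-op behaviour described above.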

Records-only results work. Unlike validate_records, this validator does not require the underlying Outcome objects (every check is on record-dict fields), so a SimulationResult loaded from TSV can still be family-validated.

Not enforced yet (deferred per the audit's §14 out-of-scope list): mutation-distance distribution, pre-SHM junction invariance, parent-trace reconstruction, lineage topology, original_v_call / d_inverted invariance.

Returns a FamilyValidationReport carrying count, family_count, members_per_family, and failures.

to_dataframe(*, airr_strict=False)

Return a pandas.DataFrame with one row per record.

airr_strict=True converts all 0-based half-open coord *_start fields to the AIRR-spec 1-based-inclusive form (*_end fields are unchanged). Useful when handing the DataFrame off to AIRR-strict downstream tooling.
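
A worked example of that coordinate change. A 0-based half-open span [start, end) and a 1-based inclusive span [start + 1, end] cover the same bases, which is why only the start moves. The helper `to_airr_strict` below is a hypothetical illustration of the arithmetic, not the library's conversion code.

```python
def to_airr_strict(record):
    """Return a copy with every integer *_start field shifted by +1.

    *_end fields are left alone: the exclusive 0-based end equals the
    inclusive 1-based end.
    """
    converted = dict(record)
    for key, value in record.items():
        if key.endswith("_start") and isinstance(value, int):
            converted[key] = value + 1
    return converted


record = {"sequence_id": "seq0", "v_sequence_start": 0, "v_sequence_end": 296}
strict = to_airr_strict(record)

# Span length is preserved: end - start (half-open) == end - start + 1 (inclusive).
length_before = record["v_sequence_end"] - record["v_sequence_start"]
length_after = strict["v_sequence_end"] - strict["v_sequence_start"] + 1
```

Here the V segment starting at 0-based position 0 becomes position 1, while the end stays at 296 and the covered length is unchanged.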

Raises ImportError if pandas isn't installed (pandas is an optional extra: pip install GenAIRR[all]).

to_tsv(path, *, airr_strict=False)

Write the records as AIRR-style TSV (tab-separated). The header row uses _DEFAULT_COLUMN_ORDER.

airr_strict=True converts coord *_start fields to 1-based-inclusive (AIRR spec).

What's not documented here yet

A few surfaces deliberately don't appear on this page:

  • Deep Rust engine internals. GenAIRR._engine exposes the PyO3 bindings (Outcome, TraceFile, RefDataConfig, Simulation, etc.). These are reachable from Python but are engine-developer surfaces; the user-facing surface above (SimulationResult.outcomes[i].trace(), compiled.simulator.replay_from_trace_file(...)) is what user code should touch.
  • Experimental / private helpers. Anything under GenAIRR._* is private; signatures and behaviour can change between releases without notice.
  • Pre-engine documentation. The historical _old_docs/ directory in the repository carries an earlier docs system that predates the Rust engine; it's preserved for reference but not migrated into the current site.

When you need the full per-symbol docstring inventory, the follow-up slice will land mkdocstrings-driven pages for Experiment, SimulationResult, ReferenceCartridgeBuilder, DataConfig, and the cartridge-spec dataclasses. Today, the generated SimulationResult section above is the only one.