API reference
A map of the public surface. The reference is organised by workflow — how you actually compose a simulation — rather than alphabetically. For each symbol, the user guide that explains how to use it is linked; the autogenerated per-symbol signatures are still being expanded.
How to read this reference
The page below is workflow-first, not signature-first. Each section starts with the user-facing surface — what to call, in what order — and links to the dedicated guide for context. Use this page to find the right symbol; use the guides to learn when and why to use it.
If you want the precise method signatures or the underlying docstrings, the generated SimulationResult section below shows what auto-rendered API docs look like (one example class today; the rest land in a follow-up slice).
Top-level imports
GenAIRR.__all__ exports 35 names, grouped below by what they do.
Simulation entry points
| Symbol | Purpose | Guide |
|---|---|---|
| Experiment | Fluent pipeline builder | The Experiment builder |
| CompiledExperiment | Compiled plan (compile once, run many) | The Experiment builder |
| SimulationResult | Records + outcomes + parents wrapper | Quick start |
| set_seed, get_seed, reset_seed | Thread-local seed management | Validation hub |
| __version__ | Resolved via importlib.metadata; falls back to "0.0.0" for editable installs | - |
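The `__version__` row describes a common packaging pattern. A minimal sketch of that pattern follows; `resolve_version` is a hypothetical helper written for illustration, not GenAIRR's actual code, and the distribution name is supplied by the caller.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name, fallback="0.0.0"):
    """Return the installed version of dist_name, or fallback if no metadata exists."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # No installed distribution metadata was found under this name.
        return fallback


# A name that is not installed exercises the fallback branch.
missing = resolve_version("surely-not-an-installed-distribution-xyz")
print(missing)
```

The fallback keeps `import` working in environments where the package metadata cannot be found, at the cost of reporting a placeholder version.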
Reference data: cartridges and bridges
| Symbol | Purpose | Guide |
|---|---|---|
| DataConfig | The Python-side cartridge dataclass | Reference cartridge |
| RefDataConfig | The engine-side refdata bridge object | Reference cartridge |
| dataconfig_to_refdata | Bridge function: DataConfig → RefDataConfig | validate_records |
| ConfigInfo | Identity metadata | Reference cartridge |
| ChainType | BCR_HEAVY / BCR_LIGHT_KAPPA / ... | - |
| Species | HUMAN / MOUSE / ... | - |
| Productivity | Productive / NonProductive enumeration | - |
| list_configs | Enumerate the 100+ bundled cartridges | - |
| DataConfigError | DataConfig validation failure | - |
Bundled cartridges (lazy-loaded)
| Symbol | Locus / chain |
|---|---|
| HUMAN_IGH_OGRDB | Human IGH, OGRDB |
| HUMAN_IGH_EXTENDED | Human IGH, extended catalogue |
| HUMAN_IGK_OGRDB | Human IGK, OGRDB |
| HUMAN_IGL_OGRDB | Human IGL, OGRDB |
| HUMAN_TCRB_IMGT | Human TRB, IMGT |
The five top-level bundled cartridges above are the most commonly
used. The full 100+-cartridge catalogue stays reachable via
ga.list_configs() and GenAIRR.data.<NAME> (lazy attribute
access).
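Lazy attribute access of this kind is commonly built on a module-level `__getattr__` (PEP 562). The sketch below assumes that mechanism; the page does not say how GenAIRR.data implements it. The `data` module, the `_LOADERS` table, and the dict payloads are hypothetical stand-ins.

```python
import types

# Hypothetical loader table: name -> zero-argument factory. In a real package
# each factory would load one cartridge from disk.
_LOADERS = {
    "HUMAN_IGH_OGRDB": lambda: {"name": "HUMAN_IGH_OGRDB", "locus": "IGH"},
    "HUMAN_IGK_OGRDB": lambda: {"name": "HUMAN_IGK_OGRDB", "locus": "IGK"},
}
load_count = {"n": 0}

data = types.ModuleType("data")  # stand-in for a package's data module


def _lazy_getattr(name):
    """Module-level __getattr__: called only when normal attribute lookup fails."""
    try:
        loader = _LOADERS[name]
    except KeyError:
        raise AttributeError(f"module 'data' has no attribute {name!r}") from None
    load_count["n"] += 1
    value = loader()
    setattr(data, name, value)  # cache so later lookups skip this hook
    return value


data.__getattr__ = _lazy_getattr

first = data.HUMAN_IGH_OGRDB   # triggers the loader
second = data.HUMAN_IGH_OGRDB  # served from the module namespace
```

Only the names that are actually touched get loaded, which is what keeps a catalogue of 100+ entries cheap to import.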
Reference cartridge authoring
| Symbol | Purpose | Guide |
|---|---|---|
| ReferenceCartridgeBuilder | Fluent builder for custom cartridges from FASTA | Build a reference cartridge |
| CartridgeBuildReport | Auditable build trail dataclass | Build a reference cartridge |
| ReferenceRulesSpec | Anchor + alphabet + severity rules | Reference cartridge |
| AnchorRuleSpec | Per-anchor expected-AA + required flag | Reference cartridge |
| ReferenceEmpiricalModels | Typed empirical-model bundle | Reference cartridge |
| EmpiricalDistributionSpec | [(value, weight), ...] shape | Reference cartridge |
| NpBaseModelSpec | NP-base sampling model (uniform / empirical / Markov) | Junction N/P additions |
| AlleleUsageSpec | Per-segment allele-usage weights | Reference cartridge |
Validation reports and exceptions
| Symbol | Purpose | Guide |
|---|---|---|
| ValidationReport | Per-record AIRR-validator output | validate_records |
| FamilyValidationReport | Family-validator output | Clonal simulation overview |
| RecordValidationFailedError | Raised when strict record validation fails | validate_records |
| FamilyValidationFailedError | Raised when strict family validation fails | Clonal simulation overview |
| StrictSamplingError | Raised under strict=True on empty admissible support | Validation hub |
| productive | Convenience contract bundle for productive sampling | Recombination biology |
Experiment
The fluent pipeline builder. Every chained method returns the
same Experiment extended by one more pass; the pipeline runs
when .run_records(...) / .run(...) / .compile().run(...)
is called.
| Surface | Methods | Notes |
|---|---|---|
| Bind to a cartridge | Experiment.on(cfg_or_name) | Accepts a string shortcut, a DataConfig, or a RefDataConfig |
| Recombination | .recombine(), .invert_d(prob=...), .receptor_revision(prob=...) | Ancestor-phase mechanisms |
| Constraints | .productive_only(), .restrict_alleles(v=..., d=..., j=...) | Constrain the sample space |
| Biological mutation | .mutate(model="s5f"\|"uniform", rate=..., count=..., segment_rates=..., v_subregion_rates=...) | The only pass that increments n_mutations |
| Clonal structure | .clonal_lineage(...), .clonal_repertoire(...), legacy .expand_clones(...) | BCR trees, TCR / flat-BCR abundance repertoires, or fixed-size star families |
| Trims override | .trim(v_3=..., d_5=..., d_3=..., j_5=..., enabled=...) | Per-experiment trim distribution overrides |
| Library / sequencer artefacts | .pcr_amplify(...), .polymerase_indels(...), .ambiguous_base_calls(...), .sequencing_errors(...), .end_loss_5prime(...), .end_loss_3prime(...) | All descendant-phase |
| Read layout | .paired_end(r1_length=..., r2_length=..., insert_size=...), .random_strand_orientation(prob=...) | Per-read projection |
| Bookkeeping | .with_metadata(...), .contaminate(prob=...) | Stamp metadata; inject background contaminants |
| Run | .run_records(n=..., seed=..., validate_records=..., strict=..., expose_provenance=...), .run(...), .stream(...), .stream_records(...) | Compile + run + project |
| Compile reuse | .compile() | Compile once, reuse across many batches |
| Compile mode | .allow_curatable_refdata(), .curate_refdata(policy) | Cartridge-validation mode at compile time |
See The Experiment builder for the full pipeline-stage map and ordering rules.
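The builder contract described in this section (each chained call returns the same object with one more pass recorded, and nothing executes until a run method is called) can be illustrated with a toy class. `ToyExperiment` is a hypothetical stand-in that borrows a few method names from the table; it does not reproduce GenAIRR's Experiment behaviour.

```python
class ToyExperiment:
    """Hypothetical stand-in for a fluent builder; not GenAIRR's Experiment."""

    def __init__(self, cartridge):
        self.cartridge = cartridge
        self.passes = []    # the plan: pass names in call order
        self.executed = []  # filled only when the plan actually runs

    @classmethod
    def on(cls, cartridge):
        return cls(cartridge)

    def _extend(self, name):
        self.passes.append(name)
        return self  # same object back, so calls can be chained

    def recombine(self):
        return self._extend("recombine")

    def mutate(self):
        return self._extend("mutate")

    def run_records(self, n):
        # Execution happens here, not while the chain is being built.
        records = []
        for i in range(n):
            self.executed.extend(self.passes)
            records.append({"sequence_id": f"seq{i}", "passes": list(self.passes)})
        return records


exp = ToyExperiment.on("HUMAN_IGH_OGRDB")
chained = exp.recombine().mutate()
ran_before_run = list(exp.executed)  # still empty: nothing has run yet
records = chained.run_records(n=2)
```

Deferring execution to the run call is what lets a plan be compiled once and reused across batches.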
SimulationResult
The output of run_records(...) — a list-like wrapper around AIRR
record dicts plus the underlying Outcome objects.
| Surface | Methods / properties | Notes |
|---|---|---|
| List-like access | len(result), result[i], result[a:b], for rec in result: | One AIRR dict per element |
| Underlying state | .records, .outcomes, .parents, .lineage_trees | outcomes is None when built from records only; parents exists only for legacy expand_clones; lineage_trees exists on lineage results |
| Validation | .validate_records(refdata), .validate_families(), .validate_families_with_parents(refdata) | See Validation hub |
| Export | .to_tsv(path, *, airr_strict=False), .to_csv(path, *, airr_strict=False), .to_fasta(path, *, prefix="seq"), .to_fastq(path, *, quality="illumina", **kw), .to_paired_fastq(r1, r2, *, quality="illumina", overwrite=False, **kw), .to_dataframe(*, airr_strict=False) | See Export the results |
| Construction | SimulationResult.from_outcomes(outcomes, refdata, *, id_prefix="seq", expose_provenance=False) | Build a result from Rust Outcome objects + refdata; expose_provenance=True injects truth_*_call columns |
Reports and exceptions
ValidationReport
report = result.validate_records(refdata)
report.ok # bool — every record passed
report.count # int — total records validated
report.failures # list[dict] — per-record failure entries with structured issues
report.summary() # str — histogram of issue kinds
bool(report) # == report.ok
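To show how these attributes fit together in a CI gate, here is a sketch built on a stand-in class. `ToyReport` is hypothetical and only mirrors the attribute names listed above; the `"issues"` and `"kind"` keys inside each failure entry are assumptions about the entry shape, not a documented schema.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyReport:
    """Hypothetical stand-in mirroring the attributes listed above."""
    count: int
    failures: list = field(default_factory=list)

    @property
    def ok(self):
        return not self.failures

    def __bool__(self):
        return self.ok  # lets callers write `if not report:`

    def summary(self):
        # Assumed failure shape: {"record_index", "sequence_id", "issues": [{"kind": ...}]}
        kinds = Counter(issue["kind"] for f in self.failures for issue in f["issues"])
        return ", ".join(f"{kind}: {n}" for kind, n in sorted(kinds.items()))


report = ToyReport(count=3, failures=[
    {"record_index": 1, "sequence_id": "seq1",
     "issues": [{"kind": "junction_mismatch"}, {"kind": "v_call_mismatch"}]},
    {"record_index": 2, "sequence_id": "seq2",
     "issues": [{"kind": "junction_mismatch"}]},
])

message = "all records passed"
if not report:
    message = f"{len(report.failures)} of {report.count} records failed ({report.summary()})"
```

A CI job could print `message` and exit non-zero whenever `bool(report)` is false.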
FamilyValidationReport
report = result.validate_families()
report.ok # bool
report.count # total records inspected
report.family_count # int — distinct clone_ids found
report.members_per_family # dict[int, int] — descendant count per clone
report.failures # list[dict] — per-family failure entries
report.summary() # histogram of issue kinds
bool(report) # == report.ok
The same dataclass shape is returned by validate_families_with_parents(refdata).
CartridgeBuildReport
report = builder.report() # or cfg.build_report
report.stages # list[dict] — one entry per builder call
report.warnings # list[str] — build-finalisation warnings
report.rejected # list[dict] — per-allele drops + per-row estimator rejections
report.manifest_snapshot # dict | None — cartridge_manifest() at build time
report.checksum_at_build_time # str | None — schema_sha256 stamped on cfg
report.to_dict() # JSON-clean dict for CI artifacts
See Build a reference cartridge for the full surface.
Exceptions
| Exception | Raised by | When |
|---|---|---|
| StrictSamplingError | run_records(..., strict=True) | Empty admissible support at a sampling site |
| RecordValidationFailedError | Strict record-validation code paths | A validator gate fired in strict mode |
| FamilyValidationFailedError | Strict family-validation code paths | A family-validator gate fired in strict mode |
| DataConfigError | DataConfig validation | Cartridge spec rejected |
StrictSamplingError is NOT a ValueError subclass, so an except
ValueError clause will not catch it. Catch it explicitly:
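# Assumes GenAIRR is imported as ga and exp is an Experiment built earlier.
try:
    result = exp.run_records(n=10, seed=42, strict=True)
except ga.StrictSamplingError as e:
    # e.args unpacks into the failing pass, the sampling address, and the reason.
    pass_name, address, reason = e.args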
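The consequence of the class hierarchy can be checked without the library. In the sketch below, `ToyStrictSamplingError` and `sample` are hypothetical stand-ins, and the three argument values are invented for illustration; only the fact that the exception does not derive from ValueError is taken from the text above.

```python
class ToyStrictSamplingError(Exception):
    """Hypothetical stand-in: derives from Exception, not ValueError."""


def sample(strict):
    if strict:
        # Three positional args, mirroring the (pass_name, address, reason) unpacking.
        raise ToyStrictSamplingError("recombine", "v_allele", "empty admissible support")
    return "ok"


caught_by = None
try:
    try:
        sample(strict=True)
    except ValueError:
        caught_by = "ValueError"  # never reached for this exception type
except ToyStrictSamplingError as e:
    caught_by = "ToyStrictSamplingError"
    pass_name, address, reason = e.args
```

The inner handler is skipped and the exception propagates to the outer one, which is why a broad `except ValueError` around a strict run would let the error escape.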
Generated SimulationResult
What an autogenerated section looks like (rendered via
mkdocstrings from the docstrings in src/GenAIRR/result.py).
One class today; the full reference fills in over follow-up
slices.
GenAIRR.SimulationResult
List-like wrapper around a batch of AIRR records.
result[i] returns the i-th record dict; len(result) is
the number of records; iteration yields records in order.
The original Outcome objects (with their full trace +
revision history) are kept on .outcomes for advanced
inspection — most users won't need them.
records
property
The underlying list of record dicts. Mutation through this view propagates back into the result.
outcomes
property
The underlying list of Outcome objects, or None
when this SimulationResult was built from records
directly (e.g. loaded from a TSV).
parents
property
Per-clone parent Outcome objects for clonal results;
None for non-clonal results and for results built from
records directly.
parents[c] is the recombination ancestor of clone c:
every descendant record with
record["clone_id"] == record["parent_id"] == c was
produced by running the post-fork plan from this parent's
final_simulation().
The parent Outcome carries the pre-fork addressed-choice
.trace(), the pre-fork .events() ledger, the
per-revision IR history (.revision(i)), and the final
assembled IR (.final_simulation()). Use these for
replay, lineage analysis, or building a parent-aware family
validator (Slice 3+ scope).
The flat .outcomes list continues to carry only the
descendant outcomes (one entry per AIRR record); parents
live exclusively here. len(.parents) equals the clonal
pipeline's n_clones; len(.outcomes) equals
n_clones * per_clone.
validate_records(refdata)
Public AIRR output correctness check.
Run the postcondition validator over every record in this
result and return a ValidationReport. This is the
gate a downstream consumer cares about: "is each projected
AIRR record internally consistent with the engine state that
produced it?"
Each record is re-derived independently from its original
Outcome (trace + event ledger + final Simulation)
and compared against the projected dict. A record passes
when outcome.validate_record(refdata, sequence_id=...)
returns an empty issue list. Failures collect the
record_index, sequence_id, and the issue dicts.
Companion check: for engine-side integrity, see
Outcome.check_live_call_cache_parity (returns the
cached-vs-fresh divergence on the live-call cache that
feeds projection).
Troubleshooting rule: if a CI run has both this validator AND the parity harness failing on the same batch, fix the parity divergence FIRST, because a stale cache can leak into projection and produce spurious validator failures. Once parity is green, rerun the validator; any remaining failures point at a real projection-layer bug.
refdata must be the same RefDataConfig the
outcomes were produced against; passing a different refdata
will misreport mismatches against an unrelated allele pool.
Raises RuntimeError when this result was built without
attached outcomes (e.g. loaded from a TSV); the validator
needs the engine state, not just the projected record.
validate_families(refdata=None)
Clonal-family consistency check — a strict subset of
the audit's §6 family-layer invariants
(docs/clonal_family_design.md).
Groups records by clone_id and asserts the recombination-
time truth fields agree across every descendant of a clone.
refdata is reserved for forward compatibility with the
deeper family-layer checks (pre-SHM junction, mutation-
distance distribution) the audit's later slices add; this
slice's invariants are all dict-only and ignore refdata.
Currently enforced invariants:
- truth_v_call constant within each clone_id, when present. Skipped silently for clones whose records were projected without expose_provenance=True.
- truth_d_call same.
- truth_j_call same.
- clone_id is present on every record once any record in the batch carries one (a batch that mixes clonal and non-clonal records raises CloneIdMissing).
Non-clonal results return ok with family_count == 0.
This makes result.validate_families() a safe no-op on a
flat batch — the call site does not need to branch on the
result's clonal-ness.
Records-only results work. Unlike
validate_records, this validator does not require
the underlying Outcome objects, because every check is on
record-dict fields, so a SimulationResult loaded from
TSV can still be family-validated.
Not enforced yet (deferred per the audit's §14
out-of-scope list): mutation-distance distribution, pre-SHM
junction invariance, parent-trace reconstruction, lineage
topology, original_v_call / d_inverted invariance.
Returns a FamilyValidationReport carrying
count, family_count, members_per_family, and
failures.
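Because every enforced invariant is a check on record-dict fields, the grouping logic can be sketched in plain Python. `check_families` below is a hypothetical illustration of the listed invariants, not the library's validator; the `"inconsistent"` issue kind and the shape of the failure entries are assumptions.

```python
from collections import defaultdict

TRUTH_FIELDS = ("truth_v_call", "truth_d_call", "truth_j_call")


def check_families(records):
    """Return (members_per_family, failures) for a list of AIRR-style record dicts."""
    families = defaultdict(list)
    for rec in records:
        if "clone_id" in rec:
            families[rec["clone_id"]].append(rec)

    failures = []
    # Mixed batch: some records carry clone_id, others do not.
    if families and any("clone_id" not in rec for rec in records):
        failures.append({"kind": "CloneIdMissing"})

    for clone_id, members in families.items():
        for name in TRUTH_FIELDS:
            # Records projected without provenance columns are skipped.
            seen = {m[name] for m in members if name in m}
            if len(seen) > 1:
                failures.append({"kind": "inconsistent", "clone_id": clone_id,
                                 "field": name, "values": sorted(seen)})

    members_per_family = {cid: len(ms) for cid, ms in families.items()}
    return members_per_family, failures


records = [
    {"clone_id": 0, "truth_v_call": "IGHV1-2*02"},
    {"clone_id": 0, "truth_v_call": "IGHV1-2*02"},
    {"clone_id": 1, "truth_v_call": "IGHV3-23*01"},
    {"clone_id": 1, "truth_v_call": "IGHV4-34*01"},
]
members, failures = check_families(records)
flat_members, flat_failures = check_families([{"sequence_id": "a"}])
```

Clone 1 is reported because its two descendants disagree on the V call, while the flat batch yields no families and no failures, matching the no-op behaviour described above.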
to_dataframe(*, airr_strict=False)
Return a pandas.DataFrame with one row per record.
airr_strict=True converts all 0-based half-open coord
*_start fields to the AIRR-spec 1-based-inclusive form
(*_end fields are unchanged). Useful when handing the
DataFrame off to AIRR-strict downstream tooling.
Raises ImportError if pandas isn't installed (pandas is
an optional extra: pip install GenAIRR[all]).
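The coordinate shift that airr_strict=True describes is simple arithmetic: a 0-based half-open interval [start, end) covers the same bases as the 1-based inclusive interval [start + 1, end]. The sketch below shows it on a single record dict; `to_airr_strict` is a hypothetical helper, not the library's implementation, and the record is made up.

```python
def to_airr_strict(record):
    """Copy a record, shifting integer *_start fields from 0-based to 1-based."""
    converted = dict(record)
    for key, value in record.items():
        if key.endswith("_start") and isinstance(value, int):
            converted[key] = value + 1  # 0-based start -> 1-based inclusive start
    return converted  # *_end fields are left as they were


rec = {"sequence": "ACGTACGTAC", "v_sequence_start": 0, "v_sequence_end": 4}
strict = to_airr_strict(rec)

# The same four bases are addressed under both conventions.
zero_based = rec["sequence"][rec["v_sequence_start"]:rec["v_sequence_end"]]
one_based = rec["sequence"][strict["v_sequence_start"] - 1:strict["v_sequence_end"]]
```

Only the start moves because a half-open end and an inclusive end differ by one in opposite directions, and the two offsets cancel.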
to_tsv(path, *, airr_strict=False)
Write the records as AIRR-style TSV (tab-separated). The
header row uses _DEFAULT_COLUMN_ORDER.
airr_strict=True converts coord *_start fields to
1-based-inclusive (AIRR spec).
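For readers who want to see what writing records under a fixed column order amounts to, here is a standard-library sketch. `write_tsv` and the three-column `COLUMNS` list are hypothetical; the real column order lives in _DEFAULT_COLUMN_ORDER and is not reproduced here.

```python
import csv
import io

# Hypothetical column order, used only for this illustration.
COLUMNS = ["sequence_id", "sequence", "v_call"]


def write_tsv(records, handle, columns=COLUMNS):
    """Write record dicts as tab-separated rows under a fixed header."""
    writer = csv.DictWriter(handle, fieldnames=columns, delimiter="\t",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)  # keys outside `columns` are dropped


buf = io.StringIO()
write_tsv([{"sequence_id": "seq0", "sequence": "ACGT",
            "v_call": "IGHV1-2*02", "extra": 1}], buf)
tsv_text = buf.getvalue()
```

Fixing the header order up front keeps files from different runs column-compatible even when individual records carry extra keys.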
What's not documented here yet
A few surfaces deliberately don't appear on this page:
- Deep Rust engine internals. GenAIRR._engine exposes the PyO3 bindings (Outcome, TraceFile, RefDataConfig, Simulation, etc.). These are reachable from Python but are engine-developer surfaces; the user-facing surface above (SimulationResult.outcomes[i].trace(), compiled.simulator.replay_from_trace_file(...)) is what user code should touch.
- Experimental / private helpers. Anything under GenAIRR._* is private; signatures and behaviour can change between releases without notice.
- Pre-engine documentation. The historical _old_docs/ directory in the repository carries an earlier docs system that predates the Rust engine; it's preserved for reference but not migrated into the current site.
When you need the full per-symbol docstring inventory, the
follow-up slice will land mkdocstrings-driven pages for
Experiment, SimulationResult, ReferenceCartridgeBuilder,
DataConfig, and the cartridge-spec dataclasses. Today, the
generated SimulationResult section
above is the only one.