
SimulationResult

SimulationResult is the output wrapper returned by Experiment.run_records(...). It holds the list of AIRR record dicts, the underlying engine Outcome objects (for trace / replay / validation), legacy parent Outcomes when expand_clones produced fixed-size families, and lineage trees when clonal_lineage produced BCR tree output. Treat it as a list-like view of records plus the typed validators and export helpers below.

Common methods

The eight methods you'll reach for in real pipelines:

| Method | Purpose |
| --- | --- |
| .validate_records(refdata) | Per-record AIRR-output correctness gate |
| .validate_families() | Clonal family consistency gate (groups by clone_id) |
| .validate_families_with_parents(refdata=None) | Parent-aware family validator |
| .to_dataframe(*, airr_strict=False) | Return a pandas DataFrame with canonical AIRR columns |
| .to_tsv(path, *, airr_strict=False) | Write AIRR-style TSV with the canonical header |
| .to_csv(path, *, airr_strict=False) | Write CSV (sibling of to_tsv) |
| .to_fasta(path, *, prefix="seq") | Write assembled sequences as FASTA |
| .to_fastq(...) / .to_paired_fastq(...) | Write FASTQ; paired-end requires read_layout="paired_end" |

clonal_repertoire returns ordinary SimulationResult records with clone_id and duplicate_count. clonal_lineage returns SimulationResultWithLineages, a subclass that adds .lineage_trees. Legacy expand_clones returns SimulationResult with .parents.

FASTQ exports (prose only)

The two FASTQ-emitting methods are documented on the "Export the results" and "Paired-end reads and FASTQ" pages. They are intentionally omitted from the generated class reference below because their **quality_kwargs parameter is untyped, and the griffe documentation generator rejects it under strict mode. Their public signatures:

result.to_fastq(
    path: str,
    *,
    quality: str = "illumina",       # "illumina" (trapezoid) or "constant"
    prefix: str = "seq",
    **quality_kwargs,                # see paired-end-fastq.md
) -> None

result.to_paired_fastq(
    r1_path: str,
    r2_path: str,
    *,
    quality: str = "illumina",
    overwrite: bool = False,
    **quality_kwargs,
) -> None

to_paired_fastq requires the experiment to have included .paired_end(r1_length=..., r2_length=..., insert_size=...) — otherwise it raises. It also refuses to overwrite an existing output file unless you pass overwrite=True.
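Checking every output path before writing anything avoids ending up with a fresh R1 file next to a stale R2 file. The sketch below is a hypothetical guard in the spirit of the overwrite=True rule; the function name check_outputs_writable and the choice of FileExistsError are assumptions, not GenAIRR's code or its actual exception type.

```python
import os
import tempfile


def check_outputs_writable(*paths, overwrite=False):
    """Hypothetical pre-flight check: refuse to clobber existing output files.

    FileExistsError is an assumption; GenAIRR's real exception type may differ.
    """
    existing = [p for p in paths if os.path.exists(p)]
    if existing and not overwrite:
        raise FileExistsError(f"refusing to overwrite: {', '.join(existing)}")


# Demo: the first call passes, the second trips the guard, the third opts in.
with tempfile.TemporaryDirectory() as tmp:
    r1 = os.path.join(tmp, "r1.fastq")
    r2 = os.path.join(tmp, "r2.fastq")
    check_outputs_writable(r1, r2)      # neither file exists yet
    open(r1, "w").close()               # simulate a leftover R1 file
    try:
        check_outputs_writable(r1, r2)
        refused = False
    except FileExistsError:
        refused = True
    check_outputs_writable(r1, r2, overwrite=True)  # explicit opt-in passes
```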

Class reference

GenAIRR.result.SimulationResult

List-like wrapper around a batch of AIRR records.

result[i] returns the i-th record dict; len(result) is the number of records; iteration yields records in order.

The original Outcome objects (with their full trace + revision history) are kept on .outcomes for advanced inspection — most users won't need them.

records property

The underlying list of record dicts. Mutation through this view propagates back into the result.
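The list-like access pattern and the shared .records view can be illustrated with a minimal stand-in. RecordList below is hypothetical, not GenAIRR's class; it only shows why an in-place edit through .records is visible through result[i]: the property hands back the stored list itself rather than a copy.

```python
class RecordList:
    """Hypothetical stand-in illustrating the list-like protocol."""

    def __init__(self, records):
        self._records = records  # keep a reference, not a copy

    @property
    def records(self):
        # Returning the same list object means edits made through this
        # view are seen by every other accessor.
        return self._records

    def __getitem__(self, i):
        return self._records[i]

    def __len__(self):
        return len(self._records)

    def __iter__(self):
        return iter(self._records)


result = RecordList([{"sequence_id": "seq0"}, {"sequence_id": "seq1"}])
result.records[0]["sample"] = "A"  # mutate through the view
```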

outcomes property

The underlying list of Outcome objects, or None when this SimulationResult was built from records directly (e.g. loaded from a TSV).

parents property

Per-clone parent Outcome objects for clonal results; None for non-clonal results and for results built from records directly.

parents[c] is the recombination ancestor of clone c: every descendant record with record["clone_id"] == record["parent_id"] == c was produced by running the post-fork plan from this parent's .final_simulation().

The parent Outcome carries the pre-fork addressed-choice .trace(), the pre-fork .events() ledger, the per-revision IR history (.revision(i)), and the final assembled IR (.final_simulation()). Use these for replay, lineage analysis, or parent-aware family validation (see validate_families_with_parents below).

The flat .outcomes list continues to carry only the descendant outcomes (one entry per AIRR record); parents live exclusively here. len(.parents) equals the clonal pipeline's n_clones; len(.outcomes) equals n_clones * per_clone.

validate_records(refdata)

Public AIRR output correctness check.

Run the postcondition validator over every record in this result and return a ValidationReport. This is the gate a downstream consumer cares about: "is each projected AIRR record internally consistent with the engine state that produced it?"

Each record is re-derived independently from its original Outcome (trace, event ledger, and final Simulation) and compared against the projected dict. A record passes when outcome.validate_record(refdata, sequence_id=...) returns an empty issue list. Each failure records the record_index, the sequence_id, and the list of issue dicts.

Companion check: for engine-side integrity, see Outcome.check_live_call_cache_parity() (returns the cached-vs-fresh divergence on the live-call cache that feeds projection).

Troubleshooting rule — if a CI run has both this validator AND the parity harness failing on the same batch, fix the parity divergence FIRST: a stale cache can leak into projection and produce spurious validator failures. Once parity is green, rerun the validator; remaining failures point at a real projection-layer bug.

refdata must be the same RefDataConfig the outcomes were produced against; passing a different refdata makes the validator report spurious mismatches against an unrelated allele pool.

Raises RuntimeError when this result was built without attached outcomes (e.g. loaded from a TSV); the validator needs the engine state, not just the projected record.
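The aggregation described above (re-derive each record, keep only non-empty issue lists, refuse to run without outcomes) can be sketched in plain Python. Everything here is hypothetical: RecordReportSketch stands in for ValidationReport (its field names and the "issues" key are assumptions), and the check callback plays the role of outcome.validate_record(refdata, sequence_id=...).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RecordReportSketch:
    """Hypothetical stand-in for ValidationReport; field names are assumptions."""
    count: int
    failures: List[dict] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.failures


def validate_records_sketch(records: List[dict], outcomes: Optional[list],
                            check: Callable[[object, str], list]) -> RecordReportSketch:
    # `check(outcome, sequence_id)` mimics outcome.validate_record(refdata, sequence_id=...).
    if outcomes is None:
        # Records-only results (e.g. loaded from TSV) have no engine state to re-derive.
        raise RuntimeError("validate_records needs attached Outcome objects")
    report = RecordReportSketch(count=len(records))
    for index, (record, outcome) in enumerate(zip(records, outcomes)):
        issues = check(outcome, record["sequence_id"])
        if issues:  # an empty issue list means the record passes
            report.failures.append({"record_index": index,
                                    "sequence_id": record["sequence_id"],
                                    "issues": issues})
    return report


def fake_check(outcome, sequence_id):
    # Toy checker: pretend only outcomes labeled "bad" fail.
    return [{"kind": "toy_mismatch"}] if outcome == "bad" else []


demo = validate_records_sketch([{"sequence_id": "seq0"}, {"sequence_id": "seq1"}],
                               ["good", "bad"], fake_check)
try:
    validate_records_sketch([{"sequence_id": "seq0"}], None, fake_check)
    records_only_raised = False
except RuntimeError:
    records_only_raised = True
```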

validate_families(refdata=None)

Clonal-family consistency check — a strict subset of the audit's §6 family-layer invariants (docs/clonal_family_design.md).

Groups records by clone_id and asserts that the recombination-time truth fields agree across every descendant of a clone. refdata is reserved for forward compatibility with the deeper family-layer checks (pre-SHM junction, mutation-distance distribution) that the audit's later slices add; this slice's invariants are all dict-only and ignore refdata.

Currently enforced invariants:

  • truth_v_call constant within each clone_id, when present. Skipped silently for clones whose records were projected without expose_provenance=True.
  • truth_d_call same.
  • truth_j_call same.
  • clone_id is present on every record once any record in the batch carries one (a batch that mixes clonal and non-clonal records raises CloneIdMissing).

Non-clonal results return ok with family_count == 0. This makes result.validate_families() a safe no-op on a flat batch — the call site does not need to branch on the result's clonal-ness.

Records-only results work. Unlike validate_records, this validator does not require the underlying Outcome objects (every check is on record-dict fields), so a SimulationResult loaded from TSV can still be family-validated.

Not enforced yet (deferred per the audit's §14 out-of-scope list): mutation-distance distribution, pre-SHM junction invariance, parent-trace reconstruction, lineage topology, original_v_call / d_inverted invariance.

Returns a FamilyValidationReport carrying count, family_count, members_per_family, and failures.
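The field-only invariants are easy to mirror on plain record dicts, which is also why this validator works on records loaded from TSV. The sketch below is a hypothetical re-implementation, not GenAIRR's code: it returns a plain dict instead of a FamilyValidationReport, represents members_per_family as a clone_id-to-size mapping, and uses a locally defined CloneIdMissing exception; all three shapes are assumptions.

```python
from collections import defaultdict

TRUTH_FIELDS = ("truth_v_call", "truth_d_call", "truth_j_call")


class CloneIdMissing(Exception):
    """Hypothetical stand-in for the error named in the invariant list."""


def validate_families_sketch(records):
    tagged = [r.get("clone_id") is not None for r in records]
    if not any(tagged):
        # Flat (non-clonal) batch: safe no-op.
        return {"count": len(records), "family_count": 0,
                "members_per_family": {}, "failures": []}
    if not all(tagged):
        raise CloneIdMissing("batch mixes clonal and non-clonal records")

    families = defaultdict(list)  # clone_id -> record indices
    for index, record in enumerate(records):
        families[record["clone_id"]].append(index)

    failures = []
    for clone_id, indices in families.items():
        for name in TRUTH_FIELDS:
            # Absent fields (expose_provenance=False) or None values are skipped silently.
            values = {records[i][name] for i in indices if records[i].get(name) is not None}
            if len(values) > 1:
                failures.append({"clone_id": clone_id, "field": name,
                                 "values": sorted(values)})
    return {"count": len(records), "family_count": len(families),
            "members_per_family": {c: len(ix) for c, ix in families.items()},
            "failures": failures}


records = [
    {"clone_id": 0, "truth_v_call": "IGHV1-2*02"},
    {"clone_id": 0, "truth_v_call": "IGHV1-2*02"},
    {"clone_id": 1, "truth_v_call": "IGHV3-23*01"},
    {"clone_id": 1, "truth_v_call": "IGHV3-30*18"},  # disagrees with its sibling
]
report = validate_families_sketch(records)
```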

validate_families_with_parents(refdata=None)

Parent-aware clonal-family validator — the deeper diagnostic that compares every descendant against its actual parent Outcome (Slice 3 of the clonal-family audit; see docs/clonal_parent_outcome_design.md §6).

Sibling of validate_families. That validator is record-only (groups by clone_id, compares truth fields across siblings); this one requires the parent outcomes to be available on the result and compares each descendant against its parent directly. Use this when you want to confirm "the descendants reflect the recombination ancestor they came from," not just "siblings agree with each other."

Currently enforced invariants — all derived from record-vs-parent comparison only:

  • Structural:
      • ParentsMissing when records carry clone_id / parent_id but self.parents is None.
      • ParentIdMissing for records without a non-null parent_id in a result that has parents available.
      • ParentIdOutOfRange when record["parent_id"] is not in [0, len(self.parents)).
  • Truth-allele consistency (requires refdata):
      • ParentTruthVCallMismatch / ParentTruthDCallMismatch / ParentTruthJCallMismatch: descendant's truth_*_call (from expose_provenance=True) disagrees with the parent's projected truth allele.
  • Provenance consistency (compares no truth alleles, but currently still needs refdata; see the note below):
      • ParentDInvertedMismatch: descendant's d_inverted disagrees with the parent's. D inversion is a pre-fork decision, so divergence indicates a structural bug.
      • ParentOriginalVCallMismatch: descendant's original_v_call (receptor-revision provenance) disagrees with the parent's. Same reasoning.

Without refdata: only the structural checks (ParentsMissing / ParentIdMissing / ParentIdOutOfRange) run. All value comparisons — truth alleles, d_inverted, original_v_call — require projecting the parent Outcome to an AIRR record, which today goes through the Rust projector and needs refdata. Slice 3 deliberately stays Python-only; a lighter-weight refdata-free parent accessor for d_inverted etc. is deferred until a Rust slice surfaces one.

Skipped silently for fields not present on descendants: if expose_provenance was off, the descendants don't carry truth_*_call and those checks are no-ops.

Non-clonal results return ok with family_count=0, the same safe no-op shape as validate_families. Slice 3 deliberately does NOT raise "not clonal" here.

Not enforced yet (deferred per audit §6, §14):

  • Pre-SHM junction invariance. The descendant's junction AIRR field is post-SHM; the pre-SHM junction lives only inside the parent's IR. A proper check would require a parent-derived junction_pre_shm field on either records or a future FamilyRecord projection (Slice 4+). This validator does NOT compare descendant.junction against any parent-derived value today.
  • Mutation-distance distribution. Comparing the parent's assembled sequence to each descendant's post-SHM sequence to verify SHM mass is plausible. Requires projecting the parent's pool to a sequence string — out of scope for this slice.
  • Plan-split pre-fork pass enumeration. "Parent should not carry descendant-only observation fields like PCR / paired-end / quality errors" is pinned at the contract-test level (the pre-fork plan's pass names) rather than enforced at runtime here.

Not wired into validate_records=True. This is an explicit deeper diagnostic surface. The validate_records=True gate continues to run only the per-record postcondition validator and the field-only validate_families. Callers who want parent-aware checks invoke this method explicitly.

Returns a FamilyValidationReport. Failure dicts carry clone_id, parent_id, record_indices, issue_kind, parent_value, and child_values (the latter two are None / [] for structural failures that don't compare values).
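The check ordering above (structural checks always, value comparisons only when a projected parent is available) can be sketched on plain dicts. This is a hypothetical re-implementation, not GenAIRR's code: parent_records stands in for the refdata-backed projection of each parent Outcome, and the function returns a bare list of failure dicts (using the documented keys) rather than a FamilyValidationReport.

```python
from collections import defaultdict

VALUE_CHECKS = {  # descendant field -> issue_kind on disagreement
    "truth_v_call": "ParentTruthVCallMismatch",
    "truth_d_call": "ParentTruthDCallMismatch",
    "truth_j_call": "ParentTruthJCallMismatch",
    "d_inverted": "ParentDInvertedMismatch",
    "original_v_call": "ParentOriginalVCallMismatch",
}


def _failure(kind, indices, clone_id=None, parent_id=None,
             parent_value=None, child_values=None):
    # Structural failures keep parent_value=None and child_values=[].
    return {"clone_id": clone_id, "parent_id": parent_id,
            "record_indices": indices, "issue_kind": kind,
            "parent_value": parent_value, "child_values": child_values or []}


def validate_with_parents_sketch(records, parents, parent_records=None):
    clonal = [i for i, r in enumerate(records)
              if r.get("clone_id") is not None or r.get("parent_id") is not None]
    if not clonal:
        return []  # non-clonal batch: safe no-op
    if parents is None:
        return [_failure("ParentsMissing", clonal)]

    failures, by_parent = [], defaultdict(list)
    for i, r in enumerate(records):
        pid = r.get("parent_id")
        if pid is None:
            failures.append(_failure("ParentIdMissing", [i], clone_id=r.get("clone_id")))
        elif not 0 <= pid < len(parents):
            failures.append(_failure("ParentIdOutOfRange", [i],
                                     clone_id=r.get("clone_id"), parent_id=pid))
        else:
            by_parent[pid].append(i)

    if parent_records is None:
        return failures  # no projected parents: structural checks only

    for pid, indices in by_parent.items():
        parent = parent_records[pid]
        for name, kind in VALUE_CHECKS.items():
            # Fields missing on descendants (e.g. expose_provenance off) are skipped.
            bad = [i for i in indices
                   if name in records[i] and records[i][name] != parent.get(name)]
            if bad:
                failures.append(_failure(kind, bad, clone_id=records[bad[0]].get("clone_id"),
                                         parent_id=pid, parent_value=parent.get(name),
                                         child_values=[records[i][name] for i in bad]))
    return failures


records = [
    {"clone_id": 0, "parent_id": 0, "d_inverted": False, "truth_v_call": "IGHV1-2*02"},
    {"clone_id": 0, "parent_id": 0, "d_inverted": True},  # flipped D: structural bug
    {"clone_id": 5, "parent_id": 5},                      # only two parents exist
]
parents = ["parent-0", "parent-1"]  # placeholders for parent Outcome objects
parent_records = [{"d_inverted": False, "truth_v_call": "IGHV1-2*02"}, {"d_inverted": False}]
structural_only = validate_with_parents_sketch(records, parents)
full = validate_with_parents_sketch(records, parents, parent_records)
```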

to_dataframe(*, airr_strict=False)

Return a pandas.DataFrame with one row per record.

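airr_strict=True converts every 0-based half-open *_start coordinate to the AIRR-spec 1-based inclusive form. The *_end fields are unchanged because the exclusive end of a 0-based half-open interval is the same number as the inclusive end of its 1-based equivalent. Useful when handing the DataFrame off to AIRR-strict downstream tooling.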

Raises ImportError if pandas isn't installed (pandas is an optional extra: pip install GenAIRR[all]).
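The airr_strict coordinate shift reduces to adding 1 to each *_start value: a 0-based half-open slice [0, 296) and the 1-based inclusive range [1, 296] both cover 296 bases. The helper below is a hypothetical sketch of that rule on a single record dict; treating every integer-valued key ending in "_start" as a coordinate is an assumption, and GenAIRR may use an explicit field list instead.

```python
def to_airr_strict(record: dict) -> dict:
    """Return a copy with *_start shifted from 0-based half-open to 1-based inclusive.

    *_end values are left as-is; None (absent segment) is passed through.
    """
    out = dict(record)
    for key, value in record.items():
        if key.endswith("_start") and isinstance(value, int):
            out[key] = value + 1
    return out


rec = {"v_sequence_start": 0, "v_sequence_end": 296,
       "d_sequence_start": None, "v_call": "IGHV1-2*02"}
strict = to_airr_strict(rec)
```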

to_tsv(path, *, airr_strict=False)

Write the records as AIRR-style TSV (tab-separated). The header row uses _DEFAULT_COLUMN_ORDER.

airr_strict=True converts coord *_start fields to 1-based-inclusive (AIRR spec).
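A fixed-header TSV export of record dicts maps directly onto the standard-library csv.DictWriter. The sketch below is hypothetical: the COLUMNS list is a placeholder rather than the real _DEFAULT_COLUMN_ORDER, and filling missing fields with "" while ignoring extra keys are assumptions about the export, not documented behavior.

```python
import csv
import io

# Placeholder header; GenAIRR's real order comes from _DEFAULT_COLUMN_ORDER.
COLUMNS = ["sequence_id", "sequence", "v_call", "d_call", "j_call"]


def write_delimited(records, handle, columns=COLUMNS, delimiter="\t"):
    """Write one header row, then one row per record; missing fields become ''."""
    writer = csv.DictWriter(handle, fieldnames=columns, delimiter=delimiter,
                            restval="", extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)


buf = io.StringIO()
write_delimited([{"sequence_id": "seq0", "sequence": "ACGT",
                  "v_call": "IGHV1-2*02", "j_call": "IGHJ4*02"}], buf)
tsv_lines = buf.getvalue().splitlines()
```

To write to disk, open the file with open(path, "w", newline="") and pass the handle; delimiter="," gives the to_csv flavor.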

to_csv(path, *, airr_strict=False)

Write the records as comma-separated values. A convenience sibling of to_tsv; most analysis tooling prefers TSV for AIRR data.

airr_strict=True converts coord *_start fields to 1-based-inclusive (AIRR spec).

to_fasta(path, *, prefix='seq')

Write the assembled sequences as FASTA. Each record gets a header of the form ">{prefix}{i}|v_call=...|j_call=...".
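The documented header shape can be reproduced with a few lines of string formatting. This is a hypothetical sketch, not GenAIRR's writer: it assumes the assembled sequence lives under the AIRR "sequence" field, writes each sequence on a single unwrapped line, and prints an empty value for a missing call; the real method's choices on all three may differ.

```python
import io


def write_fasta(records, handle, prefix="seq"):
    """Write '>{prefix}{i}|v_call=...|j_call=...' headers followed by sequences."""
    for i, record in enumerate(records):
        v_call = record.get("v_call") or ""
        j_call = record.get("j_call") or ""
        handle.write(f">{prefix}{i}|v_call={v_call}|j_call={j_call}\n")
        handle.write(record["sequence"] + "\n")


buf = io.StringIO()
write_fasta([{"sequence": "ACGT", "v_call": "IGHV1-2*02", "j_call": "IGHJ4*02"}], buf)
fasta_text = buf.getvalue()
```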

from_outcomes(outcomes, refdata, *, id_prefix='seq', expose_provenance=False) classmethod

Build a SimulationResult from a list of Rust Outcome objects and the refdata they ran against.

Each record's sequence_id is set to f"{id_prefix}{i}" (e.g. seq0, seq1, …) so AIRR-format consumers see a unique per-row identifier out of the box.

expose_provenance=True adds truth_v_call, truth_d_call, truth_j_call columns containing the originally-sampled allele names — distinct from the evidence-driven v_call / d_call / j_call fields, which reflect what an aligner would see. Pair them at the Python level to compute aligner-vs-truth accuracy without a side truth file.
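Pairing the evidence and truth columns takes only a short loop over the records. The helper below is a hypothetical sketch: it assumes v_call may hold several tied alleles separated by commas (as the AIRR Rearrangement schema allows) and counts a record as correct when the truth allele is among them, which is a lenient choice; a strict scorer would require an exact single-allele match. Records without a truth column are skipped.

```python
import math


def allele_accuracy(records, gene="v"):
    """Fraction of scored records whose truth allele appears in the evidence call."""
    call_key, truth_key = f"{gene}_call", f"truth_{gene}_call"
    scored = [r for r in records if r.get(truth_key) and r.get(call_key) is not None]
    if not scored:
        return math.nan  # nothing to score (e.g. expose_provenance was off)
    hits = sum(r[truth_key] in [a.strip() for a in r[call_key].split(",")] for r in scored)
    return hits / len(scored)


records = [
    {"v_call": "IGHV1-2*02", "truth_v_call": "IGHV1-2*02"},               # exact hit
    {"v_call": "IGHV3-23*01,IGHV3-23*04", "truth_v_call": "IGHV3-23*04"},  # tie includes truth
    {"v_call": "IGHV4-34*01", "truth_v_call": "IGHV4-59*01"},             # miss
    {"v_call": "IGHV1-69*01"},                                             # no truth: skipped
]
v_accuracy = allele_accuracy(records)
```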