SimulationResult¶
SimulationResult is the output wrapper
returned by Experiment.run_records(...). It holds the
list of AIRR record dicts, the underlying engine Outcome
objects (for trace / replay / validation), legacy parent
Outcomes when expand_clones produced fixed-size
families, and lineage trees when clonal_lineage produced BCR
tree output.
Treat it as a list-like view of records plus the typed validators
and export helpers below.
Common methods¶
The eight methods you'll reach for in real pipelines:
| Method | Purpose |
|---|---|
| .validate_records(refdata) | Per-record AIRR-output correctness gate |
| .validate_families() | Clonal family consistency gate (groups by clone_id) |
| .validate_families_with_parents(refdata=None) | Parent-aware family validator |
| .to_dataframe(*, airr_strict=False) | Return a pandas DataFrame with canonical AIRR columns |
| .to_tsv(path, *, airr_strict=False) | Write AIRR-style TSV with the canonical header |
| .to_csv(path, *, airr_strict=False) | Write CSV (sibling of to_tsv) |
| .to_fasta(path, *, prefix="seq") | Write assembled sequences as FASTA |
| .to_fastq(...) / .to_paired_fastq(...) | Write FASTQ; paired-end requires read_layout="paired_end" |
clonal_repertoire returns ordinary SimulationResult records with
clone_id and duplicate_count. clonal_lineage returns
SimulationResultWithLineages, a subclass that adds .lineage_trees.
Legacy expand_clones returns SimulationResult with .parents.
FASTQ exports (prose only)¶
The two FASTQ-emitting methods are documented in
"Export the results" and
"Paired-end reads and FASTQ", and
are intentionally omitted from the generated block below because
their **quality_kwargs parameter is untyped (griffe rejects it
under strict mode). Their public signatures:
result.to_fastq(
path: str,
*,
quality: str = "illumina", # "illumina" (trapezoid) or "constant"
prefix: str = "seq",
**quality_kwargs, # see paired-end-fastq.md
) -> None
result.to_paired_fastq(
r1_path: str,
r2_path: str,
*,
quality: str = "illumina",
overwrite: bool = False,
**quality_kwargs,
) -> None
to_paired_fastq requires the experiment to have included
.paired_end(r1_length=..., r2_length=..., insert_size=...) —
otherwise it raises. It also refuses to overwrite an existing
output file unless you pass overwrite=True.
Class reference¶
GenAIRR.result.SimulationResult
List-like wrapper around a batch of AIRR records.
result[i] returns the i-th record dict; len(result) is
the number of records; iteration yields records in order.
The original Outcome objects (with their full trace +
revision history) are kept on .outcomes for advanced
inspection — most users won't need them.
records
property
The underlying list of record dicts. Mutation through this view propagates back into the result.
outcomes
property
The underlying list of Outcome objects, or None
when this SimulationResult was built from records
directly (e.g. loaded from a TSV).
parents
property
Per-clone parent Outcome objects for clonal results;
None for non-clonal results and for results built from
records directly.
parents[c] is the recombination ancestor of clone c:
every descendant record with
record["clone_id"] == record["parent_id"] == c was
produced by running the post-fork plan from this parent's
final_simulation().
The parent Outcome carries the pre-fork addressed-choice
.trace(), the pre-fork .events() ledger, the
per-revision IR history (.revision(i)), and the final
assembled IR (.final_simulation()). Use these for
replay, lineage analysis, or building a parent-aware family
validator (Slice 3+ scope).
The flat .outcomes list continues to carry only the
descendant outcomes (one entry per AIRR record); parents
live exclusively here. len(result.parents) equals the clonal
pipeline's n_clones; len(result.outcomes) equals
n_clones * per_clone.
validate_records(refdata)
Public AIRR output correctness check.
Run the postcondition validator over every record in this
result and return a ValidationReport. This is the
gate a downstream consumer cares about: "is each projected
AIRR record internally consistent with the engine state that
produced it?"
Each record is re-derived independently from its original
Outcome (trace + event ledger + final Simulation)
and compared against the projected dict. A record passes
when outcome.validate_record(refdata, sequence_id=...)
returns an empty issue list. Failures collect the
record_index, sequence_id, and the issue dicts.
Companion check: for engine-side integrity, see
Outcome.check_live_call_cache_parity() (returns the
cached-vs-fresh divergence on the live-call cache that
feeds projection).
Troubleshooting rule — if a CI run has both this validator AND the parity harness failing on the same batch, fix the parity divergence FIRST: a stale cache can leak into projection and produce spurious validator failures. Once parity is green, rerun the validator; remaining failures point at a real projection-layer bug.
refdata must be the same RefDataConfig the
outcomes were produced against; passing a different refdata
will produce spurious mismatches against an unrelated allele pool.
Raises RuntimeError when this result was built without
attached outcomes (e.g. loaded from a TSV); the validator
needs the engine state, not just the projected record.
validate_families(refdata=None)
Clonal-family consistency check — a strict subset of
the audit's §6 family-layer invariants
(docs/clonal_family_design.md).
Groups records by clone_id and asserts the recombination-
time truth fields agree across every descendant of a clone.
refdata is reserved for forward compatibility with the
deeper family-layer checks (pre-SHM junction, mutation-
distance distribution) the audit's later slices add; this
slice's invariants are all dict-only and ignore refdata.
Currently enforced invariants:
- truth_v_call constant within each clone_id, when present. Skipped silently for clones whose records were projected without expose_provenance=True.
- truth_d_call same.
- truth_j_call same.
- clone_id is present on every record once any record in the batch carries one (a batch that mixes clonal and non-clonal records raises CloneIdMissing).
Non-clonal results return ok with family_count == 0.
This makes result.validate_families() a safe no-op on a
flat batch — the call site does not need to branch on the
result's clonal-ness.
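The enforced invariants can be pictured with a small sketch over plain record dicts. It is not GenAIRR's implementation: the function name check_family_truth_fields, the report shape, and the mismatch labels are hypothetical, and only the field names (clone_id, truth_v_call, truth_d_call, truth_j_call) and the CloneIdMissing label come from the text above.

```python
from collections import defaultdict

TRUTH_FIELDS = ("truth_v_call", "truth_d_call", "truth_j_call")


def check_family_truth_fields(records):
    """Hypothetical sketch of the dict-only family checks; not GenAIRR's code."""
    has_clone_id = [rec.get("clone_id") is not None for rec in records]
    if not any(has_clone_id):
        # Flat (non-clonal) batch: a safe no-op with zero families.
        return {"ok": True, "family_count": 0, "failures": []}

    failures = []
    # Once any record carries clone_id, every record must carry one.
    missing = [i for i, present in enumerate(has_clone_id) if not present]
    if missing:
        failures.append({"issue_kind": "CloneIdMissing", "record_indices": missing})

    # Group record indices by clone_id.
    families = defaultdict(list)
    for i, rec in enumerate(records):
        if has_clone_id[i]:
            families[rec["clone_id"]].append(i)

    for clone_id, indices in families.items():
        for field in TRUTH_FIELDS:
            # Records projected without provenance lack the field: skip them.
            values = {records[i][field] for i in indices if field in records[i]}
            if len(values) > 1:
                failures.append({
                    "issue_kind": f"{field}_mismatch",  # hypothetical label
                    "clone_id": clone_id,
                    "record_indices": indices,
                })
    return {"ok": not failures, "family_count": len(families), "failures": failures}


records = [
    {"clone_id": 0, "truth_v_call": "IGHV1-2*01"},
    {"clone_id": 0, "truth_v_call": "IGHV1-2*01"},
    {"clone_id": 1, "truth_v_call": "IGHV3-23*01"},
    {"clone_id": 1, "truth_v_call": "IGHV4-34*01"},  # disagrees with its sibling
]
report = check_family_truth_fields(records)
print(report["ok"], report["family_count"])
```

Because the sketch reads only dict fields, it works equally on records loaded from a TSV, which mirrors the records-only property described for the real validator.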
Records-only results work. Unlike
validate_records, this validator does not require
the underlying Outcome objects, because every check is on
record-dict fields, so a SimulationResult loaded from
TSV can still be family-validated.
Not enforced yet (deferred per the audit's §14
out-of-scope list): mutation-distance distribution, pre-SHM
junction invariance, parent-trace reconstruction, lineage
topology, original_v_call / d_inverted invariance.
Returns a FamilyValidationReport carrying
count, family_count, members_per_family, and
failures.
validate_families_with_parents(refdata=None)
Parent-aware clonal-family validator — the deeper
diagnostic that compares every descendant against its
actual parent Outcome (Slice 3 of the clonal-family
audit; see docs/clonal_parent_outcome_design.md §6).
Sibling of validate_families. That validator is
record-only (groups by clone_id, compares truth fields
across siblings); this one requires the parent outcomes
to be available on the result and compares each
descendant against its parent directly. Use this when you
want to confirm "the descendants reflect the recombination
ancestor they came from," not just "siblings agree with
each other."
Currently enforced invariants — all derived from record-vs-parent comparison only:
- Structural:
  - ParentsMissing when records carry clone_id / parent_id but self.parents is None.
  - ParentIdMissing for records without a non-null parent_id in a result that has parents available.
  - ParentIdOutOfRange when record["parent_id"] is not in [0, len(self.parents)).
- Truth-allele consistency (requires refdata):
  - ParentTruthVCallMismatch / ParentTruthDCallMismatch / ParentTruthJCallMismatch: descendant's truth_*_call (from expose_provenance=True) disagrees with the parent's projected truth allele.
- Provenance consistency (the comparison itself does not consult refdata, although projecting the parent currently requires it):
  - ParentDInvertedMismatch: descendant's d_inverted disagrees with the parent's. D inversion is a pre-fork decision, so divergence indicates a structural bug.
  - ParentOriginalVCallMismatch: descendant's original_v_call (receptor-revision provenance) disagrees with the parent's. Same reasoning.
Without refdata: only the structural checks
(ParentsMissing / ParentIdMissing /
ParentIdOutOfRange) run. All value comparisons —
truth alleles, d_inverted, original_v_call —
require projecting the parent Outcome to an AIRR
record, which today goes through the Rust projector and
needs refdata. Slice 3 deliberately stays Python-only;
a lighter-weight refdata-free parent accessor for
d_inverted etc. is deferred until a Rust slice surfaces
one.
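The three structural checks, which are the only ones that run without refdata, can be sketched over plain dicts. This is an illustration rather than GenAIRR's code: the function name structural_parent_issues and the issue-dict layout are hypothetical, and the parents list holds placeholder strings where the real result holds parent Outcome objects.

```python
def structural_parent_issues(records, parents):
    """Hypothetical sketch of the three structural checks; not GenAIRR's code."""
    is_clonal = any(
        rec.get("clone_id") is not None or rec.get("parent_id") is not None
        for rec in records
    )
    if not is_clonal:
        return []  # non-clonal batch: nothing to check
    if parents is None:
        # Records reference clones, but no parent outcomes are attached.
        return [{"issue_kind": "ParentsMissing",
                 "record_indices": list(range(len(records)))}]

    issues = []
    for i, rec in enumerate(records):
        parent_id = rec.get("parent_id")
        if parent_id is None:
            issues.append({"issue_kind": "ParentIdMissing", "record_indices": [i]})
        elif not 0 <= parent_id < len(parents):
            # Valid ids form the half-open range [0, len(parents)).
            issues.append({"issue_kind": "ParentIdOutOfRange",
                           "record_indices": [i], "parent_id": parent_id})
    return issues


parents = ["parent-0", "parent-1"]  # placeholders for parent Outcome objects
records = [
    {"clone_id": 0, "parent_id": 0},
    {"clone_id": 1, "parent_id": None},
    {"clone_id": 2, "parent_id": 2},  # only indices 0 and 1 exist
]
kinds = [issue["issue_kind"] for issue in structural_parent_issues(records, parents)]
print(kinds)
```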
Skipped silently for fields not present on descendants:
if expose_provenance was off, the descendants don't
carry truth_*_call and those checks are no-ops.
Non-clonal results return ok with family_count=0,
the same safe no-op shape as validate_families. Slice
3 deliberately does NOT raise "not clonal" here.
Not enforced yet (deferred per audit §6, §14):
- Pre-SHM junction invariance. The descendant's junction AIRR field is post-SHM; the pre-SHM junction lives only inside the parent's IR. A proper check would require a parent-derived junction_pre_shm field on either records or a future FamilyRecord projection (Slice 4+). This validator does NOT compare descendant.junction against any parent-derived value today.
- Mutation-distance distribution. Comparing the parent's assembled sequence to each descendant's post-SHM sequence to verify SHM mass is plausible. Requires projecting the parent's pool to a sequence string, which is out of scope for this slice.
- Plan-split pre-fork pass enumeration. "Parent should not carry descendant-only observation fields like PCR / paired-end / quality errors" is pinned at the contract-test level (the pre-fork plan's pass names) rather than enforced at runtime here.
Not wired into validate_records=True. This is an
explicit deeper diagnostic surface. The
validate_records=True gate continues to run only the
per-record postcondition validator and the field-only
validate_families. Callers who want parent-aware
checks invoke this method explicitly.
Returns a FamilyValidationReport. Failure dicts
carry clone_id, parent_id, record_indices,
issue_kind, parent_value, and child_values
(the latter two are None / [] for structural
failures that don't compare values).
to_dataframe(*, airr_strict=False)
Return a pandas.DataFrame with one row per record.
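airr_strict=True converts every 0-based, half-open *_start
coordinate to the AIRR-spec 1-based, inclusive form. The
*_end fields are unchanged, because a half-open end at 0-based
index e and an inclusive end at 1-based index e name the same
position. Useful when handing the
DataFrame off to AIRR-strict downstream tooling.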
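A worked example makes the coordinate shift concrete. The helper below is a hypothetical stand-in, not the library's conversion code; it assumes coordinates are stored as integers under keys ending in _start, and it uses the AIRR field names v_sequence_start and v_sequence_end for illustration.

```python
def to_airr_strict(record):
    """Return a copy with integer *_start fields shifted from 0-based to 1-based."""
    converted = dict(record)
    for key, value in record.items():
        # Skip None / non-integer values (e.g. a segment with no coordinates).
        if key.endswith("_start") and isinstance(value, int):
            converted[key] = value + 1
    return converted


# 0-based half-open [0, 294) covers the first 294 bases.
record = {"sequence_id": "seq0", "v_sequence_start": 0, "v_sequence_end": 294}
strict = to_airr_strict(record)

# 1-based inclusive [1, 294] names the same 294 bases.
half_open_length = record["v_sequence_end"] - record["v_sequence_start"]
inclusive_length = strict["v_sequence_end"] - strict["v_sequence_start"] + 1
print(strict["v_sequence_start"], strict["v_sequence_end"])
```

The segment length is the same under both conventions, which is why only the start moves.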
Raises ImportError if pandas isn't installed (pandas is
an optional extra: pip install GenAIRR[all]).
to_tsv(path, *, airr_strict=False)
Write the records as AIRR-style TSV (tab-separated). The
header row uses _DEFAULT_COLUMN_ORDER.
airr_strict=True converts coord *_start fields to
1-based-inclusive (AIRR spec).
to_csv(path, *, airr_strict=False)
Write the records as comma-separated values. A convenience
alongside to_tsv; most analysis tooling prefers TSV
for AIRR data.
airr_strict=True converts coord *_start fields to
1-based-inclusive (AIRR spec).
to_fasta(path, *, prefix='seq')
Write the assembled sequences as FASTA. Each record gets
a header of the form ">{prefix}{i}|v_call=...|j_call=...".
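The header layout can be illustrated with a short sketch that writes to an in-memory buffer. It is not the library's writer: write_fasta is a hypothetical name, and the sketch assumes that the index i is zero-based, that the elided values are the record's own v_call and j_call, and that the bases come from the record's sequence field on a single unwrapped line.

```python
import io


def write_fasta(records, handle, prefix="seq"):
    """Hypothetical sketch of the header layout; not GenAIRR's implementation."""
    for i, rec in enumerate(records):
        header = (
            f">{prefix}{i}"
            f"|v_call={rec.get('v_call', '')}"
            f"|j_call={rec.get('j_call', '')}"
        )
        # Assumption: one unwrapped sequence line per record.
        handle.write(f"{header}\n{rec['sequence']}\n")


records = [
    {"sequence": "CAGGTGCAGCTG", "v_call": "IGHV1-2*01", "j_call": "IGHJ4*02"},
    {"sequence": "GAGGTGCAGCTG", "v_call": "IGHV3-23*01", "j_call": "IGHJ6*02"},
]
buffer = io.StringIO()
write_fasta(records, buffer, prefix="sim")
fasta_text = buffer.getvalue()
print(fasta_text, end="")
```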
from_outcomes(outcomes, refdata, *, id_prefix='seq', expose_provenance=False)
classmethod
Build a SimulationResult from a list of Rust
Outcome objects plus the refdata they ran against.
Each record's sequence_id is set to f"{id_prefix}{i}"
(e.g. seq0, seq1, …) so AIRR-format consumers see a
unique per-row identifier out of the box.
expose_provenance=True adds truth_v_call,
truth_d_call, truth_j_call columns containing the
originally-sampled allele names — distinct from the
evidence-driven v_call / d_call / j_call fields,
which reflect what an aligner would see. Pair them at the
Python level to compute aligner-vs-truth accuracy without a
side truth file.
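Pairing the two column families might look like the sketch below, which works on plain record dicts. The helper call_accuracy is hypothetical, and the scoring rule is an assumption chosen for illustration: a call counts as correct when the truth allele appears anywhere in the call string, treating ambiguous calls as comma-separated. Records without a truth field are left out of the denominator; records with a truth field but no call count as misses.

```python
def call_accuracy(records, segment):
    """Fraction of scored records whose {segment}_call lists the truth allele.

    Hypothetical helper; assumes ambiguous calls are comma-separated.
    """
    hits = 0
    scored = 0
    for rec in records:
        truth = rec.get(f"truth_{segment}_call")
        if not truth:
            continue  # projected without expose_provenance: cannot score
        scored += 1
        called = rec.get(f"{segment}_call") or ""
        if truth in {allele.strip() for allele in called.split(",")}:
            hits += 1
    return hits / scored if scored else float("nan")


records = [
    {"v_call": "IGHV1-2*01", "truth_v_call": "IGHV1-2*01"},             # exact match
    {"v_call": "IGHV1-2*01,IGHV1-2*02", "truth_v_call": "IGHV1-2*02"},  # ambiguous, contains truth
    {"v_call": "IGHV3-23*01", "truth_v_call": "IGHV3-30*01"},           # miss
    {"v_call": "IGHV4-34*01"},                                          # no provenance, skipped
]
v_accuracy = call_accuracy(records, "v")
print(f"V accuracy: {v_accuracy:.3f}")
```

A stricter rule, such as requiring the first listed allele to equal the truth allele, would penalize ambiguous calls; which rule fits depends on how the downstream aligner is being evaluated.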