FlashBackGraph¶
A strictly Markovian graph built from the FlashBack decomposition, where nodes are FlashBack tokens and edges are transitions between consecutive tokens. Because the graph is a Markovian DAG, every analytic (path count, diversity, entropy, Hill numbers, PGEN) is computed exactly via forward dynamic programming, with no Monte Carlo approximation.
For the conceptual distinction between FlashBackGraph and the LZGraph family (aap / ndp / naive), see Graph Variants.
Constructor¶
Parameters¶
| Parameter | Type | Description |
|---|---|---|
sequences |
list[str] |
Sequences to build from (must be non-empty) |
abundances |
list[int] |
Optional per-sequence counts (default: all 1) |
smoothing |
float |
Laplace smoothing alpha for edge weights (default: 0.0) |
from_file (classmethod)¶
Build directly from a plain-text file with constant memory (streaming). Accepts one sequence per line, orsequence<TAB>abundance.
Core Methods¶
pgen¶
Exact generation probability of one or more sequences under the FlashBack model.- Parameters:
sequence(str or list[str]),log(bool) - Returns:
float(single) ornp.ndarray(list)
simulate¶
Generaten sequences by Markov random walk.
- Returns: SimulationResult
top_k_sequences¶
Find the K most (or least) probable sequences exactly, via forward DP over the DAG's topological order, with no simulation. Setmost_probable=False for the lowest-probability tail.
- Returns: SimulationResult
posterior¶
Bayesian posterior graph using this graph as a Dirichlet-Multinomial prior. Keeps the prior topology and reweights edges toward the new individual.kappa=0 → pure individual, kappa→∞ → pure prior. Edges absent from the prior are ignored.
- Returns: A new
FlashBackGraphinstance.
without¶
Return a new graph with the contribution ofsequences removed: each walk's edge counts are decremented (clamped at zero, zero-count edges pruned, weights renormalised). Enables leave-donor-out construction from an existing graph in seconds rather than rebuilding from source.
- Returns: A new
FlashBackGraphinstance.
Exact Diversity & Analytics¶
All values below are computed exactly by forward DP over the DAG.
effective_diversity¶
Exact \(e^H\) (Shannon entropy), equivalent to Hill number \(D(1)\).diversity_profile¶
Full Shannon diversity breakdown (cached per instance).hill_number¶
Exact Hill diversity number \(D(\alpha)\).hill_numbers¶
Exact Hill numbers for multiple orders. Returns annp.ndarray.
hill_curve¶
Returns a dict withorders and values for plotting a diversity profile (default orders span 0–10).
power_sum¶
Exact power sum \(M(\alpha)\).pgen_diagnostics¶
Exact absorbed vs leaked probability mass.pgen_dynamic_range¶
Exact dynamic range of generation probabilities, in orders of magnitude. Usepgen_dynamic_range_detail() for the full breakdown.
pgen_moments¶
Moments (mean, variance, std, skewness, kurtosis) of the forward-DP log-PGEN distribution.
pgen_distribution¶
Analytical Gaussian-mixture approximation of the log-PGEN distribution. Returns a PgenDistribution.Anomaly Scoring (SCALE)¶
SCALE is the recommended anomaly/error score: a self-calibrated,
length-invariant transform of -log pgen. You calibrate once against the
graph, then score sequences (higher = more anomalous). See the
anomaly-detection tutorial.
calibrate_scale¶
Self-calibrate SCALE: simulaten_sim sequences from the graph and record the
per-length median and IQR of -log pgen. Returns a ScaleCalibration (the reusable cache, documented below).
- Parameters:
n_sim(int),seed(int or None),min_count(int, minimum simulated sequences at a length to get its own median/IQR; sparser lengths fall back to global). - Returns:
ScaleCalibration.
scale_score¶
Length-calibrated anomaly score:(-log pgen(s) - median[len]) / IQR[len],
using a ScaleCalibration. Higher means more anomalous.
- Parameters:
sequence(str or list[str]),calibration(ScaleCalibration). - Returns:
float(single) ornp.ndarray(list).
ScaleCalibration¶
The calibration cache returned by calibrate_scale. Holds median_by_length,
iqr_by_length, global_median, global_iqr, n_sim, and seed. Persist and
reuse it:
calibration.save('scale_calibration.json')
from LZGraphs import ScaleCalibration
calibration = ScaleCalibration.load('scale_calibration.json')
Graph Algebra¶
| Operation | Method | Result |
|---|---|---|
| Union | a.union(b) |
Sum edge counts |
| Intersection | a.intersection(b) |
Shared structure, min counts |
| Difference | a.difference(b) |
Subtract edge counts |
| Weighted Merge | a.weighted_merge(b, α, β) |
Linear combination \( \alpha A + \beta B \) |
All operations return a new FlashBackGraph.
Features & Adjacency¶
feature_stats¶
15-element statistical vector describing the graph, for ML pipelines.feature_mass_profile¶
Position-based mass distribution profile.adjacency_csr¶
CSR (Compressed Sparse Row) adjacency as a dict of numpy arrays (row_offsets, col_indices, weights, counts).
IO¶
save¶
Save to.lzg binary format.
load (classmethod)¶
Load from a.lzg binary file.
Attributes¶
Basic¶
| Attribute | Type | Description |
|---|---|---|
n_nodes |
int |
Total number of nodes (including @ and $ sentinels) |
n_edges |
int |
Total number of directed edges |
n_sequences |
int |
Number of input sequences (abundance-weighted) |
variant |
str |
Always 'flashback' |
is_dag |
bool |
Always True |
path_count |
float |
Exact number of distinct walks (via DP) |
Structure¶
| Attribute | Type | Description |
|---|---|---|
nodes |
list[str] |
Node labels (excluding sentinels) |
all_nodes |
list[str] |
All node labels (including @ and $) |
edges |
list[tuple] |
(src, dst, weight, count) tuples (no sentinels) |
all_edges |
list[tuple] |
All (src, dst, weight, count) tuples |
n_initial |
int |
Number of initial states |
n_terminal |
int |
Number of terminal nodes |
max_out_degree |
int |
Maximum out-degree |
max_in_degree |
int |
Maximum in-degree |
density |
float |
Graph density (0 to 1) |
out_degrees |
np.ndarray |
Out-degree of each node |
in_degrees |
np.ndarray |
In-degree of each node |
length_distribution |
dict |
{length: count} mapping |
Streaming Builder: FlashBackStream¶
For sequences that arrive incrementally (a generator, a network stream, an open-ended simulator), FlashBackStream accumulates them and lets you monitor running node/edge counts before deciding to stop. The finalized graph is byte-identical to FlashBackGraph(all_sequences).
from LZGraphs import FlashBackStream
with FlashBackStream(smoothing=0.0) as stream:
for batch in source:
stream.add_sequences(batch) # appends; cheap
print(stream.n_nodes, stream.n_edges) # peek; instant
if some_stop_condition:
break
graph = stream.finalize() # one-time CSR build
graph.save('foundation.lzg')
| Member | Description |
|---|---|
FlashBackStream(smoothing=0.0) |
Open a new streaming builder. |
add_sequences(sequences, abundances=None) |
Append sequences; raises RuntimeError after finalize/abort. |
n_nodes / n_edges |
Current running counts (instant peek). |
peek() |
{'n_nodes': int, 'n_edges': int} of the running accumulator. |
snapshot() |
Build a finalized graph from the current state without consuming the stream (for checkpointing). |
finalize() |
Convert the accumulator into a FlashBackGraph; consumes the stream. |
abort() |
Discard the partial build without paying the finalize cost (idempotent). |
Used as a context manager, the stream auto-aborts on exit if it was never finalized.