Skip to content

LZGraph

The primary class for building, analyzing, and using LZ76-based repertoire graphs.

Constructor

LZGraph(
    sequences,
    *,
    variant='aap',
    abundances=None,
    v_genes=None,
    j_genes=None,
    smoothing=0.0
)

Parameters

Parameter Type Description
sequences list[str] CDR3 sequences (amino acid or nucleotide)
variant str 'aap', 'ndp', or 'naive' (default: 'aap')
abundances list[int] Optional frequency weights per sequence
v_genes list[str] Optional V gene names
j_genes list[str] Optional J gene names
smoothing float Laplace smoothing alpha (default: 0.0)

Core Methods

lzpgen

lzpgen(sequences, log=True)
Calculate the generation probability of one or more sequences.

  • Parameters: sequences (str or list[str]), log (bool)
  • Returns: float (if single) or np.ndarray (if list)

simulate

simulate(n, *, v_gene=None, j_gene=None, sample_genes=False, seed=None)
Batch-generate n new sequences from the model.

posterior

posterior(sequences, *, abundances=None, kappa=1.0)
Create a personalized posterior graph using this graph as a prior.

  • Returns: A new LZGraph instance.

Diversity & Analytics

effective_diversity

effective_diversity()
Returns \(e^H\) where H is Shannon entropy. Equivalent to Hill number \(D(1)\).

hill_number

hill_number(alpha)
Returns Hill diversity number \(D(\alpha)\).

hill_numbers

hill_numbers(orders)
Returns an array of Hill numbers for multiple orders.

hill_curve

hill_curve(orders=None)
Returns a dict with orders and values for plotting a diversity profile.

pgen_moments

pgen_moments()
Returns a dict with mean, variance, std, skewness, kurtosis of the log-PGEN distribution.

path_entropy_rate

path_entropy_rate(sequences)
Estimated bits per subpattern step from a list of sequences.

Occupancy & Prediction

predicted_richness

predicted_richness(depth)
Expected number of distinct sequences at a given sequencing depth.

predicted_overlap

predicted_overlap(d_i, d_j)
Expected number of shared sequences between two samples of depths \(d_i\) and \(d_j\).

predict_sharing

predict_sharing(draw_counts, max_k=None)
Predict the sharing spectrum across a cohort of donors.

Graph Algebra

Operation Method Result
Union a | b or a.union(b) Sum edge counts
Intersection a & b or a.intersection(b) Shared structure, min counts
Difference a - b or a.difference(b) Subtract edge counts
Weighted Merge a.weighted_merge(b, α, β) Linear combination \( \alpha A + \beta B \)

Features & Adjacency

feature_aligned

feature_aligned(query)
Project a query graph into this graph's node space. Returns a np.ndarray.

feature_stats

feature_stats()
Returns a 15-element statistical vector describing the graph.

feature_mass_profile

feature_mass_profile(max_pos=30)
Position-based mass distribution profile.

adjacency_csr

adjacency_csr()
Returns a dictionary of numpy arrays representing the graph in CSR format.

successors

successors(node_label)
Returns a list of (target, weight, count) for a given node.

IO

save

save(path)
Save to .lzg binary format.

load (classmethod)

LZGraph.load(path)
Load from a .lzg binary file.

Attributes

Basic

Attribute Type Description
n_nodes int Total number of nodes (including sentinels)
n_edges int Total number of edges (including sentinels)
n_sequences int Number of input sequences
variant str 'aap', 'ndp', or 'naive'
is_dag bool True if the graph is a Directed Acyclic Graph
path_count int Combinatorial size (total unique paths)

Structure

Attribute Type Description
nodes list[str] Node labels (excluding sentinels)
all_nodes list[str] All node labels (including @ and $)
edges list[tuple] (src, dst, weight, count) tuples (no sentinels)
n_initial int Number of initial states
n_terminal int Number of terminal nodes
density float Graph density (0 to 1)
out_degrees np.ndarray Out-degree of each node
in_degrees np.ndarray In-degree of each node
length_distribution dict {length: count} mapping

Genes

Attribute Type Description
has_gene_data bool True if gene annotations are available
v_genes list[str] List of V gene names
j_genes list[str] List of J gene names
v_marginals dict V gene marginal distribution
j_marginals dict J gene marginal distribution
vj_distribution list Joint VJ distribution