Module Functions¶
Standalone functions that operate on graphs or sequences. Import them from the top-level package:
from LZGraphs import (
    jensen_shannon_divergence,
    k_diversity,
    saturation_curve,
    lz76_decompose,
    set_log_level,
    set_log_callback,
)
Repertoire Comparison¶
jensen_shannon_divergence¶
Compute the Jensen-Shannon Divergence between two LZGraphs. JSD is a symmetric, bounded measure of distributional divergence over the shared subpattern space.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| graph_a | LZGraph | First graph |
| graph_b | LZGraph | Second graph (must be the same variant) |
Returns: float — JSD value in \([0, 1]\). 0 = identical distributions, 1 = maximally different.
Example:
from LZGraphs import LZGraph, jensen_shannon_divergence
g1 = LZGraph(seqs_healthy, variant='aap')
g2 = LZGraph(seqs_disease, variant='aap')
jsd = jensen_shannon_divergence(g1, g2)
print(f"JSD = {jsd:.4f}")
# 0.00-0.05: nearly identical
# 0.05-0.15: very similar
# 0.15-0.30: moderately different
# 0.30+: substantially different
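For intuition about the quantity itself, the sketch below computes the base-2 JSD between two small discrete distributions in plain Python. It is not the library's implementation: the name `jsd_from_probs` and the toy dictionaries are illustrative assumptions, and the real function derives its distributions from the two graphs.

```python
import math

def jsd_from_probs(p, q):
    """Base-2 Jensen-Shannon divergence between two discrete distributions.

    p and q map outcomes to probabilities (each summing to 1). Outcomes
    missing from one side are treated as having probability 0.
    """
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(dist):
        # KL(dist || M); zero-probability outcomes contribute nothing
        return sum(v * math.log2(v / m[k]) for k, v in dist.items() if v > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)

# Hypothetical toy distributions over subpatterns
same = {"SL": 0.5, "SG": 0.5}
disjoint_a = {"SL": 1.0}
disjoint_b = {"TQ": 1.0}

print(jsd_from_probs(same, same))              # identical distributions
print(jsd_from_probs(disjoint_a, disjoint_b))  # no shared outcomes
```

The first call gives 0 and the second gives 1, the two ends of the documented range. The upper bound of 1 depends on using base-2 logarithms.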
Diversity Analysis¶
k_diversity¶
Subsample-based diversity metric. Repeatedly draws \(k\) sequences, builds an LZGraph, counts unique subpattern nodes, and reports summary statistics across the resampling rounds (set by draws).
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| sequences | list[str] | required | Input CDR3 sequences |
| k | int | required | Subsample size |
| variant | str | 'aap' | Graph encoding variant |
| draws | int | 100 | Number of resampling rounds |
| seed | int or None | None | RNG seed for reproducibility (-1 or None = random) |
Returns: dict with keys:
| Key | Type | Description |
|---|---|---|
| 'mean' | float | Mean number of unique nodes across draws |
| 'std' | float | Standard deviation |
| 'ci_low' | float | Lower bound of 95% confidence interval |
| 'ci_high' | float | Upper bound of 95% confidence interval |
Example:
from LZGraphs import k_diversity
result = k_diversity(sequences, k=1000, variant='aap', draws=100)
print(f"K(1000) = {result['mean']:.1f} +/- {result['std']:.1f}")
print(f"95% CI: [{result['ci_low']:.1f}, {result['ci_high']:.1f}]")
Choosing k
Pick \(k\) well below your smallest repertoire so all samples can be compared on the same footing. Common choices: 500, 1000, 5000.
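The resampling procedure can be sketched in plain Python as follows. This is a rough illustration under stated assumptions, not the library's code: `count_unique_tokens` is a hypothetical stand-in that counts distinct greedy subpatterns instead of building an LZGraph, sampling is assumed to be without replacement, and the normal-approximation interval on the mean is a guess at how the 95% bounds might be formed.

```python
import math
import random
import statistics

def count_unique_tokens(seqs):
    """Hypothetical stand-in for 'build a graph and count its nodes'."""
    tokens = set()
    for s in seqs:
        seen, cur = set(), ""
        for ch in s:
            cur += ch
            if cur not in seen:      # shortest not-yet-seen piece
                seen.add(cur)
                tokens.add(cur)
                cur = ""
        if cur:                      # leftover tail that repeats a piece
            tokens.add(cur)
    return len(tokens)

def subsample_diversity(sequences, k, draws=100, seed=None):
    """Draw k sequences `draws` times and summarize the counts."""
    rng = random.Random(seed)
    counts = [count_unique_tokens(rng.sample(sequences, k))
              for _ in range(draws)]
    mean = statistics.fmean(counts)
    std = statistics.stdev(counts) if draws > 1 else 0.0
    half_width = 1.96 * std / math.sqrt(draws)  # assumed CI construction
    return {"mean": mean, "std": std,
            "ci_low": mean - half_width, "ci_high": mean + half_width}

# Illustrative strings only; real repertoires are far larger.
toy = ["CASSLEPSGGTDTQYF", "CASSLGQAYEQYF", "CASSPGTGELFF", "CASRTGESYEQYF"]
print(subsample_diversity(toy, k=2, draws=20, seed=0))
```

Because every round uses the same \(k\), repertoires of different sizes are summarized at the same sampling depth, which is why \(k\) must not exceed the smallest repertoire.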
saturation_curve¶
Compute the node/edge saturation curve: add sequences one at a time and record how the graph grows.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| sequences | list[str] | required | Input sequences (order matters) |
| variant | str | 'aap' | Graph encoding variant |
| log_every | int | 100 | Record a data point every N sequences |
Returns: list[dict], where each dict has:
| Key | Type | Description |
|---|---|---|
| 'n_sequences' | int | Number of sequences added so far |
| 'n_nodes' | int | Total nodes in the graph at this point |
| 'n_edges' | int | Total edges in the graph at this point |
Example:
from LZGraphs import saturation_curve
curve = saturation_curve(sequences, variant='aap', log_every=500)
for point in curve[:5]:
    print(f"After {point['n_sequences']:5d} seqs: "
          f"{point['n_nodes']:5d} nodes, {point['n_edges']:5d} edges")
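One way to read the returned list is to look at how many new nodes each additional sequence contributes between consecutive data points. The helper below is a hypothetical post-processing sketch that relies only on the documented keys; the numbers in `toy_curve` are made up for illustration and are not library output.

```python
def marginal_node_growth(curve):
    """New nodes per added sequence between consecutive curve points."""
    rates = []
    for prev, cur in zip(curve, curve[1:]):
        added = cur["n_sequences"] - prev["n_sequences"]
        rates.append((cur["n_nodes"] - prev["n_nodes"]) / added)
    return rates

# Made-up points in the documented shape, for illustration only.
toy_curve = [
    {"n_sequences": 500, "n_nodes": 1200, "n_edges": 3000},
    {"n_sequences": 1000, "n_nodes": 1500, "n_edges": 4200},
    {"n_sequences": 1500, "n_nodes": 1600, "n_edges": 4700},
]
print(marginal_node_growth(toy_curve))  # [0.6, 0.2]
```

A rate that keeps falling toward zero suggests the graph is approaching saturation for that repertoire.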
LZ76 Utilities¶
lz76_decompose¶
Decompose a string into its LZ76 subpatterns. This is the raw decomposition without positional encoding — the same algorithm used internally by all graph variants.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| sequence | str | Input string (amino acid, nucleotide, or any characters) |
Returns: list[str] — the LZ76 subpatterns.
Example:
from LZGraphs import lz76_decompose
tokens = lz76_decompose("CASSLEPSGGTDTQYF")
print(tokens)
# ['C', 'A', 'S', 'SL', 'E', 'P', 'SG', 'G', 'T', 'D', 'TQ', 'Y', 'F']
# The number of tokens measures the sequence's LZ76 complexity
print(f"Complexity: {len(tokens)} tokens for {len('CASSLEPSGGTDTQYF')} chars")
How LZ76 works
At each step, the algorithm emits the shortest substring, starting at the current position, that has not already been emitted as a subpattern. C is new, A is new, S is new. The next S is already known, so it is extended to SL (new). This greedy process continues until the string is consumed. See LZ76 Algorithm for a detailed explanation.
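The greedy process described above can be sketched in a few lines of pure Python. This is an illustrative reimplementation, not the library's C backend: `greedy_decompose` is a hypothetical name, and emitting a leftover repeated tail as a final token is an assumption. It reproduces the token list shown in the example for that input.

```python
def greedy_decompose(sequence):
    """Split a string into the shortest pieces not emitted before."""
    seen = set()
    tokens = []
    current = ""
    for ch in sequence:
        current += ch
        if current not in seen:   # first time this piece occurs: emit it
            seen.add(current)
            tokens.append(current)
            current = ""
    if current:                   # assumed handling of a repeated tail
        tokens.append(current)
    return tokens

tokens = greedy_decompose("CASSLEPSGGTDTQYF")
print(tokens)
print(len(tokens))  # 13
```

Note that G is emitted on its own after SG: what matters in this sketch is whether a piece was previously emitted as a token, not whether its characters occurred somewhere earlier in the string.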
Configuration¶
set_log_level¶
Enable or disable logging from the C backend. Messages are written to stderr.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| level | str | One of: 'none', 'error', 'warn', 'info', 'debug', 'trace' |
Example:
import LZGraphs
from LZGraphs import LZGraph

LZGraphs.set_log_level('info')  # See build progress and timing
graph = LZGraph(sequences, variant='aap')
LZGraphs.set_log_level('none')  # Silence all output (default)
set_log_callback¶
Route log messages to a custom Python callback instead of stderr. Useful for integrating with Python's logging module.
Parameters:
| Parameter | Type | Description |
|---|---|---|
| callback | callable or None | Function (level: int, message: str) -> None. Pass None to disable. |
| level | str | Maximum level to emit (default: 'info') |
Level values passed to the callback: 1=error, 2=warn, 3=info, 4=debug, 5=trace.
Example:
import logging
import LZGraphs

logger = logging.getLogger('lzgraphs')

# Python's logging module has no TRACE level, so trace (5) maps to DEBUG
LEVEL_MAP = {1: logging.ERROR, 2: logging.WARNING, 3: logging.INFO,
             4: logging.DEBUG, 5: logging.DEBUG}

LZGraphs.set_log_callback(
    lambda lvl, msg: logger.log(LEVEL_MAP.get(lvl, logging.DEBUG), msg),
    level='info',
)
See Also¶
- LZGraph class — the main class with all instance methods
- CLI Reference — command-line equivalents
- Diversity Metrics tutorial — using k_diversity and JSD in practice