Concepts

Understanding the theory behind LZGraphs will help you use it more effectively and interpret results correctly.

Core Concepts

How Lempel-Ziv compression creates sequence encodings

Comparison of AAPLZGraph, NDPLZGraph, and NaiveLZGraph

How LZGraphs calculates sequence generation probabilities

LZGraphs represents a TCR repertoire as a directed graph where:

Sequences become walks: each CDR3 sequence is a path through the graph
Patterns become nodes: each subpattern produced by the LZ76 decomposition is a node
Transitions become edges: each observed transition between consecutive subpatterns is an edge
Frequencies become weights: how often a transition occurs determines its edge weight
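
The mapping above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the LZGraphs implementation: `lz_decompose` and `build_graph` are hypothetical helper names, the decomposition is a simplified dictionary-based Lempel-Ziv parse, the three CDR3 strings are made-up toy data, and nodes carry no positional information (unlike AAPLZGraph and NDPLZGraph).

```python
from collections import Counter, defaultdict


def lz_decompose(sequence):
    """Split a sequence into phrases, each the shortest chunk not yet seen in it.

    Simplified dictionary-based parse (an assumption for illustration).
    """
    seen = set()
    phrases = []
    start = 0
    for end in range(1, len(sequence) + 1):
        phrase = sequence[start:end]
        if phrase not in seen:
            seen.add(phrase)
            phrases.append(phrase)
            start = end
    if start < len(sequence):
        # The trailing remainder was already seen; keep it as a final phrase.
        phrases.append(sequence[start:])
    return phrases


def build_graph(sequences):
    """Return {node: {successor: weight}} with weights normalized per node."""
    edge_counts = defaultdict(Counter)
    for seq in sequences:
        walk = lz_decompose(seq)  # the sequence becomes a walk over pattern nodes
        for src, dst in zip(walk, walk[1:]):
            edge_counts[src][dst] += 1  # each observed transition is an edge
    graph = {}
    for src, outgoing in edge_counts.items():
        total = sum(outgoing.values())
        # Transition frequencies become edge weights.
        graph[src] = {dst: count / total for dst, count in outgoing.items()}
    return graph


# Toy repertoire (made-up sequences, for illustration only).
repertoire = ["CASSLGQ", "CASSLGF", "CASSF"]
graph = build_graph(repertoire)
```

Here `lz_decompose("CASSLGQ")` yields the walk `C, A, S, SL, G, Q`, and the node `S` ends up with two outgoing edges (`SL` and `SF`) weighted by how often each transition was observed.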

This representation enables:

Traditional approaches to repertoire analysis face challenges that the graph representation addresses differently:

Challenge	Traditional Approach	LZGraphs Approach
Comparing sequences	Pairwise alignment (O(n²))	Walk probability (O(n))
Finding patterns	K-mer counting	Graph structure
Generating sequences	Statistical models	Random walks
Cross-repertoire comparison	Sequence overlap	Graph divergence
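
The "Walk probability (O(n))" and "Random walks" entries in the table can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy `GRAPH` and `INITIAL` dictionaries, the function names `walk_log_probability` and `random_walk`, and the rule that a walk stops at a node with no outgoing edges. The library's own API and terminal-state handling may differ.

```python
import math
import random

# Hypothetical toy graph: node -> {successor: transition probability}.
GRAPH = {
    "C": {"A": 1.0},
    "A": {"S": 1.0},
    "S": {"SL": 2 / 3, "SF": 1 / 3},
    "SL": {"G": 1.0},
    "G": {"Q": 0.5, "F": 0.5},
}
# Hypothetical probability of starting a walk at each node.
INITIAL = {"C": 1.0}


def walk_log_probability(walk, graph=GRAPH, initial=INITIAL):
    """Log-probability of a walk: one dictionary lookup per step, so O(n)."""
    if not walk or initial.get(walk[0], 0.0) == 0.0:
        return float("-inf")
    logp = math.log(initial[walk[0]])
    for src, dst in zip(walk, walk[1:]):
        p = graph.get(src, {}).get(dst, 0.0)
        if p == 0.0:
            return float("-inf")  # transition never observed
        logp += math.log(p)
    return logp


def random_walk(graph=GRAPH, initial=INITIAL, rng=None):
    """Sample a walk by following edge weights until a node has no successors."""
    rng = rng or random.Random()
    node = rng.choices(list(initial), weights=list(initial.values()))[0]
    walk = [node]
    while node in graph:
        successors = graph[node]
        node = rng.choices(list(successors), weights=list(successors.values()))[0]
        walk.append(node)
    return walk
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow on long walks. Joining the nodes of a sampled walk, for example `"".join(random_walk())`, gives a generated sequence in this simplified setting.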

The positional encoding in AAPLZGraph and NDPLZGraph captures that:

The graph captures:

Edge weights encode:

Dive deeper into specific concepts: