Concepts¶
Understanding the theory behind LZGraphs will help you use it more effectively and interpret results correctly.
Core Concepts¶
LZ76 Algorithm¶
How Lempel-Ziv compression creates sequence encodings
Graph Types¶
Comparison of AAPLZGraph, NDPLZGraph, and NaiveLZGraph
Probability Model¶
How LZGraphs calculates sequence generation probabilities
The Big Picture¶
LZGraphs represents a T-cell receptor (TCR) repertoire as a directed graph where:
- Sequences become walks - Each CDR3 sequence is a path through the graph
- Patterns become nodes - Subpatterns produced by the LZ76 decomposition are the nodes
- Transitions become edges - Observed pattern transitions are edges
- Frequencies become weights - How often transitions occur determines edge weights
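The mapping above can be sketched in a few lines of plain Python. This is a minimal illustration, not the LZGraphs implementation: `lz_decompose` and `build_graph` are hypothetical helper names, the decomposition is a simplified dictionary-style parse standing in for the LZ76 step, and the three CDR3-like strings are made-up examples.

```python
from collections import Counter

def lz_decompose(sequence):
    """Split a sequence into subpatterns.

    Each subpattern is the shortest chunk of the remaining text that has not
    been seen earlier in this sequence; the final chunk may be a repeat.
    """
    seen, patterns, start = set(), [], 0
    for end in range(1, len(sequence) + 1):
        piece = sequence[start:end]
        if piece not in seen or end == len(sequence):
            seen.add(piece)
            patterns.append(piece)
            start = end
    return patterns

def build_graph(sequences):
    """Count transitions between consecutive subpatterns across all sequences."""
    edge_counts = Counter()
    for seq in sequences:
        walk = lz_decompose(seq)               # the sequence as a walk
        edge_counts.update(zip(walk, walk[1:]))  # each step is one edge observation
    return edge_counts

# Made-up CDR3-like strings, for illustration only.
repertoire = ["CASSLGQ", "CASSLEQ", "CASRLGQ"]
graph = build_graph(repertoire)

print(lz_decompose("CASSLGQ"))  # ['C', 'A', 'S', 'SL', 'G', 'Q']
print(graph[("S", "SL")])       # 2
```

Here the edge `("S", "SL")` has weight 2 because two of the three toy sequences take that transition, while `("S", "R")` has weight 1.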
This representation enables:
- Efficient probability calculation - O(n) in the length of the walk, compared with O(n²) for pairwise alignment
- Pattern discovery - Find common motifs and rare variations
- Sequence generation - Sample new sequences with realistic statistics
- Diversity quantification - Measure complexity through graph topology
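To make the linear-time claim concrete, the sketch below scores a walk by summing the log probabilities of its edges, one lookup per step. It is a simplified illustration under stated assumptions: the edge counts are hypothetical, `transition_probs` and `walk_log_prob` are illustrative names rather than LZGraphs functions, and the sketch ignores the probability of the starting node and any terminal-state modelling that the library's probability model may include.

```python
import math
from collections import defaultdict

# Hypothetical edge counts: (source subpattern, target subpattern) -> observations
edge_counts = {
    ("C", "A"): 3, ("A", "S"): 3,
    ("S", "SL"): 2, ("S", "R"): 1,
    ("SL", "G"): 1, ("SL", "E"): 1,
}

def transition_probs(edge_counts):
    """Normalize outgoing edge counts so they sum to 1 for each source node."""
    totals = defaultdict(int)
    for (src, _), n in edge_counts.items():
        totals[src] += n
    return {edge: n / totals[edge[0]] for edge, n in edge_counts.items()}

def walk_log_prob(walk, probs):
    """Sum log transition probabilities along the walk: one lookup per edge."""
    total = 0.0
    for edge in zip(walk, walk[1:]):
        p = probs.get(edge, 0.0)
        if p == 0.0:
            return float("-inf")  # the walk uses a transition never observed
        total += math.log(p)
    return total

probs = transition_probs(edge_counts)
log_p = walk_log_prob(["C", "A", "S", "SL", "G"], probs)
print(math.exp(log_p))  # 1 * 1 * (2/3) * (1/2), i.e. about 0.333
```

Working in log space avoids numerical underflow when many small transition probabilities are multiplied along a long walk.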
Why Graphs?¶
Traditional approaches to repertoire analysis face several challenges that a graph representation handles differently:
| Challenge | Traditional Approach | LZGraphs Approach |
|---|---|---|
| Comparing sequences | Pairwise alignment (O(n²)) | Walk probability (O(n)) |
| Finding patterns | K-mer counting | Graph structure |
| Generating sequences | Statistical models | Random walks |
| Cross-repertoire comparison | Sequence overlap | Graph divergence |
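As an illustration of the "Random walks" entry in the table, the following sketch samples a sequence by repeatedly choosing a successor in proportion to its edge weight. The adjacency dictionary and the `random_walk` helper are hypothetical and chosen only for this example; the toy graph is acyclic, so the walk is guaranteed to stop. The library's own generation logic may differ, for instance in how it selects start and end states.

```python
import random

# Hypothetical weighted adjacency: node -> {successor: observed count}
graph = {
    "C": {"A": 3},
    "A": {"S": 3},
    "S": {"SL": 2, "R": 1},
    "SL": {"G": 1, "E": 1},
    "R": {"L": 1},
    "L": {"G": 1},
    "G": {"Q": 2},
    "E": {"Q": 1},
}

def random_walk(graph, start, rng):
    """Follow weighted random transitions until a node with no successors."""
    walk = [start]
    while walk[-1] in graph:
        successors = graph[walk[-1]]
        next_node = rng.choices(
            list(successors), weights=list(successors.values())
        )[0]
        walk.append(next_node)
    return walk

rng = random.Random(0)  # seeded only to make the example repeatable
sequence = "".join(random_walk(graph, "C", rng))
print(sequence)  # one of CASSLGQ, CASSLEQ, CASRLGQ
```

Because successors are drawn in proportion to their counts, frequent transitions appear more often in generated sequences than rare ones.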
Key Insights¶
1. Position Matters¶
The positional encoding in AAPLZGraph and NDPLZGraph reflects the following:
- The same amino acid carries a different meaning at position 3 than at position 10
- CDR3 structure follows positional constraints
- V/J gene contributions vary by position
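One way to picture positional encoding is to attach to each subpattern the position at which it ends, so the same subpattern at different positions becomes a different node. The sketch below assumes a `pattern_position` label format purely for illustration; `positional_nodes` is a hypothetical helper, and the exact labels used by AAPLZGraph and NDPLZGraph are described on the Graph Types page.

```python
def positional_nodes(patterns):
    """Label each subpattern with the 1-based position where it ends."""
    nodes, position = [], 0
    for pattern in patterns:
        position += len(pattern)
        nodes.append(f"{pattern}_{position}")  # assumed label format
    return nodes

# The subpattern "S" ends at position 3 in one walk and position 4 in the other,
# so it maps to two distinct nodes.
print(positional_nodes(["C", "A", "S"]))       # ['C_1', 'A_2', 'S_3']
print(positional_nodes(["C", "A", "R", "S"]))  # ['C_1', 'A_2', 'R_3', 'S_4']
```

With position-free labels both walks would share a single `S` node; with positional labels their transition statistics are kept separate.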
2. Context Matters¶
The graph captures:
- Which patterns can follow which
- Gene-specific transition preferences
- Repertoire-specific motifs
3. Frequency Matters¶
Edge weights encode:
- Common vs rare transitions
- Probability of sequence generation
- Deviation from expected patterns
Next Steps¶
Dive deeper into specific concepts:
- LZ76 Algorithm - Understand the encoding
- Graph Types - Choose the right representation
- Probability Model - Calculate sequence likelihood