LZGraphs
LZGraphs is a Python library for analyzing T-cell receptor (TCR) repertoires using graph representations built on Lempel-Ziv 76 (LZ76) compression. It offers an alignment-free approach to sequence analysis that does not require genotype references.
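The decomposition step can be made concrete with a small example. Below is a minimal sketch of an LZ-style parse. It splits a sequence into phrases, where each phrase is the shortest substring (starting where the previous phrase ended) that is not already in the phrase list. Three details are assumptions for illustration and may not match the library's internal routine: this phrase-dictionary variant, the helper name `lz_decompose`, and the choice to keep a repeated trailing fragment.

```python
def lz_decompose(sequence: str) -> list[str]:
    """Split `sequence` into successive phrases, each the shortest substring
    not seen before as a phrase. A repeated tail at the end is kept as-is
    (an assumption made in this sketch)."""
    phrases: list[str] = []
    seen: set[str] = set()
    start = 0
    while start < len(sequence):
        end = start + 1
        # Grow the candidate until it is new or reaches the end of the sequence.
        while sequence[start:end] in seen and end < len(sequence):
            end += 1
        phrase = sequence[start:end]
        phrases.append(phrase)
        seen.add(phrase)
        start = end
    return phrases


print(lz_decompose("CASSLEPSGGTDTQYF"))
# ['C', 'A', 'S', 'SL', 'E', 'P', 'SG', 'G', 'T', 'D', 'TQ', 'Y', 'F']
```

Concatenating the phrases always reproduces the input, so no sequence information is lost.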
Why LZGraphs?
Traditional TCR repertoire analysis methods often struggle with:
- Alignment dependencies: they require reference sequences
- Computational complexity: they rely on O(n²) pairwise comparisons
- Loss of positional information: they treat sequences as bags of k-mers
LZGraphs addresses these issues by encoding each sequence as a walk through a directed graph whose nodes are LZ76 sub-patterns. This captures both the content and the structure of a repertoire without pairwise sequence comparisons.
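Here is a minimal sketch of the walk-to-graph idea. It assumes each sequence has already been encoded as a list of sub-pattern nodes; the `phrase_position` labels below are hypothetical and written by hand. Every consecutive pair of nodes in a walk becomes a directed edge. Edges shared by several sequences accumulate counts, so the graph is built in a single pass over the repertoire. The `build_graph` helper is illustrative, not part of the LZGraphs API.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-written walks: one list of sub-pattern nodes per sequence.
walks = [
    ["C_1", "A_2", "S_3", "SL_5", "E_6", "P_7"],
    ["C_1", "A_2", "S_3", "SD_5", "T_6"],
    ["C_1", "A_2", "S_3", "SL_5", "Q_6"],
]


def build_graph(walks):
    """Return {node: Counter(successor -> count)} from a list of walks."""
    edges = defaultdict(Counter)
    for walk in walks:
        # Each consecutive pair of nodes is one traversal of a directed edge.
        for src, dst in zip(walk, walk[1:]):
            edges[src][dst] += 1
    return edges


graph = build_graph(walks)
nodes = {node for walk in walks for node in walk}
print(len(nodes), sum(len(w) for w in walks))  # 9 16
print(dict(graph["S_3"]))  # {'SL_5': 2, 'SD_5': 1}
```

Here 16 node visits collapse into 9 distinct nodes. The branch at `S_3` records that two of the three walks continue with `SL_5`.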
Key Features
Graph Representations¶
Three specialized graph types for different analysis needs: AAPLZGraph for amino acids, NDPLZGraph for nucleotides, and NaiveLZGraph for general sequences.
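The graph types differ mainly in how nodes are labeled. The sketch below rests on two assumptions that are illustrative, not the library's exact encoding:

- Position-aware graphs tag each phrase with the 1-based position of its last character (e.g. `SL_5`).
- A naive graph uses the bare phrase.

The `positional_labels` helper is hypothetical. The phrase lists were parsed by hand from two of the Quick Start CDR3 sequences.

```python
def positional_labels(phrases):
    """Tag each phrase with the 1-based position of its last character
    (assumed labeling scheme for position-aware graphs)."""
    labels, end = [], 0
    for phrase in phrases:
        end += len(phrase)
        labels.append(f"{phrase}_{end}")
    return labels


# Phrases of CASSLEPSGGTDTQYF and CASSDTSGGTDTQYF (parsed by hand).
seq1 = ["C", "A", "S", "SL", "E", "P", "SG", "G", "T", "D", "TQ", "Y", "F"]
seq2 = ["C", "A", "S", "SD", "T", "SG", "G", "TD", "TQ", "Y", "F"]

# Position-free nodes: identical phrases merge wherever they occur.
naive_nodes = set(seq1) | set(seq2)
# Position-aware nodes: the same phrase at different positions stays distinct.
positional_nodes = set(positional_labels(seq1)) | set(positional_labels(seq2))
print(len(naive_nodes), len(positional_nodes))  # 15 21
```

With bare phrases, `T` in the first sequence (ending at position 11) and `T` in the second (ending at position 6) merge into one node. Positional labels keep them apart, so the position-aware node set is larger.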
Diversity Metrics¶
Novel diversity indices including K1000 and LZCentrality that capture repertoire complexity through graph topology.
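As a rough illustration, assume K1000 is defined as follows: build a graph from 1,000 randomly sampled sequences, count its distinct nodes, and average that count over several draws. The sketch below implements that assumed definition with a self-contained phrase parser and position-tagged node labels. The names `k_diversity` and `draws`, the default of 25 draws, and sampling without replacement are all hypothetical choices, not the library's API. Only the node set matters for this count, so the sketch does not build edges.

```python
import random


def lz_decompose(seq):
    """Shortest-new-phrase parse (assumed variant; a repeated tail is kept)."""
    phrases, seen, start = [], set(), 0
    while start < len(seq):
        end = start + 1
        while seq[start:end] in seen and end < len(seq):
            end += 1
        phrases.append(seq[start:end])
        seen.add(seq[start:end])
        start = end
    return phrases


def positional_nodes(seq):
    """Label each phrase with its end position, e.g. 'SL_5' (assumed scheme)."""
    nodes, pos = [], 0
    for phrase in lz_decompose(seq):
        pos += len(phrase)
        nodes.append(f"{phrase}_{pos}")
    return nodes


def k_diversity(sequences, k=1000, draws=25, seed=0):
    """Mean count of distinct nodes over `draws` random subsamples of size k.
    If fewer than k sequences are given, the whole list is used each time."""
    rng = random.Random(seed)
    k = min(k, len(sequences))
    counts = []
    for _ in range(draws):
        sample = rng.sample(sequences, k)  # sampling without replacement
        counts.append(len({n for s in sample for n in positional_nodes(s)}))
    return sum(counts) / draws
```

For a repertoire stored as a list of CDR3 strings, a call such as `k_diversity(cdr3_list)` would return the mean node count under these assumptions. A more diverse repertoire yields more distinct sub-patterns and therefore a higher value.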
Gene Analysis¶
Built-in V/J gene annotation support for gene-aware sequence generation and gene usage analysis.
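At its simplest, gene usage is a frequency table. The plain-pandas sketch below (not an LZGraphs call) computes V and J usage for a data frame with `V` and `J` columns like the one in the Quick Start. It first collapses allele suffixes such as `*01` to gene level, then normalizes the counts.

```python
import pandas as pd

data = pd.DataFrame({
    "cdr3_amino_acid": ["CASSLEPSGGTDTQYF", "CASSDTSGGTDTQYF", "CASSLEPQTFTDTFFF"],
    "V": ["TRBV16-1*01", "TRBV1-1*01", "TRBV16-1*01"],
    "J": ["TRBJ1-2*01", "TRBJ1-5*01", "TRBJ2-7*01"],
})

# Drop the allele suffix ("*01") so usage is counted per gene.
v_genes = data["V"].str.split("*", n=1).str[0]
j_genes = data["J"].str.split("*", n=1).str[0]

# Relative usage: each gene's share of all sequences.
v_usage = v_genes.value_counts(normalize=True)
j_usage = j_genes.value_counts(normalize=True)
print(v_usage)
```

In this toy data, TRBV16-1 accounts for two of the three sequences, so its usage is about 0.67.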
Visualization¶
Publication-ready plots for sequence analysis, including path variability, genomic heatmaps, and saturation curves.
Quick Start
Installation¶
Requirements: Python 3.9 or higher
Your First Graph¶
import pandas as pd
from LZGraphs import AAPLZGraph
# Load your TCR repertoire data
data = pd.DataFrame({
    'cdr3_amino_acid': ['CASSLEPSGGTDTQYF', 'CASSDTSGGTDTQYF', 'CASSLEPQTFTDTFFF'],
    'V': ['TRBV16-1*01', 'TRBV1-1*01', 'TRBV16-1*01'],
    'J': ['TRBJ1-2*01', 'TRBJ1-5*01', 'TRBJ2-7*01']
})
# Build the graph
graph = AAPLZGraph(data, verbose=True)
# Calculate sequence probability
sequence = "CASSLEPSGGTDTQYF"
pgen = graph.walk_probability(AAPLZGraph.encode_sequence(sequence))
print(f"P(gen) = {pgen:.2e}")
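Conceptually, a walk probability of this kind can be read as a first-order Markov chain over graph nodes. It is the probability of the first node times the probability of each transition along the walk. Each transition probability is an edge count divided by the source node's total outgoing count. The sketch below assumes exactly that model on hand-written toy walks. `fit_markov` and `walk_prob` are hypothetical helpers, and `walk_probability` may differ in details such as smoothing or end-of-sequence handling.

```python
from collections import Counter, defaultdict


def fit_markov(walks):
    """Estimate initial-node and transition probabilities from observed walks."""
    starts = Counter(walk[0] for walk in walks)
    edges = defaultdict(Counter)
    for walk in walks:
        for src, dst in zip(walk, walk[1:]):
            edges[src][dst] += 1
    init = {node: c / len(walks) for node, c in starts.items()}
    # Normalize each node's outgoing edge counts into probabilities.
    trans = {src: {dst: c / sum(out.values()) for dst, c in out.items()}
             for src, out in edges.items()}
    return init, trans


def walk_prob(walk, init, trans):
    """P(walk) = P(first node) * product of P(next | current); 0 if unseen."""
    p = init.get(walk[0], 0.0)
    for src, dst in zip(walk, walk[1:]):
        p *= trans.get(src, {}).get(dst, 0.0)
    return p


# Hypothetical encoded walks (labels written by hand).
walks = [
    ["C_1", "A_2", "S_3", "SL_5", "E_6"],
    ["C_1", "A_2", "S_3", "SD_5", "T_6"],
    ["C_1", "A_2", "S_3", "SL_5", "E_6"],
]
init, trans = fit_markov(walks)
print(walk_prob(["C_1", "A_2", "S_3", "SL_5", "E_6"], init, trans))  # 0.6666666666666666
```

The only branching point is `S_3`, where two of three walks continue to `SL_5`. Every other step has probability 1, so the walk's probability is 2/3.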
Documentation Overview¶
- Getting Started: New to LZGraphs? Start here for installation and basic usage.
- Tutorials: Step-by-step guides for common analysis tasks.
- Concepts: Understand the theory behind LZGraphs.
- How-To Guides: Task-oriented guides for specific operations.
- Examples: Interactive Jupyter notebooks with real data.
- API Reference: Complete reference for all classes and functions.
Citation
If you use LZGraphs in your research, please cite our paper:
@article{lzgraphs2024,
  title={LZGraphs: A Novel Approach for T-Cell Receptor Repertoire Analysis},
  author={Konstantinovsky, Thomas and others},
  journal={...},
  year={2024}
}
See the Citation page for more details.