
LZGraphs

Python 3.9+ | License: MIT


LZGraphs is a Python library for analyzing T-cell receptor (TCR) repertoires using Lempel-Ziv 76 compression-based graph representations. It provides a novel approach to sequence analysis that doesn't rely on alignment or genotype references.


Why LZGraphs?

Traditional TCR repertoire analysis methods often struggle with:

  • Alignment dependencies - requiring reference sequences
  • Computational complexity - O(n²) pairwise comparisons
  • Loss of positional information - treating sequences as bags of k-mers

LZGraphs addresses these problems by encoding each sequence as a walk through a directed graph, which captures both the content and the structure of a repertoire without alignment or pairwise comparison.
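
To make "sequences as walks through directed graphs" concrete, here is a minimal pure-Python sketch. It assumes a simple dictionary-style Lempel-Ziv parse, where each subpattern is the shortest prefix of the remaining sequence not yet seen in that sequence, and it links consecutive subpatterns with counted edges. The names `lz_decompose` and `build_edge_counts` are hypothetical and are not part of the LZGraphs API; the library's own decomposition and node labeling may differ.

```python
from collections import Counter

def lz_decompose(sequence):
    """Split a sequence into subpatterns (assumed dictionary-style LZ parse)."""
    seen = set()      # subpatterns already emitted for this sequence
    phrases = []
    current = ""
    for ch in sequence:
        current += ch
        if current not in seen:
            # Shortest unseen prefix found: emit it and start a new one.
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Trailing piece that was already seen; keep it so nothing is lost.
        phrases.append(current)
    return phrases

def build_edge_counts(sequences):
    """Count directed edges between consecutive subpatterns across sequences."""
    edges = Counter()
    for seq in sequences:
        walk = lz_decompose(seq)
        edges.update(zip(walk, walk[1:]))
    return edges

repertoire = ["CASSLEPSGGTDTQYF", "CASSDTSGGTDTQYF", "CASSLEPQTFTDTFFF"]
walk = lz_decompose(repertoire[0])
edges = build_edge_counts(repertoire)
print(walk)
print(edges[("S", "SL")])
```

Each sequence becomes one walk, and sequences that share subpatterns reuse the same nodes and edges, so the graph grows with the number of distinct subpatterns and not with the number of sequence pairs.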


Key Features

Graph Representations

Three specialized graph types for different analysis needs: AAPLZGraph for amino acids, NDPLZGraph for nucleotides, and NaiveLZGraph for general sequences.

Diversity Metrics

Novel diversity indices including K1000 and LZCentrality that capture repertoire complexity through graph topology.

Gene Analysis

Built-in V/J gene annotation support for genomic-aware sequence generation and gene usage analysis.

Visualization

Publication-ready plots for sequence analysis, including path variability, genomic heatmaps, and saturation curves.


Quick Start

Installation

pip install LZGraphs

Requirements: Python 3.9 or higher

Your First Graph

import pandas as pd
from LZGraphs import AAPLZGraph

# Load your TCR repertoire data
data = pd.DataFrame({
    'cdr3_amino_acid': ['CASSLEPSGGTDTQYF', 'CASSDTSGGTDTQYF', 'CASSLEPQTFTDTFFF'],
    'V': ['TRBV16-1*01', 'TRBV1-1*01', 'TRBV16-1*01'],
    'J': ['TRBJ1-2*01', 'TRBJ1-5*01', 'TRBJ2-7*01']
})

# Build the graph
graph = AAPLZGraph(data, verbose=True)

# Calculate sequence probability
sequence = "CASSLEPSGGTDTQYF"
pgen = graph.walk_probability(AAPLZGraph.encode_sequence(sequence))
print(f"P(gen) = {pgen:.2e}")
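
For intuition about what a walk probability can mean, the following toy sketch scores a walk as the probability of its first node times the product of its edge transition probabilities, all estimated from raw counts. This is an assumption about the general idea, not a description of how `walk_probability` is implemented in LZGraphs; `fit_walk_model` and `toy_walk_probability` are hypothetical helpers, and the walks are hand-written lists of subpatterns.

```python
from collections import Counter, defaultdict

def fit_walk_model(walks):
    """Estimate start and transition probabilities from non-empty walks."""
    starts = Counter(walk[0] for walk in walks)
    counts = defaultdict(Counter)
    for walk in walks:
        for src, dst in zip(walk, walk[1:]):
            counts[src][dst] += 1
    start_p = {node: c / len(walks) for node, c in starts.items()}
    # Normalize each node's outgoing edge counts so they sum to 1.
    trans_p = {
        src: {dst: c / sum(dsts.values()) for dst, c in dsts.items()}
        for src, dsts in counts.items()
    }
    return start_p, trans_p

def toy_walk_probability(walk, start_p, trans_p):
    """Multiply the start probability by every transition probability."""
    p = start_p.get(walk[0], 0.0)
    for src, dst in zip(walk, walk[1:]):
        # An edge never observed in training contributes probability 0.
        p *= trans_p.get(src, {}).get(dst, 0.0)
    return p

toy_walks = [
    ["C", "A", "S", "SL"],
    ["C", "A", "S", "SD"],
    ["C", "A", "S", "SL"],
]
start_p, trans_p = fit_walk_model(toy_walks)
p_seen = toy_walk_probability(["C", "A", "S", "SL"], start_p, trans_p)
p_unseen = toy_walk_probability(["C", "A", "S", "SG"], start_p, trans_p)
print(f"{p_seen:.3f} {p_unseen:.3f}")
```

In this toy model a walk that uses any unobserved edge gets probability zero, which is one reason very small repertoires give coarse estimates.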


Citation

If you use LZGraphs in your research, please cite our paper:

@article{lzgraphs2024,
  title={LZGraphs: A Novel Approach for T-Cell Receptor Repertoire Analysis},
  author={Konstantinovsky, Thomas and others},
  journal={...},
  year={2024}
}

See the Citation page for more details.

