Visualization

This tutorial covers creating publication-ready plots for TCR repertoire analysis.

Overview

LZGraphs provides specialized visualization functions:

Function                                  Purpose
draw_graph                                Visualize graph structure
ancestors_descendants_curves_plot         Trace sequence path through graph
sequence_possible_paths_plot              Show branching at each position
sequence_genomic_node_variability_plot    V/J gene diversity per node
sequence_genomic_edges_variability_plot   V/J gene diversity per edge

Setup

import pandas as pd
from LZGraphs import AAPLZGraph
from LZGraphs.visualization import (
    draw_graph,
    ancestors_descendants_curves_plot,
    sequence_possible_paths_plot,
    sequence_genomic_node_variability_plot,
    sequence_genomic_edges_variability_plot
)

# Build a graph
data = pd.read_csv("Examples/ExampleData1.csv")
graph = AAPLZGraph(data, verbose=False)

Drawing the Graph

Visualize the graph structure:

draw_graph(graph, file_name='my_lzgraph.png')

This generates a PNG image showing the graph structure with nodes representing LZ76 patterns and edges showing observed transitions.
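
To make "nodes representing LZ76 patterns" more concrete, here is a minimal sketch of a dictionary-style Lempel-Ziv decomposition in plain Python. The function name lz_decompose is hypothetical, and this is not the library's encoder (AAPLZGraph.encode_sequence may split or label nodes differently). It only illustrates how a CDR3 string can break into sub-patterns and how consecutive sub-patterns form edges.

```python
def lz_decompose(sequence):
    """Split a sequence into phrases, each the shortest prefix not seen before."""
    seen = set()
    phrases = []
    start = 0
    while start < len(sequence):
        end = start + 1
        # Grow the phrase until it is new or the sequence runs out.
        while end < len(sequence) and sequence[start:end] in seen:
            end += 1
        phrase = sequence[start:end]
        seen.add(phrase)
        phrases.append(phrase)
        start = end
    return phrases


phrases = lz_decompose("CASTPGTASGYTF")
# Consecutive phrases are the transitions a graph would record as edges.
edges = list(zip(phrases, phrases[1:]))

print(phrases)  # ['C', 'A', 'S', 'T', 'P', 'G', 'TA', 'SG', 'Y', 'TF']
```

Repeated residues such as the second T and S get merged into longer phrases (TA, SG), which is why the graph has fewer nodes than the sequence has characters.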

Large graphs

For large repertoires, the graph may be too complex to visualize effectively. Consider filtering to a subset of nodes.
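
One way to filter is to drop rarely observed transitions before drawing. The sketch below works on a plain dictionary of edge counts, which is a hypothetical stand-in and not the library's internal structure; the counts and the min_count threshold are invented for illustration.

```python
# Toy edge counts: (source node, target node) -> number of observations.
edge_counts = {
    ("C", "A"): 120,
    ("A", "S"): 115,
    ("S", "S"): 60,
    ("S", "T"): 4,
    ("T", "P"): 2,
    ("S", "L"): 45,
}


def filter_edges(edge_counts, min_count):
    """Keep edges seen at least min_count times and the nodes they touch."""
    kept = {edge: n for edge, n in edge_counts.items() if n >= min_count}
    nodes = {node for edge in kept for node in edge}
    return kept, nodes


kept, nodes = filter_edges(edge_counts, min_count=10)
print(sorted(nodes))  # nodes that survive the threshold
```

Nodes reachable only through rare edges (T and P in this toy data) disappear, leaving a smaller subgraph to draw.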


Ancestors and Descendants Curves

This plot shows how the number of ancestors (predecessors) and descendants (successors) changes along a sequence path.

sequence = 'CASTPGTASGYTF'
ancestors_descendants_curves_plot(graph, sequence)

Figure: Ancestors and descendants curves

Interpretation

  • Descendants curve (blue): Number of reachable nodes from each position
  • Ancestors curve (orange): Number of nodes from which each position can be reached
  • Intersection point: Where the sequence transitions from "common start" to "specific ending"
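
The two curves can be reproduced on a toy graph with a breadth-first search in each direction. This is a sketch under the assumption that "ancestors" and "descendants" mean all nodes reachable backward and forward from a node; the edge list and path are invented, and the library may compute the curves differently.

```python
from collections import deque

# Toy directed graph (invented for illustration).
edges = [
    ("C", "A"), ("A", "S"), ("A", "T"), ("S", "L"),
    ("S", "P"), ("T", "P"), ("L", "F"), ("P", "F"),
]

forward, backward = {}, {}
for src, dst in edges:
    forward.setdefault(src, []).append(dst)
    backward.setdefault(dst, []).append(src)


def reachable(start, adjacency):
    """Return every node reachable from start, excluding start itself."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


path = ["C", "A", "S", "P", "F"]
descendants = [len(reachable(n, forward)) for n in path]
ancestors = [len(reachable(n, backward)) for n in path]

print(descendants)  # [6, 5, 3, 1, 0]
print(ancestors)    # [0, 1, 2, 4, 6]
```

In this toy path the curves cross between S and P: before that point the node has more descendants than ancestors, after it the reverse holds.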

Use Cases

  • Compare rare vs. common sequences
  • Identify motifs that constrain downstream options
  • Study sequence "funneling" patterns

Sequence Possible Paths

Shows the number of alternative paths (branching factor) at each position:

sequence = 'CASTPGTASGYTF'
sequence_possible_paths_plot(graph, sequence)

Figure: Possible paths plot

Interpretation

  • High values: Many alternatives at that position (common patterns)
  • Low values: Few alternatives (rare patterns)
  • Value of 1: Only one observed continuation
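
As a rough illustration, the branching factor can be read as the out-degree of each node along the path. The sketch below assumes that reading (it is not confirmed against the library's source) and uses an invented adjacency dictionary.

```python
# Toy adjacency: node -> observed next nodes (invented for illustration).
adjacency = {
    "C": ["A"],
    "A": ["S", "T", "I"],
    "S": ["L", "P", "Q", "E"],
    "P": ["G"],
    "G": [],
}

path = ["C", "A", "S", "P", "G"]

# Number of observed continuations at each node on the path.
branching = [len(adjacency.get(node, [])) for node in path]

# Nodes where the graph offers exactly one continuation.
forced = [node for node, n in zip(path, branching) if n == 1]

print(branching)  # [1, 3, 4, 1, 0]
print(forced)     # ['C', 'P']
```

The final node has no continuations at all, which is expected for the end of a sequence.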

Correlation with Rarity

Sequences with consistently low path counts are rare in the repertoire and tend to have:

  • Lower generation probability
  • Higher Levenshtein distance from repertoire mean
  • Lower LZCentrality


Genomic Node Variability

Shows V and J gene diversity at each node in a sequence:

sequence = 'CASTPGTASGYTF'
sequence_genomic_node_variability_plot(graph, sequence)

Figure: Node variability plot

Interpretation

  • Bar height: Number of distinct V/J genes observed at that node
  • High V diversity early: Expected for V-gene derived regions
  • High J diversity late: Expected for J-gene derived regions
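
The bar heights amount to counting distinct genes per node. The following sketch does that count over a handful of invented records (node walk, V gene, J gene); the record layout is a hypothetical simplification and not how the library stores annotations.

```python
from collections import defaultdict

# Toy annotated repertoire: (nodes visited, V gene, J gene). Values are invented.
records = [
    (["C", "A", "S", "F"], "TRBV5-1", "TRBJ1-2"),
    (["C", "A", "T", "F"], "TRBV7-9", "TRBJ1-2"),
    (["C", "A", "S", "F"], "TRBV5-1", "TRBJ2-7"),
]

v_genes = defaultdict(set)
j_genes = defaultdict(set)
for nodes, v, j in records:
    for node in nodes:
        v_genes[node].add(v)
        j_genes[node].add(j)

path = ["C", "A", "S", "F"]
v_counts = [len(v_genes[node]) for node in path]
j_counts = [len(j_genes[node]) for node in path]

print(v_counts)  # [2, 2, 1, 2]
print(j_counts)  # [2, 2, 2, 2]
```

Node S is only ever reached by sequences annotated with one V gene here, so its V bar would be shorter than its neighbors'.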

Requirements

This function requires gene annotation data (V and J columns) in your original DataFrame.


Genomic Edge Variability

Shows V and J gene associations for each edge transition:

sequence = 'CASTPGTASGYTF'
sequence_genomic_edges_variability_plot(graph, sequence)

Figure: Edge variability plot

Reading the Heatmap

  • Rows: Gene names (V or J)
  • Columns: Edge transitions
  • Color intensity: Probability of that edge given the gene
  • Red gene names: Gene appears in ALL edges
  • Black cells: Gene not observed at that edge
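
A small sketch of how such a heatmap matrix could be assembled from per-edge gene counts. The counts are invented, and the normalization (each gene's counts divided by that gene's total across the path's edges) is an assumption made for illustration; the library may normalize over a different set of edges.

```python
# Toy counts: how often each V gene was seen on each edge of one path.
# All values are invented for illustration.
edge_gene_counts = {
    ("C", "A"): {"TRBV5-1": 30, "TRBV7-9": 10},
    ("A", "S"): {"TRBV5-1": 20},
    ("S", "F"): {"TRBV5-1": 10, "TRBV7-9": 10},
}

edges = list(edge_gene_counts)
genes = sorted({g for counts in edge_gene_counts.values() for g in counts})

# Total observations per gene across the path's edges (assumed normalizer).
gene_totals = {
    g: sum(counts.get(g, 0) for counts in edge_gene_counts.values())
    for g in genes
}

# heatmap[gene][edge] is the share of the gene's observations on that edge;
# None marks a gene/edge pair that was never observed (a "black cell").
heatmap = {
    g: {
        e: (edge_gene_counts[e][g] / gene_totals[g]
            if g in edge_gene_counts[e] else None)
        for e in edges
    }
    for g in genes
}

# Genes observed on every edge of the path (the ones labeled in red).
in_all_edges = [g for g in genes if all(g in edge_gene_counts[e] for e in edges)]

print(in_all_edges)  # ['TRBV5-1']
```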

Use Cases

  • Identify gene-specific sequence motifs
  • Compare gene usage between sequences
  • Study CDR3 structure by gene

Customizing Plots

Saving Figures

import matplotlib.pyplot as plt

# Create the plot
fig = sequence_possible_paths_plot(graph, sequence)

# Customize and save
plt.title("Path Variability Analysis")
plt.tight_layout()
plt.savefig("my_analysis.png", dpi=300, bbox_inches='tight')
plt.close()

Batch Processing

import matplotlib.pyplot as plt

sequences = [
    'CASTPGTASGYTF',
    'CASSLEPSGGTDTQYF',
    'CASSLGQGSTEAFF'
]

for i, seq in enumerate(sequences):
    ancestors_descendants_curves_plot(graph, seq)
    plt.savefig(f"ad_curve_{i}.png", dpi=150)
    plt.close()

Comparing Sequences

Visualize differences between sequences:

import matplotlib.pyplot as plt

sequences = ['CASTPGTASGYTF', 'CASSLEPSGGTDTQYF']

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

for ax, seq in zip(axes, sequences):
    plt.sca(ax)
    sequence_possible_paths_plot(graph, seq)
    ax.set_title(seq)

plt.tight_layout()
plt.savefig("comparison.png", dpi=300)

Saturation Curves

Visualize how diversity grows with sample size:

from LZGraphs import NodeEdgeSaturationProbe
import matplotlib.pyplot as plt

sequences = data['cdr3_amino_acid'].tolist()
probe = NodeEdgeSaturationProbe()

# Generate curve
curve = probe.saturation_curve(
    sequences,
    encoding_function=AAPLZGraph.encode_sequence,
    steps=50
)

# Plot
plt.figure(figsize=(10, 6))
plt.plot(curve['sequences'], curve['nodes'], label='Nodes')
plt.plot(curve['sequences'], curve['edges'], label='Edges')
plt.xlabel('Number of Sequences')
plt.ylabel('Count')
plt.title('Node/Edge Saturation Curve')
plt.legend()
plt.grid(True, alpha=0.3)
plt.savefig("saturation_curve.png", dpi=300)
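
To show what a saturation curve measures, here is a minimal sketch that counts unique nodes and edges as sequences are added one at a time. The encode function is a hypothetical stand-in (one node per residue and position), not AAPLZGraph.encode_sequence, and the three input sequences are invented.

```python
def encode(sequence):
    """Hypothetical stand-in encoder: one node per (residue, position)."""
    return [f"{aa}_{i}" for i, aa in enumerate(sequence, start=1)]


def saturation(sequences, encode):
    """Return (sequences seen, unique nodes, unique edges) after each sequence."""
    nodes, edges, rows = set(), set(), []
    for count, seq in enumerate(sequences, start=1):
        walk = encode(seq)
        nodes.update(walk)
        edges.update(zip(walk, walk[1:]))
        rows.append((count, len(nodes), len(edges)))
    return rows


rows = saturation(["CASF", "CASF", "CATF"], encode)
print(rows)  # [(1, 4, 3), (2, 4, 3), (3, 5, 5)]
```

A repeated sequence adds nothing new, so the curve stays flat at that step; a curve that flattens over many steps suggests the sample has captured most of the repertoire's observed patterns.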

Complete Example

import pandas as pd
import matplotlib.pyplot as plt
from LZGraphs import AAPLZGraph
from LZGraphs.visualization import (
    ancestors_descendants_curves_plot,
    sequence_possible_paths_plot,
    sequence_genomic_node_variability_plot
)

# Load and build
data = pd.read_csv("Examples/ExampleData1.csv")
graph = AAPLZGraph(data, verbose=False)

# Analyze a sequence
sequence = 'CASTPGTASGYTF'

# Create multi-panel figure
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Panel 1: Ancestors/Descendants
plt.sca(axes[0])
ancestors_descendants_curves_plot(graph, sequence)
axes[0].set_title("Ancestors & Descendants")

# Panel 2: Possible Paths
plt.sca(axes[1])
sequence_possible_paths_plot(graph, sequence)
axes[1].set_title("Path Variability")

# Panel 3: Gene Variability
plt.sca(axes[2])
sequence_genomic_node_variability_plot(graph, sequence)
axes[2].set_title("V/J Gene Diversity")

plt.suptitle(f"Analysis of {sequence}", fontsize=14)
plt.tight_layout()
plt.savefig("complete_analysis.png", dpi=300)
plt.show()

Next Steps