
NaiveLZGraph

Non-positional LZGraph for consistent feature extraction and cross-repertoire analysis.

Quick Example

from LZGraphs import NaiveLZGraph
from LZGraphs.utilities import generate_kmer_dictionary

# Create shared dictionary
dictionary = generate_kmer_dictionary(6)

# Build graph
sequences = ['TGTGCCAGCAGT', 'TGTGCCAGCAGC']
graph = NaiveLZGraph(sequences, dictionary, verbose=True)

# Extract features
features = graph.eigenvector_centrality()

Class Reference

NaiveLZGraph

NaiveLZGraph(cdr3_list, dictionary, verbose=True, smoothing_alpha=0.0)

Bases: LZGraphBase

This class implements the logic and infrastructure of the "Naive" version of the LZGraph. The nodes of this graph are the LZ sub-patterns alone, with no additional information attached. This class is the best fit when the objective is to extract features from a repertoire.
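To make "LZ sub-patterns" concrete, the sketch below splits a sequence with an LZ78-style rule: each new sub-pattern is the shortest prefix of the remaining sequence that has not been emitted yet. The function name `lz_subpatterns` is hypothetical and the handling of a repeated tail is an assumption; the library's own decomposition routine may differ in details.

```python
def lz_subpatterns(sequence):
    """Split a sequence into LZ78-style sub-patterns (illustrative sketch)."""
    seen = set()
    patterns = []
    start = 0
    while start < len(sequence):
        end = start + 1
        # Grow the candidate until it is new or the sequence ends.
        while sequence[start:end] in seen and end < len(sequence):
            end += 1
        piece = sequence[start:end]
        seen.add(piece)
        # Assumption: a repeated tail is kept so the pieces rejoin to the input.
        patterns.append(piece)
        start = end
    return patterns


subpatterns = lz_subpatterns("TGTGCCAGCAGT")
print(subpatterns)
```

Every piece produced here is at most two nucleotides long, so each one would appear in a dictionary that holds all patterns up to length 6.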

...

Methods

walk_probability(walk, verbose=True): returns the generation probability (PGEN) of the given walk (a list of sub-patterns).

random_walk(steps): given a number of steps (sub-patterns), returns a random walk on the graph from a random initial state to a random terminal state in that number of steps.

random_walk_ber_shortest(steps, sfunc_h=0.6, sfunc_k=12): given a number of steps (sub-patterns), returns a random walk on the graph from a random initial state to a random terminal state. The closer the walk gets to the requested number of steps, the higher the probability that the next state is chosen along the shortest path (found with Dijkstra's algorithm). The saturation function that controls the probability of selecting a node based on the shortest path from the current state is a Hill function with two parameters, "h" and "k", which can be changed through the "sfunc_h" and "sfunc_k" parameters.

unsupervised_random_walk(): a random initial state and a random terminal state are selected, and an unsupervised random walk is carried out until the selected terminal state is reached.

eigenvector_centrality(): returns the eigenvector centrality value of each node (this function is used as the feature extractor for the LZGraph).

sequence_variation_curve(cdr3_sample): given a CDR3 sequence, calculates the variation curve and returns 2 arrays: one of the sub-patterns and one of the number of out-neighbours of each sub-pattern.

graph_summary(): returns a pandas DataFrame containing the graph's Chromatic Number, Number of Isolates, Max In Deg, Max Out Deg, and Number of Edges.
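The Hill saturation function mentioned for random_walk_ber_shortest can be illustrated as follows. This is a sketch of the standard Hill form, assuming "h" is the half-saturation point and "k" the steepness, and assuming the input is the fraction of the requested steps already taken. How the library maps these onto sfunc_h and sfunc_k internally is not confirmed by this page.

```python
def hill(x, h=0.6, k=12):
    """Standard Hill function x**k / (h**k + x**k), for x >= 0."""
    return x**k / (h**k + x**k)


# Assumed input: fraction of the requested steps already taken.
for fraction in (0.3, 0.6, 0.9):
    print(f"{fraction:.1f} -> {hill(fraction):.4f}")
```

With these defaults the value stays near 0 early in the walk, reaches 0.5 at x = h, and approaches 1 after that, which is consistent with shortest-path steps becoming more likely as the walk nears its step budget.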

Attributes
  nodes:
      returns the nodes of the graph
  edges:
      returns the edges of the graph

To derive the dictionary, you can use the helper function "generate_kmer_dictionary".

:param cdr3_list: a list of nucleotide sequences
:param dictionary: a list of strings, where each string is a sub-pattern that will be converted into a node
:param verbose: whether to print progress (default: True)
:param smoothing_alpha: Laplace smoothing parameter for edge weights. 0.0 means no smoothing.
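As an illustration of what a Laplace smoothing parameter does to edge weights, the sketch below applies additive smoothing to the outgoing transition counts of a single node. The helper `smoothed_probabilities` and the toy counts are hypothetical, and the exact formula the library applies for smoothing_alpha is an assumption here.

```python
def smoothed_probabilities(counts, candidates, alpha=0.0):
    """Additive (Laplace) smoothing: (count + alpha) / (total + alpha * n)."""
    total = sum(counts.get(c, 0) for c in candidates)
    denom = total + alpha * len(candidates)
    if denom == 0:
        return {c: 0.0 for c in candidates}
    return {c: (counts.get(c, 0) + alpha) / denom for c in candidates}


# Toy outgoing edge counts from one node (made-up numbers).
counts = {"TG": 3, "C": 1}
candidates = ["TG", "C", "A"]

unsmoothed = smoothed_probabilities(counts, candidates, alpha=0.0)
smoothed = smoothed_probabilities(counts, candidates, alpha=1.0)
print(unsmoothed)
print(smoothed)
```

With alpha = 0.0 the unseen transition to "A" keeps probability 0; with a positive alpha it receives a small non-zero share while the probabilities still sum to 1.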

clean_node staticmethod

clean_node(node: str) -> str

Return the clean subpattern from a node.

For NaiveLZGraph, nodes are already just the raw LZ subpatterns without any position information, so this returns the node unchanged.

Parameters:
  node (str): A node identifier (LZ subpattern).

Returns:
  str: The same subpattern (no transformation needed).

Constructor

Parameters

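Parameter        Type       Description
cdr3_list        list[str]  List of nucleotide sequences
dictionary       list[str]  List of allowed sub-patterns (each becomes a node)
verbose          bool       Print progress (default: True)
smoothing_alpha  float      Laplace smoothing parameter for edge weights (default: 0.0, no smoothing)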

Key Differences

Unlike AAPLZGraph and NDPLZGraph:

  • No positional encoding - Nodes are just patterns
  • Fixed dictionary - Consistent nodes across repertoires
  • No gene support - No V/J annotation

Primary Use Cases

Machine Learning Features

from LZGraphs import NaiveLZGraph
from LZGraphs.utilities import generate_kmer_dictionary

# Shared dictionary for all repertoires
dictionary = generate_kmer_dictionary(6)

# Build graphs for multiple repertoires
# (repertoire_list: your own list of repertoires, each a list of nucleotide CDR3 sequences)
graphs = []
for sequences in repertoire_list:
    g = NaiveLZGraph(sequences, dictionary, verbose=False)
    graphs.append(g)

# Extract feature vectors (same dimensions!)
features = [g.eigenvector_centrality() for g in graphs]
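One way to turn the centrality output into fixed-length vectors is sketched below. It assumes eigenvector_centrality() returns a mapping from node to score, which this page does not confirm; the helper `to_feature_vector` and the toy values are hypothetical.

```python
def to_feature_vector(centrality, dictionary):
    """Order scores by the shared dictionary; absent nodes get 0.0."""
    return [float(centrality.get(pattern, 0.0)) for pattern in dictionary]


# Toy stand-ins for a shared dictionary and two centrality mappings.
toy_dictionary = ["A", "C", "G", "T", "TG"]
centrality_a = {"T": 0.7, "G": 0.5, "TG": 0.5}
centrality_b = {"A": 0.6, "C": 0.8}

matrix = [to_feature_vector(c, toy_dictionary) for c in (centrality_a, centrality_b)]
print(matrix)
```

Because every row follows the order of the shared dictionary, column i refers to the same sub-pattern in every repertoire. If the mapping already contains every dictionary pattern, the 0.0 default is never used.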

Cross-Repertoire Comparison

# Same dictionary ensures comparable graphs
# (seqs1 and seqs2: two lists of nucleotide CDR3 sequences; dictionary: built as above)
g1 = NaiveLZGraph(seqs1, dictionary)
g2 = NaiveLZGraph(seqs2, dictionary)

# Features are directly comparable
f1 = g1.eigenvector_centrality()
f2 = g2.eigenvector_centrality()
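A simple way to compare two such feature sets is cosine similarity, sketched below. It assumes the features are node-to-score mappings built over the same dictionary; `cosine_similarity` is a hypothetical helper, not part of the library, and other distance measures may suit a given analysis better.

```python
import math


def cosine_similarity(f1, f2):
    """Cosine similarity between two node-to-score mappings."""
    keys = set(f1) | set(f2)
    dot = sum(f1.get(key, 0.0) * f2.get(key, 0.0) for key in keys)
    norm1 = math.sqrt(sum(v * v for v in f1.values()))
    norm2 = math.sqrt(sum(v * v for v in f2.values()))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # convention for an all-zero feature set
    return dot / (norm1 * norm2)


# Toy feature mappings (made-up values).
toy_f1 = {"T": 0.6, "G": 0.8}
toy_f2 = {"T": 0.6, "G": 0.8}
toy_f3 = {"A": 1.0}

print(cosine_similarity(toy_f1, toy_f2))
print(cosine_similarity(toy_f1, toy_f3))
```

Identical mappings score close to 1, and mappings with no shared non-zero nodes score 0.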

Dictionary Generation

from LZGraphs.utilities import generate_kmer_dictionary

# All patterns up to length k
dict_6 = generate_kmer_dictionary(6)  # 5460 patterns
dict_5 = generate_kmer_dictionary(5)  # 1364 patterns
dict_4 = generate_kmer_dictionary(4)  # 340 patterns

print(f"Length 6: {len(dict_6)} patterns")
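The pattern counts quoted in the comments follow from 4 + 4^2 + ... + 4^k for a four-letter nucleotide alphabet. The sketch below enumerates all patterns up to length k with itertools to check that arithmetic; `all_patterns_up_to` is a hypothetical stand-in, and nothing is assumed about the ordering of the library's own output.

```python
from itertools import product


def all_patterns_up_to(k, alphabet="ACGT"):
    """Every string over the alphabet with length 1..k."""
    return [
        "".join(letters)
        for length in range(1, k + 1)
        for letters in product(alphabet, repeat=length)
    ]


for k in (4, 5, 6):
    print(k, len(all_patterns_up_to(k)))
```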

See Also