NaiveLZGraph¶
Non-positional LZGraph for consistent feature extraction and cross-repertoire analysis.
Quick Example¶
from LZGraphs import NaiveLZGraph
from LZGraphs.utilities import generate_kmer_dictionary
# Create shared dictionary
dictionary = generate_kmer_dictionary(6)
# Build graph
sequences = ['TGTGCCAGCAGT', 'TGTGCCAGCAGC']
graph = NaiveLZGraph(sequences, dictionary, verbose=True)
# Extract features
features = graph.eigenvector_centrality()
Class Reference¶
NaiveLZGraph¶
Bases: LZGraphBase
This class implements the logic and infrastructure of the "Naive" version of the LZGraph. The nodes of this graph are LZ sub-patterns alone, without any other additions. This class is the best fit when the objective is extracting features from a repertoire.
...
Methods¶
walk_probability(walk, verbose=True): returns the generation probability (PGEN) of the given walk (a list of sub-patterns).
random_walk(steps): given a number of steps (sub-patterns), returns a random walk on the graph from a random initial state to a random terminal state in the given number of steps.
random_walk_ber_shortest(steps, sfunc_h=0.6, sfunc_k=12): given a number of steps (sub-patterns), returns a random walk on the graph from a random initial state to a random terminal state. The closer the walk is to the selected number of steps, the higher the probability that the next state is selected via the shortest path (Dijkstra's algorithm). The saturation function that controls the probability of selecting a node based on the shortest path from the current state is a Hill function with two parameters, "h" and "k", which can be changed by passing values for the "sfunc_h" and "sfunc_k" parameters.
unsupervised_random_walk(): selects a random initial state and a random terminal state, then carries out an unsupervised random walk until the selected terminal state is reached.
eigenvector_centrality(): returns the eigenvector centrality value for each node (this function is used as the feature extractor for the LZGraph).
sequence_variation_curve(cdr3_sample): given a CDR3 sequence, calculates the variation curve and returns two arrays, one with the sub-patterns and one with the number of out-neighbours of each sub-pattern.
graph_summary(): returns a pandas DataFrame containing the graph's Chromatic Number, Number of Isolates, Max In Deg, Max Out Deg, and Number of Edges.
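The saturation behaviour described for random_walk_ber_shortest can be illustrated with a standard Hill function. This is only a sketch under stated assumptions: the function name `hill_saturation` is hypothetical, `h` is assumed to be the half-saturation point and `k` the steepness, and `x` is assumed to be the fraction of the step budget already used. The library's own formula and its input may differ.

```python
def hill_saturation(x, h=0.6, k=12):
    """Standard Hill curve: 0 near x=0, 0.5 at x=h, approaching 1 for x >> h.

    Assumption: h is the half-saturation point and k the steepness.
    This is an illustration, not the library's implementation.
    """
    if x <= 0:
        return 0.0
    return 1.0 / (1.0 + (h / x) ** k)


# Probability of preferring the shortest path as the walk uses up its steps.
for fraction_used in (0.3, 0.6, 0.9):
    print(fraction_used, round(hill_saturation(fraction_used), 4))
```

With these defaults the curve stays close to zero early in the walk and rises sharply once about 60% of the steps are used, which matches the described intent of steering toward the terminal state only near the end.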
Attributes¶
nodes:
returns the nodes of the graph
edges:
returns the edges of the graph
To derive the dictionary, you can use the helper function "generate_dictionary".
:param cdr3_list: a list of nucleotide sequences
:param dictionary: a list of strings, where each string is a sub-pattern that will be converted into a node
:param verbose: print progress (default: True)
:param smoothing_alpha: Laplace smoothing parameter for edge weights; 0.0 means no smoothing
clean_node (staticmethod)¶
Return the clean subpattern from a node.
For NaiveLZGraph, nodes are already just the raw LZ subpatterns without any position information, so this returns the node unchanged.
| PARAMETER | DESCRIPTION |
|---|---|
| node | A node identifier (LZ subpattern). |

| RETURNS | DESCRIPTION |
|---|---|
| str | The same subpattern (no transformation needed). |
Constructor¶
Parameters¶
| Parameter | Type | Description |
|---|---|---|
| sequences | list[str] | List of sequences |
| dictionary | list[str] | List of allowed patterns |
| verbose | bool | Print progress (default: True) |
Key Differences¶
Unlike AAPLZGraph and NDPLZGraph:
- No positional encoding - Nodes are just patterns
- Fixed dictionary - Consistent nodes across repertoires
- No gene support - No V/J annotation
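To make "nodes are just patterns" concrete, the following sketch decomposes a sequence into LZ78-style sub-patterns and pairs up consecutive ones, which is roughly what becomes nodes and edges. The function name `lz_subpatterns` is hypothetical and the exact decomposition rule is an assumption; the library's own routine may treat edge cases, such as the trailing piece, differently.

```python
def lz_subpatterns(sequence):
    """LZ78-style decomposition (illustrative, not the library's code).

    Each emitted sub-pattern is the shortest prefix of the remaining
    sequence that has not been emitted before.
    """
    seen = set()
    patterns = []
    current = ""
    for ch in sequence:
        current += ch
        if current not in seen:
            seen.add(current)
            patterns.append(current)
            current = ""
    if current:
        # Trailing piece that was already seen earlier in the sequence.
        patterns.append(current)
    return patterns


patterns = lz_subpatterns("TGTGCCAGCAGT")
print(patterns)

# Consecutive sub-patterns form the transitions (edges) of a walk.
edges = list(zip(patterns, patterns[1:]))
print(edges[:3])
```

Because the nodes carry no position suffix, the same sub-pattern maps to the same node wherever it occurs in a sequence.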
Primary Use Cases¶
Machine Learning Features¶
from LZGraphs import NaiveLZGraph
from LZGraphs.utilities import generate_kmer_dictionary
# Shared dictionary for all repertoires
dictionary = generate_kmer_dictionary(6)
# Build graphs for multiple repertoires
graphs = []
for sequences in repertoire_list:  # repertoire_list: one list of sequences per repertoire
    g = NaiveLZGraph(sequences, dictionary, verbose=False)
    graphs.append(g)
# Extract feature vectors (same dimensions!)
features = [g.eigenvector_centrality() for g in graphs]
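For a downstream model, the per-graph features usually need to be stacked into a fixed-order matrix. The sketch below assumes each eigenvector_centrality() result behaves like a mapping from node to value; that return type is an assumption, and the helper `to_feature_matrix` and the toy data are hypothetical stand-ins rather than library output.

```python
def to_feature_matrix(centralities, dictionary):
    """Align per-graph {node: value} mappings to one shared column order.

    Assumption: each element of `centralities` is dict-like. Missing
    nodes default to 0.0 as a defensive measure.
    """
    return [[c.get(pattern, 0.0) for pattern in dictionary] for c in centralities]


# Toy stand-ins for a shared dictionary and two centrality results.
toy_dictionary = ["A", "C", "G", "T", "AC"]
toy_centralities = [
    {"A": 0.5, "C": 0.5},
    {"G": 0.7, "T": 0.2, "AC": 0.1},
]

matrix = to_feature_matrix(toy_centralities, toy_dictionary)
for row in matrix:
    print(row)
```

Fixing the column order to the shared dictionary is what keeps column i referring to the same sub-pattern in every repertoire.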
Cross-Repertoire Comparison¶
# Same dictionary ensures comparable graphs
g1 = NaiveLZGraph(seqs1, dictionary)
g2 = NaiveLZGraph(seqs2, dictionary)
# Features are directly comparable
f1 = g1.eigenvector_centrality()
f2 = g2.eigenvector_centrality()
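One simple way to compare two such feature sets is cosine similarity. This is a sketch rather than a method prescribed by the library: it assumes f1 and f2 are dict-like mappings from node to centrality value, and the `cosine_similarity` helper and toy inputs are hypothetical.

```python
import math


def cosine_similarity(f1, f2):
    """Cosine similarity between two {node: value} mappings.

    Assumption: the inputs are dict-like. Nodes missing from one
    mapping are treated as 0.0.
    """
    keys = sorted(set(f1) | set(f2))
    v1 = [f1.get(key, 0.0) for key in keys]
    v2 = [f2.get(key, 0.0) for key in keys]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot / (norm1 * norm2)


# Toy stand-ins for two centrality results.
toy_f1 = {"A": 0.6, "C": 0.3, "G": 0.1}
toy_f2 = {"A": 0.5, "C": 0.4, "G": 0.1}
print(cosine_similarity(toy_f1, toy_f2))
```

Other distances (for example Euclidean or Jensen-Shannon) can be swapped in the same way, since the shared dictionary guarantees aligned dimensions.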
Dictionary Generation¶
from LZGraphs.utilities import generate_kmer_dictionary
# All patterns up to length k
dict_6 = generate_kmer_dictionary(6) # 5460 patterns
dict_5 = generate_kmer_dictionary(5) # 1364 patterns
dict_4 = generate_kmer_dictionary(4) # 340 patterns
print(f"Length 6: {len(dict_6)} patterns")
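The pattern counts above follow from enumerating every string of length 1 to k over the four nucleotides, giving 4 + 4^2 + ... + 4^k entries. The sketch below reproduces those counts with a hypothetical helper, `all_patterns_up_to`; it is not the library's implementation, and the ordering returned by generate_kmer_dictionary may differ.

```python
from itertools import product


def all_patterns_up_to(k, alphabet="ACGT"):
    """Enumerate every string of length 1..k over the alphabet (illustrative)."""
    return [
        "".join(letters)
        for length in range(1, k + 1)
        for letters in product(alphabet, repeat=length)
    ]


for k in (4, 5, 6):
    patterns = all_patterns_up_to(k)
    expected = sum(4 ** n for n in range(1, k + 1))
    print(k, len(patterns), expected)
```

The dictionary grows by a factor of about four with each extra unit of length, so the choice of k directly sets the dimensionality of the extracted feature vectors.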
See Also¶
- AAPLZGraph - Positional amino acid version
- NDPLZGraph - Positional nucleotide version
- Concepts: Graph Types