NDPLZGraph Class

`NDPLZGraph`

Bases: LZGraphBase

This class implements the logic and infrastructure of the "Nucleotide Double Positional" version of the LZGraph. The nodes of this graph are LZ sub-patterns annotated with the reading frame start position and the start position in the sequence, formally: {lz_subpattern}{reading frame start}_{start position in sequence}. This class is best suited to the analysis and inference of nucleotide sequences.
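The node label format can be illustrated with a short sketch. It assumes an LZ78-style dictionary parse, a 0-based start index, and a reading frame computed as the start index modulo 3; this page confirms none of these conventions, and `lz_decompose` and `node_labels` are hypothetical helper names, not part of the class.

```python
def lz_decompose(seq):
    """LZ78-style parse: each phrase is the shortest prefix not seen before."""
    seen, phrases, current = set(), [], ""
    for ch in seq:
        current += ch
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:  # trailing phrase that was already in the dictionary
        phrases.append(current)
    return phrases


def node_labels(seq):
    """Build {lz_subpattern}{reading frame start}_{start position} labels.

    Assumption: positions are 0-based and the frame is position % 3.
    """
    labels, pos = [], 0
    for phrase in lz_decompose(seq):
        labels.append(f"{phrase}{pos % 3}_{pos}")
        pos += len(phrase)
    return labels


labels = node_labels("TGTGCCAGC")
print(labels)
```

Under these assumptions, the same sub-pattern occurring at two different positions yields two distinct nodes, which is what makes the graph "positional".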

...

Methods:

walk_probability(walk,verbose=True): returns the generation probability (PGEN) of the given walk (a list of sub-patterns).
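As a rough illustration of what a walk probability can look like, the sketch below multiplies the probability of the first node by the transition probability of each edge along the walk. This is an assumption about how PGEN is composed, not the class's confirmed implementation; `sketch_walk_probability` and the toy probability tables are hypothetical.

```python
import math


def sketch_walk_probability(walk, initial_prob, edge_prob):
    """P(first node) times the product of edge probabilities (assumed model)."""
    p_first = initial_prob.get(walk[0], 0.0)
    if p_first == 0.0:
        return 0.0
    log_p = math.log(p_first)  # sum logs to avoid underflow on long walks
    for a, b in zip(walk, walk[1:]):
        p = edge_prob.get((a, b), 0.0)
        if p == 0.0:
            return 0.0  # the walk uses an edge that was never observed
        log_p += math.log(p)
    return math.exp(log_p)


# Toy values, invented for illustration only.
initial_prob = {"T0_0": 0.5}
edge_prob = {("T0_0", "G1_1"): 0.4, ("G1_1", "TG2_2"): 0.25}
pgen = sketch_walk_probability(["T0_0", "G1_1", "TG2_2"], initial_prob, edge_prob)
print(pgen)
```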

is_dag(): checks whether the graph is a directed acyclic graph (DAG).
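For readers unfamiliar with the check, a directed graph is acyclic exactly when a topological ordering covers every node. The sketch below uses Kahn's algorithm on a plain edge list; it is a generic illustration, and `is_acyclic` is a hypothetical name rather than the method's actual implementation.

```python
from collections import deque


def is_acyclic(nodes, edges):
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    succ = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
        succ.setdefault(b, [])
        indeg.setdefault(a, 0)
        indeg[b] = indeg.get(b, 0) + 1
    queue = deque(n for n, d in indeg.items() if d == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    # Nodes on a cycle never reach in-degree 0, so they are never visited.
    return visited == len(indeg)


print(is_acyclic(["a", "b", "c"], [("a", "b"), ("b", "c")]))
print(is_acyclic(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")]))
```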

walk_genes(walk,dropna=True): given a walk on the graph (a list of nodes), returns a table of the possible genes and their probabilities at each edge of the walk.
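The shape of such a table can be sketched with plain dictionaries: one row per gene, one column per edge of the walk. The per-edge gene probabilities, the `sketch_walk_genes` name, and especially the reading of `dropna` (dropping genes missing from at least one edge) are assumptions made for illustration; the real method may behave differently.

```python
def sketch_walk_genes(walk, edge_genes, dropna=True):
    """Return (edges, table) where table[gene] lists one probability per edge.

    Assumption: dropna removes genes that are missing (None) on any edge.
    """
    edges = list(zip(walk, walk[1:]))
    genes = sorted({g for e in edges for g in edge_genes.get(e, {})})
    table = {g: [edge_genes.get(e, {}).get(g) for e in edges] for g in genes}
    if dropna:
        table = {g: row for g, row in table.items() if None not in row}
    return edges, table


# Toy data, invented for illustration only.
edge_genes = {
    ("A", "B"): {"V1": 0.6, "V2": 0.4},
    ("B", "C"): {"V1": 1.0},
}
edges, full_table = sketch_walk_genes(["A", "B", "C"], edge_genes, dropna=False)
_, dense_table = sketch_walk_genes(["A", "B", "C"], edge_genes, dropna=True)
print(full_table)
print(dense_table)
```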

path_gene_table(cdr3_sample,threshold=None): returns two tables of all possible V and J genes that could be used to generate the sequence given by "cdr3_sample".

path_gene_table_plot(threshold=None,figsize=None): plots two heatmaps, one for V genes and one for J genes, showing the probability of selecting each gene at each edge. The color of each cell encodes that probability; a black cell means the graph never saw that gene used with that sub-pattern.

The data used to create the charts can be obtained with the "path_gene_table" method.

gene_variation(cdr3): given a sequence, derives the data showing the number of V and J genes observed per node (LZ sub-pattern).

gene_variation_plot(cdr3): plots the data derived by the "gene_variation" method as two overlaid bar charts, one for the V gene count and one for the J gene count.

random_walk(steps): given a number of steps (sub-patterns), returns a random walk on the graph from a random initial state to a random terminal state in the given number of steps.

gene_random_walk(seq_len, initial_state): given a target sequence length and an initial state, selects a random V gene and a random J gene according to the gene frequencies observed in the graph's training data, then generates a walk on the graph from the initial state to a terminal state, ensuring at each step that both selected genes were observed with that specific sub-pattern.
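The gene constraint applied at each step can be sketched as a filter over the outgoing edges of the current node. The sketch covers only that single step, not the handling of `seq_len` or the choice of a terminal state. The edge layout (a `weight` and a set of observed `genes` per edge) and the `gene_restricted_step` name are hypothetical.

```python
import random


def gene_restricted_step(out_edges, v_gene, j_gene, rng):
    """Pick the next node among edges observed with both selected genes.

    Returns None when no outgoing edge supports both genes (a dead end).
    """
    allowed = {
        node: data["weight"]
        for node, data in out_edges.items()
        if v_gene in data["genes"] and j_gene in data["genes"]
    }
    if not allowed:
        return None
    nodes, weights = zip(*allowed.items())
    return rng.choices(nodes, weights=weights, k=1)[0]


# Toy outgoing edges of one node, invented for illustration only.
out_edges = {
    "G1_1": {"weight": 0.7, "genes": {"V1", "J1"}},
    "C1_1": {"weight": 0.3, "genes": {"V2", "J1"}},
}
rng = random.Random(0)
next_node = gene_restricted_step(out_edges, "V1", "J1", rng)
print(next_node)
```

A full walk would repeat this step from the initial state until a terminal state is reached, and would need a policy for dead ends, such as restarting the walk.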

unsupervised_random_walk(): selects a random initial state and a random terminal state, then carries out an unsupervised random walk until the selected terminal state is reached.
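A simplified version of such a walk is sketched below. Unlike the description above, it stops at the first terminal state it reaches instead of targeting one pre-selected terminal state, and it also stops at nodes with no outgoing edges. The dictionary layout and the `sketch_unsupervised_walk` name are assumptions.

```python
import random


def sketch_unsupervised_walk(initial_prob, edges, terminal_states, rng=None):
    """Weighted random walk from a sampled initial state to any terminal state."""
    rng = rng or random.Random()
    states, weights = zip(*initial_prob.items())
    current = rng.choices(states, weights=weights, k=1)[0]
    walk = [current]
    # Stop on a terminal state or when the node has no successors.
    while current not in terminal_states and edges.get(current):
        successors, probs = zip(*edges[current].items())
        current = rng.choices(successors, weights=probs, k=1)[0]
        walk.append(current)
    return walk


# Toy graph, invented for illustration only.
initial_prob = {"T0_0": 1.0}
edges = {
    "T0_0": {"G1_1": 0.5, "GT1_1": 0.5},
    "G1_1": {"C2_2": 1.0},
}
terminal_states = {"C2_2", "GT1_1"}
walk = sketch_unsupervised_walk(initial_prob, edges, terminal_states, random.Random(0))
print(walk)
```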

eigenvector_centrality(): returns the eigenvector centrality value of each node (this function is used as the feature extractor for the LZGraph).

sequence_variation_curve(cdr3_sample): given a CDR3 sequence, calculates the variation curve and returns two arrays: one of the sub-patterns and one of the number of out-neighbours of each sub-pattern.
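The two returned arrays can be pictured with a small sketch that takes an already decomposed walk and a successor map. Both the input layout and the `variation_curve` name are hypothetical; the real method starts from the raw sequence and decomposes it itself.

```python
def variation_curve(walk, successors):
    """Return (sub-patterns, out-neighbour counts) for the nodes of a walk."""
    subpatterns = list(walk)
    out_counts = [len(successors.get(node, ())) for node in walk]
    return subpatterns, out_counts


# Toy successor map, invented for illustration only.
successors = {
    "T0_0": {"G1_1", "GT1_1", "C1_1"},
    "G1_1": {"TG2_2"},
}
subpatterns, out_counts = variation_curve(["T0_0", "G1_1", "TG2_2"], successors)
print(subpatterns, out_counts)
```

A high count at a position means many different sub-patterns were observed following that node, so the curve highlights where a sequence passes through variable regions of the graph.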

graph_summary(): returns a pandas DataFrame containing the graph's Chromatic Number, Number of Isolates, Max In Deg, Max Out Deg, and Number of Edges.
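Most of these summary fields follow directly from the edge list, as the sketch below shows. It returns a plain dictionary rather than a pandas DataFrame and deliberately omits the chromatic number, which requires a graph-coloring routine; `degree_summary` is a hypothetical name.

```python
def degree_summary(nodes, edges):
    """Compute isolate count, max in/out degree and edge count of a digraph."""
    edges = list(edges)
    indeg = {n: 0 for n in nodes}
    outdeg = {n: 0 for n in nodes}
    for a, b in edges:
        outdeg[a] = outdeg.get(a, 0) + 1
        indeg[b] = indeg.get(b, 0) + 1
        indeg.setdefault(a, 0)
        outdeg.setdefault(b, 0)
    # An isolate has neither incoming nor outgoing edges.
    isolates = sum(1 for n in indeg if indeg[n] == 0 and outdeg[n] == 0)
    return {
        "Number of Isolates": isolates,
        "Max In Deg": max(indeg.values(), default=0),
        "Max Out Deg": max(outdeg.values(), default=0),
        "Number of Edges": len(edges),
    }


summary = degree_summary(["a", "b", "c", "d"], [("a", "b"), ("a", "c"), ("b", "c")])
print(summary)
```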

Attributes:

    nodes:
        returns the nodes of the graph
    edges:
        returns the edges of the graph