FlashBackGraph Quickstart
Applies to: FlashBackGraph
In five minutes you will build a FlashBackGraph from a list of CDR3 sequences, read an exact diversity number off it, and score a sequence for how surprising it is. FlashBackGraph is the family to use when you want exact, sampling-free analytics or anomaly detection. If you also need V/J gene modeling or ML features, see the LZGraph Quickstart instead.
Step 1: Build a graph
from LZGraphs import FlashBackGraph
sequences = ['CASSLEPSGGTDTQYF', 'CASSDTSGGTDTQYF', 'CASSLAPGATNEKLFF']
fb = FlashBackGraph(sequences)
You can also build from a file with constant memory:
Step 2: Measure diversity, exactly
Because the graph is strictly Markovian, these numbers are computed exactly by forward dynamic programming. There is no sampling and no variance to worry about: the same graph always gives the same answer.
fb.effective_diversity() # exp(Shannon entropy), the same as Hill D(1)
fb.hill_number(2) # inverse Simpson diversity
fb.hill_numbers([0, 1, 2]) # several orders at once (numpy array)
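To illustrate why a Markovian graph allows exact answers, here is a minimal sketch of forward dynamic programming on a toy graph. It is not the library's implementation: the graph `EDGES`, the node names, and the helper functions are all hypothetical, and the sketch assumes every START-to-END path spells a distinct sequence. Because a path's probability is a product of edge probabilities, sums over all paths can be accumulated node by node in a single pass, and the result matches brute-force enumeration.

```python
import math

# Hypothetical toy graph (an assumption for illustration only):
# node -> {next_node: transition probability}. "END" is terminal and
# every START-to-END path is assumed to spell one distinct sequence.
EDGES = {
    "START": {"CA": 1.0},
    "CA": {"SS": 0.7, "SR": 0.3},
    "SS": {"LE": 0.5, "DT": 0.5},
    "SR": {"DT": 1.0},
    "LE": {"END": 1.0},
    "DT": {"END": 1.0},
}
# Every node appears after all of its predecessors.
TOPO_ORDER = ["START", "CA", "SS", "SR", "LE", "DT", "END"]


def power_sum_dp(q):
    """Sum of P(path)**q over all START->END paths, in one forward pass."""
    mass = {node: 0.0 for node in TOPO_ORDER}
    mass["START"] = 1.0
    for node in TOPO_ORDER:
        for nxt, p in EDGES.get(node, {}).items():
            mass[nxt] += mass[node] * p ** q
    return mass["END"]


def entropy_dp():
    """Shannon entropy (nats) of the path distribution, in one forward pass."""
    reach = {node: 0.0 for node in TOPO_ORDER}
    reach["START"] = 1.0
    total = 0.0
    for node in TOPO_ORDER:
        for nxt, p in EDGES.get(node, {}).items():
            reach[nxt] += reach[node] * p               # probability of reaching nxt
            total -= reach[node] * p * math.log(p)      # this edge's entropy contribution
    return total


def enumerate_paths(node="START", prob=1.0):
    """Brute force: yield the probability of every START->END path."""
    if node == "END":
        yield prob
        return
    for nxt, p in EDGES[node].items():
        yield from enumerate_paths(nxt, prob * p)


path_probs = list(enumerate_paths())

d1_dp = math.exp(entropy_dp())       # Hill D(1) from the forward pass
d2_dp = 1.0 / power_sum_dp(2)        # Hill D(2) from the forward pass
d1_brute = math.exp(-sum(p * math.log(p) for p in path_probs))
d2_brute = 1.0 / sum(p * p for p in path_probs)

print(f"D(1): dp={d1_dp:.6f}  brute={d1_brute:.6f}")
print(f"D(2): dp={d2_dp:.6f}  brute={d2_brute:.6f}")
```

The forward pass visits each edge once, whereas enumeration grows with the number of paths. Neither involves random sampling, so repeated runs give identical values.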
Biologically, a higher effective diversity means the repertoire spreads its probability mass over more distinct sequences; a low number means a few clonotypes dominate.
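The relationship between Hill numbers and dominance can be checked by hand on a small probability vector. The sketch below uses the standard Hill number definition in plain Python; the function `hill_number` and the two example distributions are illustrative assumptions and are independent of the FlashBackGraph API.

```python
import math


def hill_number(probs, q):
    """Hill number of order q for a probability vector that sums to 1."""
    probs = [p for p in probs if p > 0]   # zero-probability entries contribute nothing
    if q == 1:
        # Limit q -> 1: exponential of the Shannon entropy (natural log).
        entropy = -sum(p * math.log(p) for p in probs)
        return math.exp(entropy)
    return sum(p ** q for p in probs) ** (1.0 / (1.0 - q))


even = [0.25, 0.25, 0.25, 0.25]      # four equally likely sequences
skewed = [0.97, 0.01, 0.01, 0.01]    # one dominant clonotype

for name, dist in [("even", even), ("skewed", skewed)]:
    d0, d1, d2 = (hill_number(dist, q) for q in (0, 1, 2))
    print(f"{name}: D(0)={d0:.2f}  D(1)={d1:.2f}  D(2)={d2:.2f}")
```

For the even distribution every order equals 4, the number of sequences. For the skewed one, D(0) still counts 4 sequences, while D(1) and D(2) fall close to 1 because a single clonotype carries almost all of the probability mass.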
Step 3: Score a sequence's generation probability
The complete example at the end of this page scores a sequence with pgen and labels the result "log pgen":
fb.pgen('CASSLEPSGGTDTQYF') # labeled "log pgen" in the complete example
Step 4: Detect anomalies with SCALE
SCALE answers "how surprising is this sequence given the repertoire?" You calibrate once against the graph, then score (higher = more anomalous).
cal = fb.calibrate_scale(seed=0) # self-calibrate (simulate from the graph)
fb.scale_score('CASSLEPSGGTDTQYF', cal) # typical -> near 0
fb.scale_score('KKKKWWWWPPPP', cal) # anomalous -> large positive
See the anomaly-detection tutorial for choosing a flag threshold.
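The calibrate-then-score pattern can be illustrated with a generic stand-in. The sketch below is not the SCALE algorithm, whose definition is not given on this page. It assumes a hypothetical toy model of independent residue frequencies (`FREQS`), a floor probability for unseen residues (`FLOOR`), and a plain z-score of per-residue surprisal against sequences simulated from that same model. All names are illustrative.

```python
import math
import random
import statistics

# Hypothetical toy model (an assumption, not a FlashBackGraph):
# independent residue frequencies that sum to 1.
FREQS = {"C": 0.10, "A": 0.15, "S": 0.25, "L": 0.10,
         "G": 0.15, "T": 0.10, "F": 0.10, "Y": 0.05}
FLOOR = 1e-6  # assumed probability for residues the model has never seen


def surprisal(seq):
    """Mean negative log-probability per residue under the toy model."""
    return -sum(math.log(FREQS.get(ch, FLOOR)) for ch in seq) / len(seq)


def calibrate(n=2000, length=14, seed=0):
    """Simulate from the model and record the mean and spread of surprisal."""
    rng = random.Random(seed)            # seeded, so calibration is repeatable
    letters, weights = zip(*FREQS.items())
    values = [
        surprisal("".join(rng.choices(letters, weights=weights, k=length)))
        for _ in range(n)
    ]
    return statistics.mean(values), statistics.stdev(values)


def toy_score(seq, cal):
    """Z-score of a sequence's surprisal against the calibration sample."""
    mean, std = cal
    return (surprisal(seq) - mean) / std


cal = calibrate(seed=0)
print(f"typical:   {toy_score('CASSLGGSATGFSA', cal):.2f}")
print(f"anomalous: {toy_score('KKKKWWWWPPPP', cal):.2f}")
```

A sequence built from common residues lands within a few units of zero, while one made of residues the model has never seen scores far above it. The calibration here uses a fixed simulated length, so scores for much shorter or longer sequences are only roughly comparable in this toy version.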
Step 5: Simulate new sequences
Complete example
from LZGraphs import FlashBackGraph
fb = FlashBackGraph(['CASSLEPSGGTDTQYF', 'CASSDTSGGTDTQYF', 'CASSLAPGATNEKLFF'])
print(f"Effective diversity: {fb.effective_diversity():.2f}")
print(f"Inverse Simpson D(2): {fb.hill_number(2):.2f}")
print(f"log pgen: {fb.pgen('CASSLEPSGGTDTQYF'):.3f}")
cal = fb.calibrate_scale(seed=0)
print(f"SCALE: {fb.scale_score('CASSLEPSGGTDTQYF', cal):.3f}")