V(D)J recombination, from bases up
Before we touch the API, let's watch a single human IGH heavy-chain sequence being assembled in front of us, segment by segment and base by base.
The intuition
A receptor isn't sampled from a vocabulary; it's built. One V allele, then a few random nucleotides, then a D, then more random nucleotides, then a J. Exonuclease trimming chews bases off the segment ends at each junction. The result is unique to that B cell and its clonal descendants.
Every BCR and TCR chain begins with the same recipe: pick gene segments from a germline pool, trim their ends with exonuclease, and stitch them together with a few added bases (the NP regions: non-templated N nucleotides plus short palindromic P nucleotides). GenAIRR simulates this process step by step, so when it tells you a base belongs to V, it's because GenAIRR placed it there.
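The recipe can be sketched as a toy simulator. This is not GenAIRR's implementation: the germline pool is made up, trim lengths and N-region lengths are drawn uniformly, P nucleotides are not modeled, and segments are concatenated in 5'-to-3' order rather than following the real D-J-then-V order of heavy-chain rearrangement. The names `recombine`, `trim`, `random_n_region`, and the `*_POOL` dictionaries are hypothetical.

```python
import random

# Hypothetical toy germline pool: short made-up sequences, NOT real IGH alleles.
V_POOL = {"toyV1*01": "GAGGTGCAGCTGGTGGAGTCTGGG", "toyV2*01": "CAGGTCCAGCTTGTGCAGTCTGGA"}
D_POOL = {"toyD1*01": "GTATTACTATGGTTCGGGGAGT", "toyD2*01": "AGGATATTGTAGTGGTGGTAGC"}
J_POOL = {"toyJ1*01": "ACTACTTTGACTACTGGGGCCAG", "toyJ2*01": "CTACTGGTACTTCGATCTCTGGGG"}


def trim(seq, five_prime, three_prime):
    """Exonuclease-style trimming: drop bases from the 5' and 3' ends."""
    return seq[five_prime:len(seq) - three_prime]


def random_n_region(rng, max_len):
    """Random non-templated N nucleotides (P nucleotides are not modeled here)."""
    return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_len)))


def recombine(seed, max_trim=4, max_np=8):
    rng = random.Random(seed)
    v_name = rng.choice(sorted(V_POOL))
    d_name = rng.choice(sorted(D_POOL))
    j_name = rng.choice(sorted(J_POOL))
    # V is trimmed only at its 3' end, J only at its 5' end, D at both ends.
    v = trim(V_POOL[v_name], 0, rng.randint(0, max_trim))
    d = trim(D_POOL[d_name], rng.randint(0, max_trim), rng.randint(0, max_trim))
    j = trim(J_POOL[j_name], rng.randint(0, max_trim), 0)
    np1 = random_n_region(rng, max_np)
    np2 = random_n_region(rng, max_np)
    parts = [("V", v), ("NP1", np1), ("D", d), ("NP2", np2), ("J", j)]
    # Provenance by construction: every base gets the label of the piece it came from.
    labels = [label for label, piece in parts for _ in piece]
    return {
        "v_call": v_name,
        "d_call": d_name,
        "j_call": j_name,
        "sequence": "".join(piece for _, piece in parts),
        "labels": labels,
    }


rec = recombine(seed=42)
print(rec["v_call"], rec["d_call"], rec["j_call"])
print(rec["sequence"])
```

Changing the seed changes the picks and the lengths, but the labels always run V, NP1, D, NP2, J in that order.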
Watch one being assembled
Each row below shows a step. The bottom row is the final assembled sequence. The persistent IR remembers which segment every base came from, so any downstream question about provenance is already answered by construction.
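Because every base carries a label from the moment it is placed, provenance questions reduce to lookups, and segment coordinates fall out of a run-length pass. A minimal sketch, assuming a plain per-base list of labels like the hypothetical `labels` below; this is not GenAIRR's internal IR, whose structure the text doesn't show, and `segment_spans` is a hypothetical helper.

```python
from itertools import groupby


def segment_spans(labels):
    """Collapse per-base provenance labels into (label, start, end) spans.

    Coordinates are 0-based and end-exclusive (Python slice convention);
    that convention is a choice of this sketch, not a statement about GenAIRR.
    """
    spans = []
    pos = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        spans.append((label, pos, pos + length))
        pos += length
    return spans


# Hypothetical per-base labels for a 14-base toy sequence (NP2 happens to be empty).
labels = ["V"] * 5 + ["NP1"] * 2 + ["D"] * 3 + ["J"] * 4
print(segment_spans(labels))  # [('V', 0, 5), ('NP1', 5, 7), ('D', 7, 10), ('J', 10, 14)]
print(labels[6])              # NP1: base 6 came from the first NP region
```

Note that an empty NP region simply produces no span, so downstream code should not assume all five pieces are present.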
What this looks like in code
Reproducing the assembly above is a single fluent chain. .recombine() runs V(D)J recombination; .run_records() executes the experiment and returns a result whose .records are plain dictionaries you can index by AIRR-schema field name.
```python
import GenAIRR as ga

# Simulate one human heavy-chain (IGH) rearrangement with a fixed seed.
result = (
    ga.Experiment.on("human_igh")
    .recombine()
    .run_records(n=1, seed=42)
)
rec = result.records[0]

# Expected output for seed=42, matching the assembly walkthrough:
print(rec["v_call"])                                   # IGHVF10-G38*04
print(rec["d_call"])                                   # IGHD2-15*01
print(rec["j_call"])                                   # IGHJ2*01
print(rec["v_sequence_start"], rec["v_sequence_end"])  # 0 296
print(rec["np1_length"])                               # 13
print(rec["sequence_length"])                          # 388
```
Paste the snippet into a Python REPL, then change seed=42 to seed=7. The chosen alleles and the NP region lengths will generally change, but the structure (V, NP1, D, NP2, J) stays the same. That's the rearrangement skeleton you'll learn to layer biology on top of.
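One way to see that skeleton directly is to slice a record's sequence at its segment boundaries. A minimal sketch for heavy-chain records only, with two loud assumptions: that records also carry the AIRR-schema fields d_sequence_start/d_sequence_end and j_sequence_start/j_sequence_end, and that all *_sequence_start/*_sequence_end values are 0-based and end-exclusive. The printed `0 296` hints at 0-based coordinates, but the text doesn't confirm the convention, and the AIRR Rearrangement schema itself specifies 1-based closed intervals, so check before relying on it. `skeleton` and the toy record are hypothetical.

```python
def skeleton(rec):
    """Split a heavy-chain record's sequence into V, NP1, D, NP2, J pieces.

    ASSUMPTION: coordinates are 0-based and end-exclusive; records without a D
    (e.g. light chains) are not handled by this sketch.
    """
    seq = rec["sequence"]
    v_start, v_end = rec["v_sequence_start"], rec["v_sequence_end"]
    d_start, d_end = rec["d_sequence_start"], rec["d_sequence_end"]
    j_start, j_end = rec["j_sequence_start"], rec["j_sequence_end"]
    return {
        "V": seq[v_start:v_end],
        "NP1": seq[v_end:d_start],
        "D": seq[d_start:d_end],
        "NP2": seq[d_end:j_start],
        "J": seq[j_start:j_end],
    }


# Hypothetical toy record so the sketch runs without GenAIRR installed.
toy = {
    "sequence": "AAAAACCGGGTTTT",
    "v_sequence_start": 0, "v_sequence_end": 5,
    "d_sequence_start": 7, "d_sequence_end": 10,
    "j_sequence_start": 10, "j_sequence_end": 14,
}
print(skeleton(toy))
```

If the coordinate assumption holds for a real record, `len(skeleton(rec)["NP1"])` should equal `rec["np1_length"]`, which makes a quick sanity check.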
Generate n=10 sequences with seed=42. Confirm that d_call is non-empty for every record (every heavy-chain rearrangement includes a D). Then run the same experiment with "human_igk": what's d_call now, and why?
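A small helper for the first half of the exercise. It relies only on what the text states, that records are plain dictionaries with a d_call field; `d_call_report` is a hypothetical name, and the toy records are stand-ins so the sketch runs without GenAIRR installed.

```python
from collections import Counter


def d_call_report(records):
    """Return (indices with a missing/empty d_call, Counter of the D alleles seen)."""
    missing = [i for i, rec in enumerate(records) if not rec.get("d_call")]
    alleles = Counter(rec["d_call"] for rec in records if rec.get("d_call"))
    return missing, alleles


# With GenAIRR installed, using the same chain as above:
# import GenAIRR as ga
# recs = ga.Experiment.on("human_igh").recombine().run_records(n=10, seed=42).records
# print(d_call_report(recs))

# Hypothetical toy records standing in for real output.
toy_records = [{"d_call": "toyD1*01"}, {"d_call": ""}, {"d_call": None}, {"d_call": "toyD1*01"}]
print(d_call_report(toy_records))
```

For the heavy-chain run, the text predicts an empty `missing` list; rerunning the helper on the "human_igk" records is a direct way to answer the second question.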