Lesson 01 · ~3 min

V(D)J recombination, from bases up

Before we touch the API, let's watch a single human IGH heavy-chain sequence get assembled in front of us — segment by segment, base by base.

The intuition

A receptor isn't sampled from a vocabulary; it's built. One V allele, then a few random nucleotides, then a D, then more random nucleotides, then a J. Before the pieces are joined, trimming chews bases off the segment ends that meet at each junction. The result is effectively unique to that B cell.

Every BCR/TCR begins with the same recipe: pick gene segments from a germline pool, trim their ends with exonuclease, and stitch them together with a few non-templated bases (the NP regions). GenAIRR mirrors this process exactly — when it tells you a base belongs to V, it's because GenAIRR placed it there.
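
The recipe above can be sketched in a few lines of plain Python. This is a toy illustration, not GenAIRR's implementation: the germline sequences and allele names are made up, alleles are picked uniformly, and the trimming and NP length limits (3 and 6 bases) are arbitrary assumptions. What it does show is the shape of the process: pick, trim, insert random bases, join, and record a label for every base as it is placed.

```python
import random

# Toy germline pool: made-up sequences and names, NOT real IGH alleles.
GERMLINE = {
    "V": {"V1*01": "CAGGTGCAGCTGGTGGAGTCT", "V2*01": "GAGGTGCAGCTGTTGGAGTCA"},
    "D": {"D1*01": "GGTATAGTGGGAGCTAC", "D2*01": "AGGATATTGTAGTGGTGG"},
    "J": {"J1*01": "ACTACTTTGACTACTGGGGC", "J2*01": "CTGGTTCGACCCCTGGGGC"},
}


def pick(rng, segment):
    """Choose one allele name and its sequence uniformly at random."""
    name = rng.choice(sorted(GERMLINE[segment]))
    return name, GERMLINE[segment][name]


def random_bases(rng, max_len=6):
    """Non-templated bases: random length (0..max_len), random content."""
    return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_len)))


def assemble(seed):
    """Build one toy rearrangement and remember where every base came from."""
    rng = random.Random(seed)
    v_name, v = pick(rng, "V")
    d_name, d = pick(rng, "D")
    j_name, j = pick(rng, "J")

    # Trim only the ends that take part in a junction (0 to 3 bases each).
    v = v[: len(v) - rng.randint(0, 3)]                     # V 3' end
    d = d[rng.randint(0, 3): len(d) - rng.randint(0, 3)]    # both D ends
    j = j[rng.randint(0, 3):]                               # J 5' end

    parts = [("V", v), ("NP1", random_bases(rng)), ("D", d),
             ("NP2", random_bases(rng)), ("J", j)]
    sequence = "".join(seq for _, seq in parts)
    # One label per base, written at the moment the base is placed.
    labels = [label for label, seq in parts for _ in seq]
    return {"v_call": v_name, "d_call": d_name, "j_call": j_name,
            "sequence": sequence, "labels": labels}


rec = assemble(seed=42)
print(rec["v_call"], rec["d_call"], rec["j_call"])
print(rec["sequence"])
```

Because the labels are written during assembly rather than inferred afterwards, they are correct by construction, which is the property the lesson relies on.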

Watch one being assembled

Each row below shows a step. The bottom row is the final assembled sequence — the persistent IR remembers which segment every base came from, so any downstream query about provenance already knows the answer by construction.

[Interactive animation: Experiment.on("human_igh"), seed=42. Rows, top to bottom: V gene, + NP1, + D gene, + NP2, + J gene, final. Legend: V, NP, D, J.]
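
To make "which segment every base came from" concrete, here is a minimal sketch that collapses a per-base label list into segment spans. The label list is a hypothetical stand-in and says nothing about how GenAIRR lays out its IR internally; the 0-based, half-open coordinates are also an assumption, chosen to match Python slicing.

```python
from itertools import groupby


def spans(labels):
    """Collapse per-base labels into (segment, start, end) tuples.

    Coordinates are 0-based and half-open, like Python slices.
    """
    out, pos = [], 0
    for segment, run in groupby(labels):
        length = sum(1 for _ in run)
        out.append((segment, pos, pos + length))
        pos += length
    return out


# Hypothetical labels for a 12-base toy sequence.
labels = ["V"] * 5 + ["NP1"] * 2 + ["D"] * 3 + ["NP2"] * 1 + ["J"] * 1
for segment, start, end in spans(labels):
    print(segment, start, end)
```

With labels stored per base, a provenance question such as "where does D start?" becomes a lookup rather than an alignment problem.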

What this looks like in code

Reproducing the assembly above is a single fluent chain. .recombine() runs the V(D)J ceremony; .run_records() hands you the records as plain dictionaries you can index by AIRR-schema field name.

lesson_1.py
import GenAIRR as ga

result = (
    ga.Experiment.on("human_igh")
       .recombine()
       .run_records(n=1, seed=42)
)
rec = result.records[0]

print(rec["v_call"])             # 'IGHVF10-G38*04'
print(rec["d_call"])             # 'IGHD2-15*01'
print(rec["j_call"])             # 'IGHJ2*01'
print(rec["v_sequence_start"],
      rec["v_sequence_end"])    # 0 296
print(rec["np1_length"])         # 13
print(rec["sequence_length"])    # 388

Try it now

Paste the snippet into a Python REPL. Then change seed=42 to seed=7: the V allele and the NP lengths will most likely change, but the structure (V, NP1, D, NP2, J) stays the same. That's the rearrangement skeleton you'll learn to layer biology on top of.

Exercise

Generate n=10 sequences with seed=42. Confirm d_call is non-empty for every record (heavy chains always have a D). Then run the same with "human_igk" — what's d_call now, and why?
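
One way to set up the check is sketched below. The helper records_missing_d is a hypothetical name, not part of GenAIRR, and it assumes an absent D shows up as an empty string, None, or a missing key, since the lesson does not say which. The generate function only reuses the chain shown earlier in this lesson and imports GenAIRR lazily, so the helper can be tried on hand-written dictionaries first.

```python
def records_missing_d(records):
    """Return the indices of records whose d_call is missing or empty."""
    return [i for i, rec in enumerate(records) if not rec.get("d_call")]


def generate(locus, n=10, seed=42):
    """Run the lesson's chain for a locus (requires GenAIRR to be installed)."""
    import GenAIRR as ga  # imported lazily so the helper above works without it

    result = ga.Experiment.on(locus).recombine().run_records(n=n, seed=seed)
    return result.records


# Usage, once GenAIRR is installed:
#   print(records_missing_d(generate("human_igh")))
#   print(records_missing_d(generate("human_igk")))
```

Comparing the two printed lists is the starting point; explaining the difference is the part of the exercise left to you.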