Module 11 — Post-Quantum & Crypto-Agility Migration

Type 12 · Migration / Brownfield — migrate a TLS service from classic ECDHE/RSA to a hybrid X25519+ML-KEM key exchange incrementally, proving interop at each step (strangler-fig); the deliverable is the crypto inventory + the migrated config + before/after handshake captures proving nothing broke, not an essay. Go to the hands-on lab →

Last reviewed: 2026-06

[Track 08 — Cryptography, PKI & Secrets]: the algorithms you ship today were chosen for a world without quantum computers; the job that can no longer be deferred is migrating them while the service stays up and everything on the wire keeps interoperating.

Difficulty: Advanced · Estimated time: ~6–8 hrs (study + lab) · Prerequisites: Foundations, Module 05 — TLS Deep Dive (the handshake you migrate)

In 60 seconds

The quantum threat isn't a future computer — it's harvest-now-decrypt-later: an adversary stores your encrypted traffic today and reads it once a quantum machine exists. That targets the key exchange (Shor breaks ECDHE), not symmetric crypto, so ML-KEM (FIPS 203) is the migration you do first; signatures (204/205) follow on a slower clock. The fix is hybrid — run X25519 and ML-KEM and combine both secrets, safe if either holds — added via the strangler-fig pattern so old clients keep connecting. Done means a handshake capture proves the new path negotiates and the old one still works, not that the config has the keyword.

Why this matters

The threat that makes this urgent is not a quantum computer you have to fear today — it is an adversary who copies your encrypted traffic today and decrypts it later, once a cryptographically relevant quantum computer exists. This is harvest-now-decrypt-later (HNDL), and it inverts the usual "we'll upgrade when the attack is real" calculus: for anything that must stay confidential for years — health records, source code, diplomatic cables, long-lived secrets — the attack window is now, because the ciphertext an attacker stores this morning is the plaintext they read in 2035. The part of TLS that HNDL targets is the key exchange (the ECDHE handshake that establishes the session key), because that is what a future quantum computer running Shor's algorithm breaks — recover the key-exchange secret and you decrypt the whole captured session. Symmetric encryption (AES) and hashing are comparatively safe; the key agreement is the exposed surface, and it is exposed retroactively.

The standards caught up in 2024. On August 13, 2024, NIST finalized three post-quantum standards: FIPS 203 (ML-KEM) — the Module-Lattice-Based Key-Encapsulation Mechanism, the one that replaces the key exchange and the one HNDL makes urgent; FIPS 204 (ML-DSA) and FIPS 205 (SLH-DSA) — the two post-quantum signature standards. The split matters: signatures authenticate a live handshake, so a forged signature is only useful during the connection — there is no "harvest a signature now, forge it later." Key exchange is the opposite. That is why ML-KEM (key exchange) is the migration you do first, and the signature migration (ML-DSA/SLH-DSA) follows on its own, slower clock.

This is not a future exercise. Hybrid key exchange is already live on the internet: Chrome and Cloudflare have been negotiating X25519 combined with ML-KEM-768 on real TLS 1.3 connections, with Cloudflare reporting double-digit percentages of post-quantum traffic. "Hybrid" is the whole trick — you run the classical X25519 exchange and the ML-KEM exchange together and combine both shared secrets, so the connection is safe if either one holds: you keep X25519's battle-tested security against classical attackers while adding ML-KEM's resistance to a future quantum one. The job, then, is the textbook brownfield migration: a service is on classic ECDHE/RSA today, it cannot go down, and old clients must keep connecting. You inventory what it uses, move it to a hybrid suite, and prove at every step that nothing broke — that is this module.

The core idea

The skill this module builds is crypto-agility: the property that lets you change a system's algorithms without re-architecting it. Most brownfield crypto is the opposite — an algorithm name is hardcoded in a config, baked into a protocol assumption, or buried in a library default nobody has revisited since deploy. A system is crypto-agile when the algorithm is a negotiated, swappable parameter, not a constant. PQC migration is the forcing function that exposes whether you have agility or not: the teams that can flip on a hybrid group with a config line have it; the teams who discover the algorithm is welded into a binary do not. The deliverable that proves agility is the same one that proves the migration — a crypto inventory (what algorithm is used where, found by reading the actual handshake, not the documentation) followed by a controlled swap.
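A minimal sketch of what a crypto inventory row could look like, assuming you have already read the negotiated group for each endpoint off a real handshake. The `InventoryRow` type, the endpoint names, and the choice of which groups count as hybrid are illustrative assumptions, not part of any tool; the group names follow the ones listed in draft-ietf-tls-ecdhe-mlkem.

```python
from dataclasses import dataclass

# Assumption: the hybrid named groups that count as post-quantum key exchange
# for this inventory. Extend or trim the set to match your estate.
PQ_HYBRID_GROUPS = {"X25519MLKEM768", "SecP256r1MLKEM768", "SecP384r1MLKEM1024"}


@dataclass(frozen=True)
class InventoryRow:
    endpoint: str          # host:port the handshake was captured against
    negotiated_group: str  # group read from the capture, not from the config

    @property
    def quantum_exposed(self) -> bool:
        # Anything that is not a known hybrid group is treated as HNDL-exposed.
        return self.negotiated_group not in PQ_HYBRID_GROUPS


def build_inventory(observations: dict[str, str]) -> list[InventoryRow]:
    """Turn {endpoint: negotiated group} observations into sorted rows."""
    return [InventoryRow(ep, grp) for ep, grp in sorted(observations.items())]


# Hypothetical observations, hand-entered for illustration.
rows = build_inventory({
    "api.example.internal:443": "X25519",
    "edge.example.internal:443": "X25519MLKEM768",
})
for row in rows:
    status = "EXPOSED" if row.quantum_exposed else "hybrid"
    print(f"{row.endpoint:<28} {row.negotiated_group:<16} {status}")
```

The point of keeping the group as data is the agility argument itself: the swap later becomes a change to a value in a table, and the inventory shows which rows still need it.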

The mental model

Migrating crypto is a brownfield refactor, not a math upgrade. The hard part isn't ML-KEM the algorithm — it's that the algorithm is welded into configs, library defaults, and protocol assumptions across a running estate that can't go down. Treat the algorithm as a parameter you negotiate, prove the new path by reading the wire, and never cut the old one down in one stroke.

The gotcha

A config with X25519MLKEM768 in it can silently fall back to classical X25519 — your library build doesn't support the group, or the client never offers it — and you're exactly as exposed as before, now with false confidence. Done is proven by the handshake capture, never by the config file. The config lies; the pcap doesn't.

The migration is hybrid by design, and that design choice is the safety argument. You do not rip out X25519 and bolt on ML-KEM; you run both and combine their secrets:

flowchart LR
    X["X25519 exchange"] --> SX["classical secret"]
    K["ML-KEM-768 encapsulation"] --> SK["post-quantum secret"]
    SX --> COMB["combine (concatenate)"]
    SK --> COMB
    COMB --> SK2["session key — safe if either holds"]

The reasoning is hedged failure: ML-KEM is new, lattice cryptography is younger than elliptic curves, and a classical break of a fresh PQC scheme is not unthinkable (history is littered with PQC candidates that fell in analysis). X25519 is decades-hardened against classical attackers but quantum-broken in principle. Combine them and the session key derives from both shared secrets, so an attacker must break both X25519 (needs a quantum computer) and ML-KEM (needs a cryptanalytic break of the lattice scheme) to recover it. Hybrid is the bet that at least one of the two will hold: provided the combiner is sound, the result is at least as strong as the stronger component, which is exactly why the early internet deployments (X25519MLKEM768) are hybrid rather than pure-PQC.
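To make "derive from both" concrete, here is a small sketch using only the Python standard library. It concatenates two placeholder secrets and runs an HKDF-Extract style step (HMAC with a salt, as in RFC 5869). The function name `combine_secrets`, the placeholder byte strings, and the concatenation order are assumptions for illustration; real TLS 1.3 feeds the concatenated value into its own key schedule, and the named-group specification fixes the order.

```python
import hashlib
import hmac


def combine_secrets(pq_secret: bytes, classical_secret: bytes, salt: bytes = b"") -> bytes:
    """Concatenate both shared secrets, then apply an HKDF-Extract style step.

    HKDF-Extract(salt, ikm) is HMAC-Hash(salt, ikm). This illustrates the idea
    of deriving one key from two inputs; it is not the TLS 1.3 key schedule.
    """
    ikm = pq_secret + classical_secret
    # RFC 5869: an absent salt is replaced by HashLen zero bytes.
    key = salt if salt else bytes(hashlib.sha256().digest_size)
    return hmac.new(key, ikm, hashlib.sha256).digest()


# Placeholder 32-byte values standing in for real ML-KEM and X25519 outputs.
pq = bytes(range(32))
classical = bytes(range(32, 64))

session = combine_secrets(pq, classical)

# Swapping out either input changes the derived key, so an attacker who has
# recovered only one of the two secrets still cannot reproduce it.
assert combine_secrets(b"\x00" * 32, classical) != session
assert combine_secrets(pq, b"\x00" * 32) != session
print(len(session))
```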

Go deeper: why key exchange migrates first and signatures wait

The HNDL clock runs differently for confidentiality and authenticity. A key exchange recorded today can be broken whenever a quantum computer arrives, and the session it protected decrypted along with it: the harm is retroactive, so the window is now, which is why ML-KEM (FIPS 203) is the urgent move. A signature, by contrast, only matters during the live handshake it authenticates; you can't "harvest a signature now, forge it later" because the connection it protected is long gone. So ML-DSA/SLH-DSA (FIPS 204/205) migrate on their own, slower track, and conflating the two is a classic way PQC migrations stall.

The migration method is the strangler fig (Martin Fowler's pattern: grow the new capability around the running system, move across incrementally, and never cut the old one down in a single stroke). Applied to a cipher: you do not flip the server to demand ML-KEM and call it done — a server that only offers the hybrid group will refuse every old client that does not understand it, and you have traded a quantum risk for an immediate outage. Instead you add the hybrid group to the server's supported list alongside the classical ones and let TLS negotiation do its job: a modern client offers X25519MLKEM768 and gets the post-quantum-safe path; an old client offers only X25519 and still connects classically. The estate's quantum-exposed surface shrinks toward zero one client-capability at a time, while interop is never broken. The four beats are: inventory (read the real handshake — what group/cipher is actually negotiated), pick the hybrid suite (X25519MLKEM768, the group Chrome/Cloudflare deployed), migrate without breaking interop (add the hybrid group, keep the classical fallback), and prove nothing broke (capture the handshake before and after, and show old clients still connect while new clients now negotiate hybrid).
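The negotiation argument can be sketched as a toy model. This is a simplification, assuming the server walks its own preference list and picks the first group the client also offers; real TLS 1.3 negotiation also involves key shares and a possible HelloRetryRequest, and a server may be configured to honour client preference instead. The list names and contents are hypothetical.

```python
from typing import Optional, Sequence


def select_group(server_groups: Sequence[str], client_groups: Sequence[str]) -> Optional[str]:
    """Return the first server-preferred group the client also offers, else None."""
    offered = set(client_groups)
    for group in server_groups:
        if group in offered:
            return group
    return None  # no common group: the handshake fails


# Strangler-fig: hybrid added at the front, classical groups kept behind it.
MIGRATED_SERVER = ["X25519MLKEM768", "X25519", "secp256r1"]
# Big-bang: hybrid only.
BIG_BANG_SERVER = ["X25519MLKEM768"]

MODERN_CLIENT = ["X25519MLKEM768", "X25519"]
LEGACY_CLIENT = ["X25519", "secp256r1"]

print(select_group(MIGRATED_SERVER, MODERN_CLIENT))  # hybrid path
print(select_group(MIGRATED_SERVER, LEGACY_CLIENT))  # classical fallback
print(select_group(BIG_BANG_SERVER, LEGACY_CLIENT))  # None: legacy client is refused
```

The third call is the outage the pattern avoids: with no group in common, the legacy client has nothing to negotiate.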

The honest gotcha that separates a real migration from a checkbox one: a PQC migration is "done" only when you have proven the new path is negotiated and the old path still works — not when the config has the new keyword in it. It is easy to add X25519MLKEM768 to a config, reload, and assume you are post-quantum-safe — but if your build of the TLS library doesn't actually support the group, or the client never offers it, the handshake silently falls back to classical X25519 and you are exactly as exposed as before, now with false confidence. The proof is in the handshake capture, not the config file: a pcap or openssl s_client trace showing the negotiated group is the hybrid one for a modern client, and showing a legacy client still completing its classical handshake. The before/after captures are the deliverable precisely because the config alone lies. (Signatures — FIPS 204/205 — migrate on their own track and are explicitly out of scope here: ML-KEM key exchange is the HNDL-urgent move, and conflating the two is how migrations stall.)
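One way to turn "read the capture" into a repeatable check is to parse saved `openssl s_client` transcripts. The sketch below assumes two line formats: a `Negotiated TLS1.3 group: <name>` line (assumed here for recent builds) and the older `Server Temp Key: <name>, <bits> bits` line. Confirm both against what your own build prints before relying on it. The function names and the sample strings are hand-written for illustration, not real captures.

```python
import re
from typing import Optional

# Assumed transcript line formats; verify against your OpenSSL build's output.
_GROUP_PATTERNS = (
    re.compile(r"^Negotiated TLS1\.3 group:\s*(\S+)", re.MULTILINE),
    re.compile(r"^Server Temp Key:\s*([^,\s]+)", re.MULTILINE),
)


def negotiated_group(transcript: str) -> Optional[str]:
    """Return the key-exchange group named in a transcript, or None if absent."""
    for pattern in _GROUP_PATTERNS:
        match = pattern.search(transcript)
        if match:
            return match.group(1)
    return None


def verify_migration(modern: str, legacy: str, hybrid: str = "X25519MLKEM768") -> list[str]:
    """Return a list of failures; an empty list means both proofs hold."""
    failures = []
    modern_group = negotiated_group(modern)
    if modern_group != hybrid:
        failures.append(f"modern client negotiated {modern_group!r}, expected {hybrid!r}")
    if negotiated_group(legacy) is None:
        failures.append("legacy transcript shows no completed key exchange")
    return failures


# Hand-written sample lines, not real captures.
modern_after = "Negotiated TLS1.3 group: X25519MLKEM768\n"
silent_fallback = "Negotiated TLS1.3 group: X25519\n"
legacy_after = "Server Temp Key: X25519, 253 bits\n"

print(verify_migration(modern_after, legacy_after))     # no failures
print(verify_migration(silent_fallback, legacy_after))  # flags the fallback
```

The second call is the case the config file cannot show you: the keyword is present, yet the modern client still negotiated classical X25519.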

Learn (~3 hrs)

Read enough to understand the HNDL threat, why hybrid, and what a real X25519MLKEM768 handshake looks like — then go inventory and migrate a service in the lab.

The threat and the standards: why now (~45 min)
- NIST: Post-Quantum Cryptography project (overview) (~15 min). The authoritative home page for FIPS 203 (ML-KEM), 204 (ML-DSA), and 205 (SLH-DSA), finalized Aug 13 2024. Read it for the map: which standard replaces key exchange vs. signatures, and NIST's "deploy now" guidance. This is the canonical citation for every claim about which algorithm you are migrating to.
- FIPS 203: Module-Lattice-Based Key-Encapsulation Mechanism Standard (ML-KEM) (~20 min). The actual standard for the algorithm at the center of this module. Read the introduction and §1–2 (you do not need the math): it defines ML-KEM as a KEM (encapsulation, not classic Diffie–Hellman), and the three parameter sets ML-KEM-512/768/1024. Knowing 768 is the one in X25519MLKEM768 is what makes the lab's config legible rather than copied.

What hybrid deployment actually looks like in production (~45 min)
- Cloudflare: The state of the post-quantum Internet (Bas Westerbaan, Mar 5 2024) (~30 min, skim). The best real-world deployment writeup: why Cloudflare and Chrome ship a hybrid X25519 + ML-KEM-768 key agreement, the harvest-now-decrypt-later motivation in plain terms, and live adoption numbers. Read it as proof this is shipping today, not a lab toy, and for the operational gotchas (middlebox/"protocol ossification" breakage on larger handshakes) you will see echoes of in your own captures.

The hybrid construction and the suite you'll deploy (~1.5 hrs)
- IETF: Hybrid key exchange in TLS 1.3 (draft-ietf-tls-hybrid-design) (~30 min). The framework draft for how TLS 1.3 combines a classical and a post-quantum exchange into one negotiated group, and concatenates both shared secrets into the key schedule. Read §1–3 for the security rationale (safe if either component holds); this is the standards-body version of the "why hybrid" argument and the thing to cite when someone asks "why not just ML-KEM."
- IETF: Post-quantum hybrid ECDHE-MLKEM Key Agreement for TLS 1.3 (draft-ietf-tls-ecdhe-mlkem) (~20 min). The draft that actually names X25519MLKEM768 (plus the SecP256r1/384r1 variants) as concrete TLS named groups. Read it to know exactly which group string you put in the config and what it combines; this is the spec your lab config conforms to.
- OpenSSL 3.5: final release announcement (Apr 8 2025) and the EVP_KEM-ML-KEM(7) man page (~30 min). OpenSSL 3.5 ships ML-KEM/ML-DSA/SLH-DSA natively, so the lab needs no third-party provider if you have 3.5+. Read the announcement's PQC bullet and the man page's parameter-set list; you'll use the -groups X25519MLKEM768 option to s_client/s_server to drive the migration. (If you're stuck on OpenSSL 3.0–3.4, the oqs-provider is the fallback; note which path your lab uses.)
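If you script the lab's captures, a small helper that builds the `s_client` command line keeps the "before" and "after" runs identical apart from the group list. This is a sketch: the helper name `s_client_argv` and the `localhost:4433` target are assumptions, and it only builds the argument list rather than running anything. The `-connect`, `-tls1_3`, and `-groups` options are standard `s_client` options.

```python
import shlex
from typing import List, Optional


def s_client_argv(host: str, port: int = 443, groups: Optional[str] = None) -> List[str]:
    """Build an `openssl s_client` argv; `groups` is a colon-separated group list."""
    argv = ["openssl", "s_client", "-connect", f"{host}:{port}", "-tls1_3"]
    if groups:
        # Restrict what this client offers, to play a modern or a legacy client.
        argv += ["-groups", groups]
    return argv


# Hypothetical lab target: an s_server instance listening on localhost:4433.
modern = s_client_argv("localhost", 4433, "X25519MLKEM768")
legacy = s_client_argv("localhost", 4433, "X25519")

print(shlex.join(modern))
print(shlex.join(legacy))
```

Passing the list to `subprocess.run(..., input=b"", capture_output=True)` is one way to save each transcript; run the legacy variant too, since that transcript is half of the proof.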

Key concepts

Harvest-now-decrypt-later (HNDL) — encrypted traffic captured today is decrypted later by a quantum computer; the attack window for long-lived secrets is now, which is why key exchange (not signatures) is the urgent migration.
The split: FIPS 203 vs. 204/205 — ML-KEM (203) replaces the key exchange and is HNDL-urgent; ML-DSA (204) and SLH-DSA (205) replace signatures and migrate on a separate, slower clock. Don't conflate them.
Hybrid = safe if either holds — run X25519 and ML-KEM and combine both secrets; an attacker must break both (a quantum computer for X25519 and a classical lattice break for ML-KEM). Strictly safer than either alone — the reason early deployments are hybrid, not pure-PQC.
Crypto-agility is the underlying skill — the algorithm should be a swappable, negotiated parameter, not a hardcoded constant. PQC migration is the forcing function that reveals whether a system has agility or has the algorithm welded in.
Strangler-fig, not big-bang — add the hybrid group alongside the classical ones; TLS negotiation gives modern clients the PQC path and old clients the classical fallback. A server that only offers the hybrid group trades a quantum risk for an immediate outage.
The proof is the handshake capture, not the config — a config with X25519MLKEM768 in it can silently fall back to classical X25519 (unsupported build, client never offers it). Done = a before/after capture proving the new path negotiates and old clients still connect.

AI acceleration

A model is genuinely strong at the inventory and translation half of this work: point it at a testssl.sh run or an openssl s_client transcript and ask it to extract the negotiated group/cipher per endpoint into a crypto inventory table, classify each as quantum-exposed (key exchange) or not, and draft the OpenSSL/nginx config diff that adds the hybrid group while keeping the classical fallback. That is real leverage on the tedious bookkeeping. But the posture is strict, because the model's failure mode here is the same dangerous one a rushed engineer has: ask it to "make this server post-quantum" and it will happily hand you a config that sets the supported group to only X25519MLKEM768 — a clean big-bang that breaks every legacy client — because demanding the new algorithm is the simplest thing to express and the model carries none of the operational fear of an interop outage. The judgment it cannot do for you is verifying the proof: asked to "confirm the migration worked," a model will read your config back to you and pronounce it done, missing that the handshake silently fell back to classical because your library didn't support the group. So: AI drafts the inventory and the config diff → you keep the classical fallback in → you read the before/after handshake captures yourself and confirm the negotiated group is actually the hybrid one for a modern client and that a legacy client still completes its classical handshake. AI authors the migration plan; you own the capture that proves nothing broke.
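A cheap guard against the big-bang config a model tends to produce is to lint the drafted group list before it goes anywhere near a server. The sketch below assumes the list is a colon-separated string, as OpenSSL group lists are; the function name and the set of groups treated as classical fallbacks are illustrative choices. It checks the config only, so it complements the handshake capture and does not replace it.

```python
from typing import List

# Assumption: which groups count as a classical fallback in your estate.
CLASSICAL_FALLBACKS = frozenset({"X25519", "secp256r1", "secp384r1"})


def lint_group_list(group_list: str, hybrid: str = "X25519MLKEM768") -> List[str]:
    """Return problems found in a colon-separated group list; empty means OK."""
    groups = [g.strip() for g in group_list.split(":") if g.strip()]
    problems = []
    if hybrid not in groups:
        problems.append(f"{hybrid} is missing: nothing was migrated")
    if not any(g in CLASSICAL_FALLBACKS for g in groups):
        problems.append("no classical fallback: legacy clients will be refused")
    return problems


print(lint_group_list("X25519MLKEM768"))                   # big-bang draft
print(lint_group_list("X25519MLKEM768:X25519:secp256r1"))  # strangler-fig draft
```

An empty result only says the draft keeps the fallback in; whether the hybrid group is actually negotiated is still a question for the capture.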

Check yourself

Why does harvest-now-decrypt-later make key exchange the urgent migration while signatures can wait — and which FIPS standard does each map to?
In a hybrid X25519+ML-KEM exchange, what must an attacker break to recover the session key, and why is that strictly safer than pure-PQC?
Your config has X25519MLKEM768 and reloaded cleanly. Why is that not proof the migration worked, and what artifact actually proves it?
