Module 04: Cloud Network Security
Type 4 · Audit→Build→Verify (+ Type 3 · Blast-Radius): audit a VPC for what's actually reachable from the internet, then author a default-deny baseline as code and re-verify it holds. (Secondary: Blast-Radius, tracing the transitive paths a foothold walks.)
Last reviewed: 2026-06
Cloud & Container Security — a Security Group is the host firewall you already know, applied per-interface and composable — and your attack surface is the union of every rule, not any one of them.
In 60 seconds
A Security Group is the stateful host firewall you've written for years — but applied per-interface
and composable, so reachability is a graph, not a table you read top to bottom. The 2017–19 wave of
0.0.0.0/0-exposed MongoDB/Elasticsearch instances, and the network half of Capital One, were
failures of containment, not exploits. A per-rule audit calls a private DB "clean" while it sits one
ssh hop from the internet, because reach is transitive. And VPC egress is open by default — locking
ingress does nothing about exfiltration. The fix is a default-deny baseline authored as code, with a
scanner rule that re-checks it, because anyone with ec2:AuthorizeSecurityGroupIngress can re-punch
the hole.
The case
Between roughly 2017 and 2019 the same finding kept landing in the news under different company
names: a database or admin panel sitting on the open internet with 0.0.0.0/0 as its source range.
Internet-wide scanners (Shodan, BinaryEdge) catalogued tens of thousands of exposed Elasticsearch and
MongoDB instances — many requiring no authentication — and a wave of "MongoDB apocalypse" ransom
attacks wiped or held them hostage at scale. The cause was almost never an exploit. It was one ingress
rule, opened "temporarily" for a migration or a demo, that allowed the world to reach a port that should
have been reachable only from an app subnet.
In 2019, at the tail of that wave, Capital One lost ~100M records, and the network layer is a quiet co-defendant in that chain. The headline is SSRF and an over-broad IAM role (you ruled on that in Module 01), but the WAF was an internet-facing host allowed to make outbound calls to the metadata service and onward to S3. A tighter egress posture and tighter segmentation around that host would have shortened, or broken, the chain. The network controls didn't cause the breach, but they were the failed containment: the walls that should have stopped a foothold from becoming an exfiltration.
So before you read on, this module turns on one question about the most boring-looking object in the cloud — a Security Group ruleset:
Given a set of Security Groups, what is actually reachable from the internet?
Your job
By the end of this module you'll audit a cloud network for reachability, then close it as code. Map
the target account's VPC with cloudmapper, find the Security Groups that expose sensitive ports to
0.0.0.0/0, and trace the transitive paths an attacker actually walks. Then do the half auditing
skips: author a corrected, default-deny Security Group baseline and re-verify that the bad paths are
gone while the app still works — and encode the verdict as a scanner rule that fails the exposure and
passes the fix. Find → author the baseline → prove it holds: the exact motion of a cloud network review,
and the same shape you'll repeat with NetworkPolicies in Module 12.
Call it before you read on
Don't scroll. Write your gut answers — under-counting reachability here is the teaching event, and you'll grade yourself in the lab.
Q1. An app-sg allows :22 from 0.0.0.0/0. A db-sg allows :5432 only from app-sg. The database has no public IP and lives in a private subnet. Is the database reachable from the internet?
Q2. Your team audited ingress and found it clean: every sensitive port is locked down inbound. Has the network been secured against an attacker who already has a foothold on one instance?
Q3. You count six Security Groups, each "mostly fine." Where does the real attack surface live — in the worst single rule, or somewhere the per-group review can't see?
The reachability model, revealed
Hold your answers against these.
Q1 — reachability is transitive, and the audit that counts rules misses it. The database has no
public address and its own group only trusts app-sg. A per-rule scan calls it clean. But app-sg
exposes :22 to the world — so an attacker reaches the app instance, lands a shell, and from there
is a member of app-sg, which the database explicitly trusts. The database is reachable from the
internet; just not in one hop.
flowchart LR
Net(["Internet<br/>0.0.0.0/0"])
App["app instance<br/>(app-sg)"]
DB[("database<br/>db-sg, private subnet")]
Net -- ":22 open to the world" --> App
App -- "member of app-sg, which db-sg trusts" --> DB
The mental model: a Security Group is the stateful host firewall you've
written for years, but applied per-network-interface and composable — so reachability is a graph, not a
table. You don't read down the rules; you ask "what can the internet touch, and what can that touch,"
following group-references like edges. People reliably under-count this, and the under-count is how a
"locked-down" database ends up one ssh away from 0.0.0.0/0.
The mental model
A Security Group is the stateful host firewall you already know — but per-network-interface and composable. So reachability is a graph: don't read down the rules, follow group-references as edges and ask "what can the internet touch, and what can that touch?"
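One way to make "reachability is a graph" concrete is a breadth-first walk over group references. The sketch below is a minimal Python illustration, not cloudmapper's method. The `SECURITY_GROUPS` dictionary and its rule shape are hypothetical stand-ins for the two groups in Q1, and it assumes that any group reachable on some port can become a foothold. It ignores route tables, NACLs, and public IP assignment, all of which decide whether a path is actually live.

```python
from collections import deque

# Hypothetical, simplified model of the two groups from Q1: each ingress rule
# names a port and a source that is either a CIDR or another group's name.
SECURITY_GROUPS = {
    "app-sg": [{"port": 22, "source": "0.0.0.0/0"}],
    "db-sg": [{"port": 5432, "source": "app-sg"}],
}

INTERNET = "0.0.0.0/0"


def reachable_from_internet(groups):
    """Return {group: path} for every group the internet can reach in any number of hops."""
    paths = {}
    queue = deque()
    # Hop one: groups with a rule whose source is the whole internet.
    for name, rules in groups.items():
        for rule in rules:
            if rule["source"] == INTERNET:
                paths[name] = [f"internet -> {name}:{rule['port']}"]
                queue.append(name)
                break
    # Later hops: a foothold in a reached group can use any rule that trusts that group.
    while queue:
        foothold = queue.popleft()
        for name, rules in groups.items():
            if name in paths:
                continue
            for rule in rules:
                if rule["source"] == foothold:
                    paths[name] = paths[foothold] + [f"{foothold} -> {name}:{rule['port']}"]
                    queue.append(name)
                    break
    return paths


paths = reachable_from_internet(SECURITY_GROUPS)
for group, path in paths.items():
    print(group, "|", ", ".join(path))
```

Under these assumptions the walk reports db-sg as reachable in two hops even though no db-sg rule mentions 0.0.0.0/0, which is the finding a per-rule scan misses.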
The gotcha
"Ingress is locked down, so we're secure" misses two things. Reach is transitive — a private DB that
only trusts app-sg is internet-reachable the moment app-sg is. And VPC egress is open by
default: a clean ingress audit does nothing about a foothold calling the metadata service or
shipping data out on 443.
Q2 — ingress is half the wall; the cloud's default egress is open. On-prem, a default-deny perimeter means nothing leaves unless you allow it. In a VPC, all outbound traffic is permitted by default — an intentional developer-experience choice that means the exfiltration path Capital One's WAF used was open out of the box. Locking down ingress stops the initial reach; it does nothing about a foothold calling the metadata service, pivoting east-west to a peer, or shipping data to an external IP on 443. A real baseline scopes egress too — VPC Endpoints so S3/DynamoDB traffic never leaves the AWS network, PrivateLink for third parties, restrictive egress rules for the rest — and treats "what must this workload legitimately reach" as the question, in both directions.
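The egress half of the audit can be sketched the same way. The Python below is a minimal illustration that flags groups still carrying an allow-all outbound rule. The records mimic the shape boto3's `describe_security_groups` returns under `SecurityGroups`, but the group names and CIDRs are made up for the example, and the check only looks at IPv4 ranges.

```python
# Trimmed, hypothetical records in the shape of boto3's describe_security_groups
# response ("SecurityGroups" list). Group names and CIDRs are illustrative only.
GROUPS = [
    {
        "GroupName": "waf-sg",
        "IpPermissionsEgress": [
            {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        ],
    },
    {
        "GroupName": "db-sg",
        "IpPermissionsEgress": [
            {
                "IpProtocol": "tcp",
                "FromPort": 443,
                "ToPort": 443,
                "IpRanges": [{"CidrIp": "10.0.2.0/24"}],
            },
        ],
    },
]


def has_open_egress(group):
    """True if any outbound rule allows every protocol to every IPv4 address."""
    for rule in group.get("IpPermissionsEgress", []):
        all_protocols = rule.get("IpProtocol") == "-1"
        to_anywhere = any(
            r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", [])
        )
        if all_protocols and to_anywhere:
            return True
    return False


open_egress = [g["GroupName"] for g in GROUPS if has_open_egress(g)]
print(open_egress)
```

A group on this list is not automatically wrong, but each one needs an answer to "what must this workload legitimately reach?" before the baseline can be called default-deny in both directions.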
Q3 — the attack surface is the union of every rule, and it lives in the composition. No single
group in the lab is catastrophic on its own — that's exactly why per-group review passes them. The
exposure is the union: app-sg's open :22 plus db-sg's trust of app-sg is the chain; the
public ALB plus a missing egress rule is the exfil path. Network security in the cloud is reasoning
about the whole reachable set, not auditing rules one at a time — which is why the fix isn't "delete
the worst rule" but author a default-deny baseline: a group denies all ingress unless a rule
explicitly allows it, so least privilege means only the rules the architecture provably needs, nothing
"just in case." And because Security Groups are IAM-controlled API objects (Module 02), anyone with
ec2:AuthorizeSecurityGroupIngress can re-punch the hole — so the baseline only stays true if a guardrail
re-checks it. That guardrail is this module's deliverable.
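The "fails the exposure, passes the fix" behaviour of such a guardrail can be sketched in a few lines of plain Python. This is an illustration of the pass/fail logic only, not Checkov's API or rule format. The `SENSITIVE_PORTS` set, the rule dictionaries, and the 10.0.1.0/24 admin subnet are assumptions made for the example.

```python
import ipaddress

# Assumed list of sensitive ports; adjust to what the architecture actually runs.
SENSITIVE_PORTS = {22, 3389, 5432, 9200, 27017}


def is_world_open(cidr):
    """True for 0.0.0.0/0 or ::/0."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


def violations(group):
    """Return the sensitive ports this group exposes to the whole internet."""
    exposed = set()
    for rule in group["ingress"]:
        cidr = rule.get("cidr")  # rules that reference another group carry no CIDR
        if cidr is None or not is_world_open(cidr):
            continue
        lo, hi = rule["from_port"], rule["to_port"]
        exposed |= {port for port in SENSITIVE_PORTS if lo <= port <= hi}
    return sorted(exposed)


# Hypothetical before/after versions of the same group.
exposure = {
    "name": "app-sg",
    "ingress": [{"cidr": "0.0.0.0/0", "from_port": 22, "to_port": 22}],
}
baseline = {
    "name": "app-sg",
    "ingress": [{"cidr": "10.0.1.0/24", "from_port": 22, "to_port": 22}],
}

for group in (exposure, baseline):
    found = violations(group)
    print(group["name"], "FAIL" if found else "PASS", found)
```

A check like this is still per-rule, so it catches the first hop only. It keeps the hole from being re-punched; the transitive review is what decides which rules belong in the baseline.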
Go deeper: the attack surface is a union, and it lives in the composition
No single group in the lab is catastrophic alone — which is exactly why per-group review passes all
of them. The exposure is the union: app-sg's open :22 plus db-sg's trust of app-sg is the
chain; the public ALB plus a missing egress rule is the exfil path. The fix isn't "delete the worst
rule" but author a default-deny baseline where only the rules the architecture provably needs exist —
no "just in case."
AI caveat
A model is a strong first-pass — it'll flag 0.0.0.0/0 on :22 instantly. But it reads rules as a
list, so it routinely misses the transitive hop (internet → app-sg → the DB that trusts
app-sg) and can't know whether a route table or private subnet makes a path live. Confirm each path
against cloudmapper's graph; you own the baseline.
Learn (~4 hrs)
Richer than a foundations module: cloud networking re-defines words you already know, and the reachability model carries into Kubernetes (Module 12). Read the case first, then the mechanism.
VPC and the firewall that isn't (~1.5 hrs)
- AWS: How Amazon VPC works (~40 min). The authoritative tour of subnets, route tables, Internet/NAT gateways, Security Groups and NACLs. Read it for the vocabulary the lab assumes; note which objects are control-plane (API-managed) versus data-plane.
- AWS: Security Groups vs Network ACLs (~20 min). The stateful-SG vs. stateless-NACL distinction and the default-permit-egress fact. This is the "host firewall, per-ENI, composable" mental model in primary-source form.
- AWS: control traffic with VPC Endpoints / PrivateLink (~20 min, skim). Why keeping AWS-service and third-party traffic off the public internet is the egress baseline, not a nicety.
The exposure wave, from the source (~1 hr)
- Shodan — Elastic data exposure grows to 3.2 PB (~20 min, orient) — Shodan's own 2018→2020 measurement of internet-exposed Elasticsearch/MongoDB/HDFS instances; the 2017–19 wave never fully ended.
- Krebs on Security — the MongoDB ransom wave (~20 min) — contemporaneous reporting on the 0.0.0.0/0-exposed-DB ransom attacks; corroborate the scale.
- US Senate report — Capital One (~20 min, skim the network/WAF section) — re-read the chain with the network-containment lens: which walls were the network's job?
Mapping reachability (~1.5 hrs)
- cloudmapper — README (~30 min) — Duo Labs' topology mapper. Read the collect → prepare → audit → webserver workflow; audit is what surfaces the 0.0.0.0/0 findings, the graph is what communicates them.
- AWS — VPC Flow Logs (record format) (~30 min) — read the "Flow log records" section; the 5-tuple + ACCEPT/REJECT is how reachability is observed after the fact (scans = REJECT storms, exfil = a fat 443 flow to an external IP). You'll parse these in the lab.
- Checkov — AWS Security Group policies (~20 min, skim) — find the built-in rules that fail 0.0.0.0/0 on sensitive ports (e.g. CKV_AWS_24/25 for 22/3389); this is the guardrail you'll own.
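To make the flow-log item in the list above concrete, here is a minimal Python sketch that parses records in the default (version 2) space-separated format and looks for the two patterns named there: a burst of REJECTs from one source, and a large accepted flow to an external address on 443. The sample records, the 10.0.0.0/16 VPC range, and the 100 MB threshold are all made up for illustration; real logs also contain NODATA and SKIPDATA records, which this sketch simply skips.

```python
import ipaddress
from collections import Counter

# Field order of the default (version 2) flow log record format.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr", "srcport",
          "dstport", "protocol", "packets", "bytes", "start", "end", "action",
          "log_status"]

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")  # assumed VPC range
EXFIL_BYTES = 100_000_000                       # assumed threshold, tune per workload

# Made-up sample records for illustration only.
SAMPLE = """\
2 123456789012 eni-0a1b2c3d 203.0.113.9 10.0.1.5 40001 22 6 1 40 1700000000 1700000060 REJECT OK
2 123456789012 eni-0a1b2c3d 203.0.113.9 10.0.1.5 40002 3389 6 1 40 1700000000 1700000060 REJECT OK
2 123456789012 eni-0a1b2c3d 203.0.113.9 10.0.1.5 40003 5432 6 1 40 1700000000 1700000060 REJECT OK
2 123456789012 eni-0a1b2c3d 10.0.1.5 198.51.100.7 51515 443 6 90000 120000000 1700000000 1700000060 ACCEPT OK
"""


def parse(line):
    """Parse one record into a dict, or return None for malformed or no-data records."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    record = dict(zip(FIELDS, parts))
    if record["log_status"] != "OK":
        return None
    for key in ("srcport", "dstport", "packets", "bytes"):
        record[key] = int(record[key])
    return record


records = [r for r in map(parse, SAMPLE.splitlines()) if r is not None]

# Scan signature: many REJECTs from the same source address.
rejects_by_source = Counter(r["srcaddr"] for r in records if r["action"] == "REJECT")

# Exfil signature: a large accepted flow from inside the VPC to an outside address on 443.
big_outbound = [
    r for r in records
    if r["action"] == "ACCEPT"
    and r["dstport"] == 443
    and ipaddress.ip_address(r["srcaddr"]) in VPC_CIDR
    and ipaddress.ip_address(r["dstaddr"]) not in VPC_CIDR
    and r["bytes"] > EXFIL_BYTES
]

print(rejects_by_source.most_common(3))
print([(r["srcaddr"], r["dstaddr"], r["bytes"]) for r in big_outbound])
```

"External" is tested against the assumed VPC range rather than a generic private-address check, because the question in a review is whether traffic left this network, not whether the destination is publicly routable.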
Key concepts
- A Security Group is the stateful host firewall you know — but per-ENI and composable, so reachability is a graph (follow group-references as edges), not a per-rule table
- Transitive reach: a private DB that only trusts app-sg is internet-reachable if app-sg is internet-reachable; auditing rules one at a time misses this
- The attack surface is the union of every rule; the exposure usually lives in the composition, not the single worst line
- VPC egress is default-permit — locking ingress doesn't stop exfiltration; a real baseline scopes egress with VPC Endpoints / PrivateLink / restrictive rules
- Default-deny baseline: only the rules the architecture provably needs; "just in case" rules are the exposure
- Security Groups are IAM-controlled objects (Module 02) — the baseline only holds if a scanner re-checks it, which is why the fix ends in code
AI acceleration
Paste a Security Group set and ask a model "what's reachable from the internet, and what can each
reachable host then reach?" It's a strong first-pass — it'll flag the 0.0.0.0/0 on :22 immediately.
But it reads the rules as a list, and reachability is a graph: it routinely misses the transitive hop
(internet → app-sg → the database that trusts app-sg) and it can't know whether a route table or
private subnet actually makes a path live. Treat its output as a hypothesis and confirm each path against
the topology — cloudmapper's graph and your reachability check are ground truth, the model is the
draft. The skill it can't do for you is authoring the minimum default-deny baseline that closes the
reachable paths without breaking the app, and proving the cut. You direct it; you own the baseline.
Check yourself
- app-sg allows :22 from 0.0.0.0/0; db-sg allows :5432 only from app-sg, and the DB has no public IP. Is the database reachable from the internet, and why does a per-rule audit miss it?
- Your ingress audit is clean. Why does that leave you exposed to an attacker who already has a foothold, and what does a real baseline scope that you didn't?
- Across six "mostly fine" Security Groups, where does the real attack surface live — and why does that make "author a default-deny baseline" the fix rather than "delete the worst rule"?