Module 04 — Cloud Network Security

Type 4 · Audit→Build→Verify (+ Type 3 · Blast-Radius) — audit a VPC for what's actually reachable from the internet, then author a default-deny baseline as code and re-verify it holds. (Secondary: Blast-Radius — trace the transitive paths a foothold walks.)

Last reviewed: 2026-06

Cloud & Container Security · A Security Group is the host firewall you already know, applied per-interface and composable, and your attack surface is the union of every rule, not any one of them.

Difficulty: Intermediate  ·  Estimated time: ~5–7 hrs (study + lab)  ·  Prerequisites: Foundations · Module 01 — Shared Responsibility · Module 02 — Identity & IAM

In 60 seconds

A Security Group is the stateful host firewall you've written for years — but applied per-interface and composable, so reachability is a graph, not a table you read top to bottom. The 2017–19 wave of 0.0.0.0/0-exposed MongoDB/Elasticsearch instances, and the network half of Capital One, were failures of containment, not exploits. A per-rule audit calls a private DB "clean" while it sits one ssh hop from the internet, because reach is transitive. And VPC egress is open by default — locking ingress does nothing about exfiltration. The fix is a default-deny baseline authored as code, with a scanner rule that re-checks it, because anyone with ec2:AuthorizeSecurityGroupIngress can re-punch the hole.

The case

Between roughly 2017 and 2019 the same finding kept landing in the news under different company names: a database or admin panel sitting on the open internet with 0.0.0.0/0 as its source range. Internet-wide scanners (Shodan, BinaryEdge) catalogued tens of thousands of exposed Elasticsearch and MongoDB instances — many requiring no authentication — and a wave of "MongoDB apocalypse" ransom attacks wiped or held them hostage at scale. The cause was almost never an exploit. It was one ingress rule, opened "temporarily" for a migration or a demo, that allowed the world to reach a port that should have been reachable only from an app subnet.

In 2019, Capital One lost ~100M records, and the network layer is a quiet co-defendant in that chain. The headline is SSRF and an over-broad IAM role (you ruled on that in Module 01), but the WAF was an internet-facing host that could fetch credentials from the metadata service and then make outbound calls to S3. A tighter egress posture and tighter segmentation around that host would have shortened, or broken, the chain. The network controls didn't cause the breach, but they were the failed containment: the walls that should have stopped a foothold from becoming an exfiltration.

So before you read on, this module turns on one question about the most boring-looking object in the cloud — a Security Group ruleset:

Given a set of Security Groups, what is actually reachable from the internet?

Your job

By the end of this module you'll audit a cloud network for reachability, then close it as code. Map the target account's VPC with cloudmapper, find the Security Groups that expose sensitive ports to 0.0.0.0/0, and trace the transitive paths an attacker actually walks. Then do the half auditing skips: author a corrected, default-deny Security Group baseline, re-verify that the bad paths are gone while the app still works, and encode the verdict as a scanner rule that fails the exposure and passes the fix. Find → author the baseline → prove it holds: the exact motion of a cloud network review, and the same shape you'll repeat with NetworkPolicies in Module 12.

Call it before you read on

Don't scroll. Write your gut answers — under-counting reachability here is the teaching event, and you'll grade yourself in the lab.

Q1. An app-sg allows :22 from 0.0.0.0/0. A db-sg allows :5432 only from app-sg. The database has no public IP and lives in a private subnet. Is the database reachable from the internet?

Q2. Your team audited ingress and found it clean — every sensitive port is locked down inbound. Has the network been secured against an attacker who already has a foothold on one instance?

Q3. You count six Security Groups, each "mostly fine." Where does the real attack surface live — in the worst single rule, or somewhere the per-group review can't see?

The reachability model, revealed

Hold your answers against these.

Q1 — reachability is transitive, and the audit that counts rules misses it. The database has no public address and its own group only trusts app-sg. A per-rule scan calls it clean. But app-sg exposes :22 to the world — so an attacker reaches the app instance, lands a shell, and from there is a member of app-sg, which the database explicitly trusts. The database is reachable from the internet; just not in one hop.

flowchart LR
    Net(["Internet<br/>0.0.0.0/0"])
    App["app instance<br/>(app-sg)"]
    DB[("database<br/>db-sg, private subnet")]
    Net -- ":22 open to the world" --> App
    App -- "member of app-sg, which db-sg trusts" --> DB

The mental model: a Security Group is the stateful host firewall you've written for years, but applied per-network-interface and composable — so reachability is a graph, not a table. You don't read down the rules; you ask "what can the internet touch, and what can that touch," following group-references like edges. People reliably under-count this, and the under-count is how a "locked-down" database ends up one ssh away from 0.0.0.0/0.
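The graph walk described above fits in a few lines. This is a minimal illustration, not cloudmapper's algorithm. It assumes a hypothetical, simplified rule shape (group name → list of (source, port) pairs) and treats a shell on any member of a group as the ability to send traffic as that group. It also ignores route tables, NACLs and public IPs, which decide whether a path is actually live.

```python
from collections import deque

# Hypothetical model of the Q1 topology. Each group maps to its ingress rules
# as (source, port) pairs; a source is a CIDR string or another group's name.
INGRESS = {
    "app-sg": [("0.0.0.0/0", 22)],
    "db-sg": [("app-sg", 5432)],
}

WORLD = "0.0.0.0/0"


def reachable_from_internet(ingress):
    """Return {group: hops} for every group an internet attacker can reach.

    Assumes a shell on any member of a group lets the attacker send traffic
    as that group, and that routing makes every path live (not modelled here).
    """
    hops = {}
    queue = deque()
    # Hop 1: groups that admit the whole internet directly.
    for group, rules in ingress.items():
        if any(src == WORLD for src, _port in rules):
            hops[group] = 1
            queue.append(group)
    # Breadth-first walk: a group that trusts a reached group is reached too.
    while queue:
        current = queue.popleft()
        for group, rules in ingress.items():
            if group not in hops and any(src == current for src, _port in rules):
                hops[group] = hops[current] + 1
                queue.append(group)
    return hops


print(reachable_from_internet(INGRESS))  # {'app-sg': 1, 'db-sg': 2}
```

A per-rule scan of `db-sg` sees only a group reference and passes it. The walk reports the database two hops from the internet.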

The mental model

A Security Group is the stateful host firewall you already know — but per-network-interface and composable. So reachability is a graph: don't read down the rules, follow group-references as edges and ask "what can the internet touch, and what can that touch?"

The gotcha

"Ingress is locked down, so we're secure" misses two things. Reach is transitive — a private DB that only trusts app-sg is internet-reachable the moment app-sg is. And VPC egress is open by default: a clean ingress audit does nothing about a foothold calling the metadata service or shipping data out on 443.

Q2 — ingress is half the wall; the cloud's default egress is open. On-prem, a default-deny perimeter means nothing leaves unless you allow it. In a VPC, every new Security Group ships with an allow-all outbound rule, and the default NACL permits everything. This is an intentional developer-experience choice, and it meant the exfiltration path Capital One's WAF used was open out of the box. Locking down ingress stops the initial reach. It does nothing about a foothold calling the metadata service, pivoting east-west to a peer, or shipping data to an external IP on 443. Security Groups don't filter traffic to the instance metadata service at all, so that leg is closed with IMDSv2 and a tight instance role rather than an egress rule. A real baseline scopes egress too: VPC Endpoints so S3/DynamoDB traffic never leaves the AWS network, PrivateLink for third parties, and restrictive egress rules for the rest. It treats "what must this workload legitimately reach" as the question, in both directions.
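One way to see default-permit egress in the data is to scan each group's egress permissions for world-open CIDRs. The sketch below assumes records shaped like an abridged EC2 DescribeSecurityGroups response (`IpPermissionsEgress`, `IpRanges`, `Ipv6Ranges`, `PrefixListIds`, `UserIdGroupPairs`). The group ids and prefix list id are hypothetical, and no AWS call is made.

```python
# Abridged DescribeSecurityGroups-style records (ids and prefix list are hypothetical).
# A new group's default egress rule: all protocols ("-1") to 0.0.0.0/0.
DEFAULT_GROUP = {
    "GroupId": "sg-0default",
    "IpPermissionsEgress": [
        {"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    ],
}

# A scoped baseline: HTTPS only to an S3 gateway endpoint's prefix list,
# and Postgres only to the database's group.
SCOPED_GROUP = {
    "GroupId": "sg-0scoped",
    "IpPermissionsEgress": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "PrefixListIds": [{"PrefixListId": "pl-0example"}]},
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "UserIdGroupPairs": [{"GroupId": "sg-0db"}]},
    ],
}

WORLD = {"0.0.0.0/0", "::/0"}


def open_egress(group):
    """List (protocol, from_port, to_port, cidr) for each egress rule open to the world."""
    findings = []
    for perm in group.get("IpPermissionsEgress", []):
        cidrs = [r["CidrIp"] for r in perm.get("IpRanges", [])]
        cidrs += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]
        for cidr in cidrs:
            if cidr in WORLD:
                # Port fields may be absent for protocol "-1"; .get() returns None then.
                findings.append((perm["IpProtocol"], perm.get("FromPort"), perm.get("ToPort"), cidr))
    return findings


print(open_egress(DEFAULT_GROUP))  # [('-1', None, None, '0.0.0.0/0')]
print(open_egress(SCOPED_GROUP))   # []
```

Fed with live data (for example, the `SecurityGroups` list from boto3's `describe_security_groups`), an empty result for every group is the egress half of the baseline.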

Q3 — the attack surface is the union of every rule, and it lives in the composition. No single group in the lab is catastrophic on its own — that's exactly why per-group review passes them. The exposure is the union: app-sg's open :22 plus db-sg's trust of app-sg is the chain; the public ALB plus a missing egress rule is the exfil path. Network security in the cloud is reasoning about the whole reachable set, not auditing rules one at a time — which is why the fix isn't "delete the worst rule" but author a default-deny baseline: a group denies all ingress unless a rule explicitly allows it, so least privilege means only the rules the architecture provably needs, nothing "just in case." And because Security Groups are IAM-controlled API objects (Module 02), anyone with ec2:AuthorizeSecurityGroupIngress can re-punch the hole — so the baseline only stays true if a guardrail re-checks it. That guardrail is this module's deliverable.
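The guardrail can be as small as a function that fails the exposure and passes the fix. This is a standalone sketch, not Checkov's policy API. It assumes the same DescribeSecurityGroups-style shape, considers only TCP and the "all protocols" rule, and uses an illustrative sensitive-port list and a hypothetical VPN range (10.8.0.0/16).

```python
# Ports whose exposure to the world this sketch treats as a failure (an illustrative list).
SENSITIVE_PORTS = {22: "ssh", 3389: "rdp", 3306: "mysql", 5432: "postgres",
                   9200: "elasticsearch", 27017: "mongodb"}
WORLD = {"0.0.0.0/0", "::/0"}


def rule_covers(perm, port):
    """True if an ingress permission admits TCP traffic to `port`."""
    if perm["IpProtocol"] == "-1":          # all protocols, all ports
        return True
    if perm["IpProtocol"] not in ("tcp", "6"):
        return False
    return perm["FromPort"] <= port <= perm["ToPort"]


def check_group(group):
    """Return ("FAILED", ports) if any sensitive port is open to the world, else ("PASSED", [])."""
    exposed = set()
    for perm in group.get("IpPermissions", []):
        cidrs = {r["CidrIp"] for r in perm.get("IpRanges", [])}
        cidrs |= {r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])}
        if not cidrs & WORLD:
            continue
        exposed |= {p for p in SENSITIVE_PORTS if rule_covers(perm, p)}
    return ("FAILED", sorted(exposed)) if exposed else ("PASSED", [])


EXPOSED_APP_SG = {"IpPermissions": [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]}
# The fix: ssh only from a (hypothetical) VPN range; HTTPS stays public.
FIXED_APP_SG = {"IpPermissions": [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "10.8.0.0/16"}]},
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]}

print(check_group(EXPOSED_APP_SG))  # ('FAILED', [22])
print(check_group(FIXED_APP_SG))    # ('PASSED', [])
```

When this runs in CI against planned or live groups, the exit status is what matters. The exposed group must fail and the fixed group must pass, every time someone touches the rules.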

AI caveat

A model is a strong first-pass — it'll flag 0.0.0.0/0 on :22 instantly. But it reads rules as a list, so it routinely misses the transitive hop (internet → app-sg → the DB that trusts app-sg) and can't know whether a route table or private subnet makes a path live. Confirm each path against cloudmapper's graph; you own the baseline.

Learn (~4 hrs)

Richer than a foundations module: cloud networking re-defines words you already know, and the reachability model carries into Kubernetes (Module 12). Read the case first, then the mechanism.

VPC and the firewall that isn't (~1.5 hrs)

  • AWS — How Amazon VPC works (~40 min) — the authoritative tour of subnets, route tables, Internet/NAT gateways, Security Groups and NACLs. Read it for the vocabulary the lab assumes; note which objects are control-plane (API-managed) versus data-plane.
  • AWS — Security Groups vs Network ACLs (~20 min) — the stateful-SG vs. stateless-NACL distinction and the default-permit-egress fact. This is the "host firewall, per-ENI, composable" mental model in primary-source form.
  • AWS — control traffic with VPC Endpoints / PrivateLink (~20 min, skim) — why keeping AWS-service and third-party traffic off the public internet is the egress baseline, not a nicety.
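To make the stateful/stateless distinction concrete, here is a toy NACL evaluator. It is a simplified sketch (protocol ignored, IPv4 only, hypothetical rule numbers and CIDRs). It follows the documented NACL semantics: rules are evaluated lowest number first, the first match wins, and unmatched traffic hits the implicit deny.

```python
import ipaddress

# Hypothetical NACL entries: (rule_number, port_low, port_high, cidr, action).
# Protocol is ignored to keep the sketch short.


def nacl_decision(entries, port, peer):
    """Evaluate entries lowest rule number first; the first match wins.

    `port` is the destination port and `peer` the remote address of the packet
    being judged. Unmatched traffic falls through to the implicit "*" deny.
    """
    addr = ipaddress.ip_address(peer)
    for _number, low, high, cidr, action in sorted(entries):
        if low <= port <= high and addr in ipaddress.ip_network(cidr):
            return action
    return "deny"


# Inbound HTTPS is allowed, but this outbound list forgot the client's ephemeral ports.
OUTBOUND_NO_EPHEMERAL = [(100, 443, 443, "0.0.0.0/0", "allow")]
OUTBOUND_WITH_EPHEMERAL = OUTBOUND_NO_EPHEMERAL + [(110, 1024, 65535, "0.0.0.0/0", "allow")]

# The server's reply goes back to the client's ephemeral port, e.g. 51000.
print(nacl_decision(OUTBOUND_NO_EPHEMERAL, 51000, "203.0.113.7"))    # deny
print(nacl_decision(OUTBOUND_WITH_EPHEMERAL, 51000, "203.0.113.7"))  # allow

# First match wins: a low-numbered deny beats a later broad allow.
INBOUND = [(90, 22, 22, "203.0.113.0/24", "deny"), (100, 0, 65535, "0.0.0.0/0", "allow")]
print(nacl_decision(INBOUND, 22, "203.0.113.7"))   # deny
print(nacl_decision(INBOUND, 22, "198.51.100.9"))  # allow
```

A Security Group in the same position needs no ephemeral-port rule. It tracks the connection and admits the reply automatically, which is the "stateful host firewall" behaviour you already know.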

The exposure wave, from the source (~1 hr)

  • Shodan — Elastic data exposure grows to 3.2 PB (~20 min, orient) — Shodan's own 2018→2020 measurement of internet-exposed Elasticsearch/MongoDB/HDFS instances; the 2017–19 wave never fully ended.
  • Krebs on Security — the MongoDB ransom wave (~20 min) — contemporaneous reporting on the 0.0.0.0/0-exposed-DB ransom attacks; corroborate the scale.
  • US Senate report — Capital One (~20 min, skim the network/WAF section) — re-read the chain with the network-containment lens: which walls were the network's job?

Mapping reachability (~1.5 hrs)

  • cloudmapper — README (~30 min) — Duo Labs' topology mapper. Read the collect → prepare → audit → webserver workflow; audit is what surfaces the 0.0.0.0/0 findings, the graph is what communicates them.
  • AWS — VPC Flow Logs (record format) (~30 min) — read the "Flow log records" section; the 5-tuple + ACCEPT/REJECT is how reachability is observed after the fact (scans = REJECT storms, exfil = a fat 443 flow to an external IP). You'll parse these in the lab.
  • Checkov — AWS Security Group policies (~20 min, skim) — find the built-in rules that fail 0.0.0.0/0 on sensitive ports (e.g. CKV_AWS_24/25 for 22/3389); this is the guardrail you'll own.
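As a first pass at the lab's flow-log parsing, the sketch below splits default (version 2) records into their 14 space-separated fields and flags the two shapes named above: a source racking up REJECTs, and a large 443 flow to an address outside the VPC. The VPC range, thresholds and sample records are hypothetical (documentation addresses). Per-record byte counts are only a crude proxy for exfiltration volume.

```python
import ipaddress
from collections import Counter

# Default (version 2) flow log record fields, in order.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr", "srcport",
          "dstport", "protocol", "packets", "bytes", "start", "end", "action", "log_status"]

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")  # hypothetical VPC range


def triage(lines, reject_threshold=3, exfil_bytes=10_000_000):
    """Return (sources with >= reject_threshold REJECTs, big 443 flows leaving the VPC)."""
    rejects = Counter()
    big_443 = []
    for line in lines:
        rec = dict(zip(FIELDS, line.split()))
        if rec["log_status"] != "OK":   # NODATA/SKIPDATA records hold "-" placeholders
            continue
        if rec["action"] == "REJECT":
            rejects[rec["srcaddr"]] += 1
        elif (rec["dstport"] == "443" and int(rec["bytes"]) >= exfil_bytes
              and ipaddress.ip_address(rec["dstaddr"]) not in VPC_CIDR):
            big_443.append((rec["srcaddr"], rec["dstaddr"], int(rec["bytes"])))
    scanners = sorted(src for src, n in rejects.items() if n >= reject_threshold)
    return scanners, big_443


SAMPLE = [
    "2 123456789012 eni-0aaa 203.0.113.7 10.0.1.5 51000 22 6 1 40 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0aaa 203.0.113.7 10.0.1.5 51001 3389 6 1 40 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0aaa 203.0.113.7 10.0.1.5 51002 5432 6 1 40 1700000000 1700000060 REJECT OK",
    "2 123456789012 eni-0bbb 10.0.1.5 198.51.100.20 44321 443 6 9000 52000000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0bbb 10.0.1.5 10.0.2.9 44322 443 6 9000 52000000 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0ccc - - - - - - - 1700000000 1700000060 - NODATA",
]

print(triage(SAMPLE))  # (['203.0.113.7'], [('10.0.1.5', '198.51.100.20', 52000000)])
```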

Key concepts

  • A Security Group is the stateful host firewall you know — but per-ENI and composable, so reachability is a graph (follow group-references as edges), not a per-rule table
  • Transitive reach: a private DB that only trusts app-sg is internet-reachable if app-sg is internet-reachable — auditing rules one at a time misses this
  • The attack surface is the union of every rule; the exposure usually lives in the composition, not the single worst line
  • VPC egress is default-permit — locking ingress doesn't stop exfiltration; a real baseline scopes egress with VPC Endpoints / PrivateLink / restrictive rules
  • Default-deny baseline: only the rules the architecture provably needs; "just in case" rules are the exposure
  • Security Groups are IAM-controlled objects (Module 02) — the baseline only holds if a scanner re-checks it, which is why the fix ends in code
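The "union of every rule" point is easiest to see per network interface, because one ENI can carry several groups. In this hypothetical example (simplified rule shape: TCP port → set of source CIDRs), reviewing `monitoring-sg` alone says 9200 is VPC-only. The interface also carries a "temporary" group, though, and since Security Group rules can only allow, the most permissive rule for a port wins.

```python
# Hypothetical groups: TCP port -> set of source CIDRs (a deliberately simplified shape).
GROUPS = {
    "web-sg": {443: {"0.0.0.0/0"}},
    "monitoring-sg": {9200: {"10.0.0.0/16"}},
    "migration-sg": {9200: {"0.0.0.0/0"}},  # opened "temporarily"
}
# Each network interface can carry several groups.
ENIS = {
    "eni-web": ["web-sg"],
    "eni-search": ["monitoring-sg", "migration-sg"],
}


def effective_ingress(group_ids, groups):
    """Union the rules of every group on an interface (SGs have no deny, so the most permissive wins)."""
    merged = {}
    for gid in group_ids:
        for port, sources in groups[gid].items():
            merged.setdefault(port, set()).update(sources)
    return merged


def world_ports(eni):
    """Ports on this interface reachable from 0.0.0.0/0 once all its groups are unioned."""
    eff = effective_ingress(ENIS[eni], GROUPS)
    return sorted(port for port, sources in eff.items() if "0.0.0.0/0" in sources)


for eni in ENIS:
    print(eni, world_ports(eni))
# eni-web [443]
# eni-search [9200]
```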

AI acceleration

Paste a Security Group set and ask a model "what's reachable from the internet, and what can each reachable host then reach?" It's a strong first-pass — it'll flag the 0.0.0.0/0 on :22 immediately. But it reads the rules as a list, and reachability is a graph: it routinely misses the transitive hop (internet → app-sg → the database that trusts app-sg) and it can't know whether a route table or private subnet actually makes a path live. Treat its output as a hypothesis and confirm each path against the topology — cloudmapper's graph and your reachability check are ground truth, the model is the draft. The skill it can't do for you is authoring the minimum default-deny baseline that closes the reachable paths without breaking the app, and proving the cut. You direct it; you own the baseline.

Check yourself

  • app-sg allows :22 from 0.0.0.0/0; db-sg allows :5432 only from app-sg, and the DB has no public IP. Is the database reachable from the internet, and why does a per-rule audit miss it?
  • Your ingress audit is clean. Why does that leave you exposed to an attacker who already has a foothold, and what does a real baseline scope that you didn't?
  • Across six "mostly fine" Security Groups, where does the real attack surface live — and why does that make "author a default-deny baseline" the fix rather than "delete the worst rule"?
