Day 5: AWS Solutions Architect Professional Prep: Advanced VPC Networking
Today’s lesson revolved around how to set up a network in a way that is scalable, secure, and highly available (HA). These were the big questions I delved into:
1. How do Security Groups (SGs) and NACLs differ in function?
2. Which type of Endpoint should I choose for private AWS connectivity?
3. What is the scalable solution for connecting many VPCs?
4. How should NAT Gateways be deployed for High Availability (HA)?
First, a brief overview of key concepts. A Virtual Private Cloud (VPC) is your entire network setup in the cloud; think of it as a large building or campus that you own. Just as you would divide a building into rooms, you divide a VPC into subnets, each a room with a specialized purpose. The public subnet is the front door: it has a direct route to the street, which is the Internet Gateway (IGW). Any resource here needs a public IP address to be reachable from the internet. The private subnet is the back office: this room has no direct route to the street (IGW). If resources here need to reach the internet (e.g., for updates), they must use a shared payphone, the NAT Gateway, which sits in a public subnet. An isolated subnet is the vault: a room that is completely locked down, with no route to the street (IGW) or to the payphone (NAT Gateway). It is ideal for highly sensitive data such as internal databases and backups. Every room has a Route Table, a map that tells traffic leaving that room where to go.
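To make the room analogy concrete, here is a minimal Python sketch that classifies a subnet by where its default route points. The dictionary model of a route table, the function name classify_subnet, and the example IDs are all hypothetical; this is an illustration, not the AWS API.

```python
def classify_subnet(routes):
    """Classify a subnet from the target of its default route.

    `routes` is a hypothetical model of a route table:
    destination CIDR -> target ID string.
    """
    default_target = routes.get("0.0.0.0/0")
    if default_target is None:
        return "isolated"   # the vault: no route to the IGW or a NAT Gateway
    if default_target.startswith("igw-"):
        return "public"     # the front door: direct route to the street
    if default_target.startswith("nat-"):
        return "private"    # the back office: outbound only, via the payphone
    return "other"          # some other target, e.g. a Transit Gateway


# Illustrative route tables (made-up IDs).
public_rt = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-0abc"}
private_rt = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-0def"}
isolated_rt = {"10.0.0.0/16": "local"}

for name, rt in [("web", public_rt), ("app", private_rt), ("db", isolated_rt)]:
    print(name, classify_subnet(rt))
```

The point of the sketch is that "public", "private", and "isolated" are not settings on the subnet itself; they follow from what the route table says.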
1. How do Security Groups (SGs) and NACLs differ in function?
We have two main security layers to protect our resources: SGs and NACLs.
A Security Group (SG) is a bodyguard for a single resource, such as an EC2 instance (strictly speaking, its ENI). SGs are stateful: if the bodyguard allows traffic in based on an inbound rule, it automatically remembers the connection and allows the return traffic. SGs only use allow rules.
A NACL (Network Access Control List) is a fence around the entire subnet. NACLs are stateless: you must explicitly write rules for both inbound and outbound traffic, including the return traffic. For inbound traffic, the NACL (fence) is checked before the Security Group (bodyguard); for outbound traffic, the order is reversed. NACLs are best used for coarse perimeter filtering, especially to implement deny rules (e.g., blocking known malicious IP ranges).
Remember: whereas SGs are applied per resource (instance, ENI), NACLs are applied per subnet.
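The stateful versus stateless difference can be sketched in a few lines of Python. This is a simplified model under stated assumptions: rules match on a single port only (real rules also match protocol, port ranges, and CIDR), and the names NaclRule, nacl_decision, nacl_permits_exchange, and sg_permits_exchange are hypothetical. It does reflect that NACL rules are evaluated in ascending rule-number order with the first match winning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int   # lower numbers are evaluated first
    port: int     # simplification: a single port, no protocol or CIDR
    action: str   # "allow" or "deny"


def nacl_decision(rules, port):
    """First matching rule in ascending rule-number order wins; else deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port:
            return rule.action
    return "deny"


def nacl_permits_exchange(inbound, outbound, request_port, reply_port):
    """Stateless: the request and its reply are checked independently."""
    return (nacl_decision(inbound, request_port) == "allow"
            and nacl_decision(outbound, reply_port) == "allow")


def sg_permits_exchange(allowed_inbound_ports, request_port):
    """Stateful and allow-only: an allowed request implies an allowed reply."""
    return request_port in allowed_inbound_ports


inbound = [NaclRule(100, 443, "allow")]
# HTTPS request arrives on 443; the reply leaves toward ephemeral port 50000.
print(nacl_permits_exchange(inbound, [], 443, 50000))  # no outbound rule
print(sg_permits_exchange({443}, 443))                 # reply allowed implicitly
```

With no outbound NACL rule for the reply port, the fence blocks the response even though the request got in, while the bodyguard lets the reply through without any extra rule.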
2. Which type of Endpoint should I choose for private AWS connectivity?
If your private resources need to use AWS services without traversing the internet or NAT Gateway, you use Endpoints. The type depends on the service:
When accessing S3 or DynamoDB, use a Gateway Endpoint. This is a free, dedicated route defined in your route table via a prefix list.
When accessing other private service APIs (like KMS, SSM, or third-party SaaS services), use an Interface Endpoint (PrivateLink). This works by creating a dedicated network card (ENI) with a private IP address inside your subnet. Unlike Gateway Endpoints, Interface Endpoints are billed per hour and per GB of data processed.
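The rule of thumb above fits in a tiny lookup. This is only a sketch of the decision as described in these notes; the function name choose_endpoint and the short service keys are hypothetical and are not AWS service identifiers.

```python
# Services that these notes route through a Gateway Endpoint.
GATEWAY_ENDPOINT_SERVICES = {"s3", "dynamodb"}


def choose_endpoint(service):
    """Return the endpoint type suggested by the rule of thumb above."""
    if service.lower() in GATEWAY_ENDPOINT_SERVICES:
        return "gateway"    # free; added as a route via a prefix list
    return "interface"      # PrivateLink; an ENI inside your subnet


for svc in ["S3", "DynamoDB", "KMS", "SSM"]:
    print(svc, choose_endpoint(svc))
```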
3. What is the scalable solution for connecting many VPCs?
VPC Peering is like a secret tunnel joining two houses. It is a simple, low-latency, point-to-point connection, which is great for connecting two neighboring VPCs. The downside is that it does not scale well. Peering is non-transitive: traffic cannot pass through one peer to reach a third VPC. So if every VPC must talk to every other VPC, you need a full mesh of n(n-1)/2 connections, which for 10 VPCs means 45 peering connections.
Transit Gateway (TGW), on the other hand, is the scalable, centralized hub-and-spoke solution. All VPCs attach to the TGW, thus allowing traffic to flow between any of them. Use TGW for enterprise scale involving many VPCs and connections back to your on-premises network.
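The scaling argument is simple arithmetic, sketched below in Python. The function names and the VPC labels "A", "B", "C" are hypothetical; the model treats a peering as an unordered pair of VPCs and a TGW as needing one attachment per VPC.

```python
def peering_connections_for_full_mesh(vpc_count):
    """Every pair of VPCs needs its own peering: n * (n - 1) / 2."""
    return vpc_count * (vpc_count - 1) // 2


def tgw_attachments(vpc_count):
    """Hub-and-spoke: each VPC attaches once to the Transit Gateway."""
    return vpc_count


def reachable_via_peering(peerings, src, dst):
    """Non-transitive: only a direct peering between src and dst counts."""
    return frozenset((src, dst)) in peerings


for n in (3, 10, 50):
    print(n, peering_connections_for_full_mesh(n), tgw_attachments(n))

# A is peered with B, and B with C, but A still cannot reach C.
peerings = {frozenset(("A", "B")), frozenset(("B", "C"))}
print(reachable_via_peering(peerings, "A", "B"))
print(reachable_via_peering(peerings, "A", "C"))
```

Peering connections grow quadratically with the number of VPCs, while TGW attachments grow linearly, which is the whole case for the hub-and-spoke design at scale.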
4. How should NAT Gateways be deployed for High Availability (HA)?
Remember the payphone analogy? A NAT Gateway, the payphone, is AZ-specific. So if you install only one payphone in one Availability Zone (think of it as a specific geographic area), two things happen:
1. If that area fails, the private rooms in all other areas lose their outbound calling ability.
2. All traffic that crosses from another area to use that single payphone incurs cross-AZ data transfer charges.
The solution? For a robust design, you need a payphone in every area where you have private rooms. In other words, deploy a separate NAT Gateway in each Availability Zone where you have private subnets, and point each private subnet's route table at the NAT Gateway in its own AZ. This ensures high availability and avoids cross-AZ data transfer costs.
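Here is a small Python sketch of that design and of what a single-NAT setup loses when an AZ fails. Everything in it is a hypothetical model: the subnet IDs, AZ names, the placeholder NAT names such as "nat-az-a", and the helper functions are illustrative and do not come from the AWS API.

```python
def plan_nat_gateways(private_subnets):
    """Plan one NAT Gateway per AZ and route each subnet to its local NAT.

    `private_subnets` maps subnet ID -> AZ name (hypothetical model).
    Returns (nat_by_az, default_routes).
    """
    azs = sorted(set(private_subnets.values()))
    nat_by_az = {az: f"nat-{az}" for az in azs}  # placeholder names
    default_routes = {
        subnet: nat_by_az[az] for subnet, az in private_subnets.items()
    }
    return nat_by_az, default_routes


def subnets_losing_egress(private_subnets, default_routes, nat_by_az, failed_az):
    """Subnets in healthy AZs whose default route points into the failed AZ."""
    failed_nat = nat_by_az.get(failed_az)
    return sorted(
        subnet
        for subnet, az in private_subnets.items()
        if az != failed_az and default_routes[subnet] == failed_nat
    )


subnets = {"subnet-1": "az-a", "subnet-2": "az-b", "subnet-3": "az-c"}

# Recommended: one NAT Gateway per AZ, each subnet routed to its own AZ's NAT.
nat_by_az, routes = plan_nat_gateways(subnets)
print(subnets_losing_egress(subnets, routes, nat_by_az, "az-a"))

# Anti-pattern: a single NAT Gateway in az-a shared by every subnet.
single_nat = {"az-a": "nat-az-a"}
single_routes = {subnet: "nat-az-a" for subnet in subnets}
print(subnets_losing_egress(subnets, single_routes, single_nat, "az-a"))
```

In the per-AZ plan, losing az-a affects no subnet outside az-a; in the shared plan, every other subnet loses outbound access, and in normal operation those same subnets pay cross-AZ charges.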
These were my big takeaways from today’s lesson:
- SGs are stateful, allow-only, and applied at the instance (ENI) level, while NACLs are stateless, applied at the subnet level, and checked first for inbound traffic, allowing you to use deny rules for blocking traffic.
- Use the free Gateway Endpoint for S3 and DynamoDB, and use an Interface Endpoint (PrivateLink) for other AWS service APIs and third-party SaaS services.
- Use Transit Gateway for enterprise scenarios involving many VPCs (hub-and-spoke). VPC peering is non-transitive.
- For High Availability (HA) in production, deploy one NAT Gateway per AZ. Do not assume a NAT Gateway is multi-AZ.
See you tomorrow!
