Day 7: AWS Solutions Architect Professional Prep — Auto Scaling & High Availability Patterns
How do major online services stay fast even when millions of people log in at once? How do they survive a major regional power outage without skipping a beat? Today’s lesson examines two foundational architectural principles that make this possible: Auto Scaling and High Availability.
The key questions I examined today were:
What are the fundamental components of Auto Scaling that enable automatic capacity management?
Which scaling policy is best suited for handling predictable, recurring spikes in user traffic?
How does the system ensure fault tolerance and automatically replace instances that fail?
What is the required architectural pattern for designing highly available applications?
Question 1: What are the fundamental components of Auto Scaling that enable automatic capacity management?
Auto Scaling is the process of automatically adjusting server capacity by adding or removing EC2 instances based on demand. It is like having a smart manager for your fleet of servers who automatically adds servers when you get busy and removes servers to save money when traffic slows down.
It has three core components:
1. The Launch Template. It defines how new instances are configured (e.g., AMI, instance type, security groups). AWS recommends Launch Templates over the older Launch Configurations for modern designs because they support versioning and multiple instance types.
2. The Auto Scaling Group (ASG). This is the logical unit managing the group of EC2 instances. Key ASG parameters include a min size (the smallest number of servers always running), a max size (the absolute capacity limit), and a desired capacity (the target number of instances).
3. Scaling Policies. They are the rules which dictate exactly when and how capacity should change.
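The relationship between the ASG size parameters can be sketched in a few lines of Python. This is a toy model, not the AWS API: the `AsgSizing` class and its `clamp_desired` method are hypothetical names, and the sketch only illustrates that whatever capacity a scaling policy requests, the result stays between the min and max sizes.

```python
from dataclasses import dataclass


@dataclass
class AsgSizing:
    """Toy model of an ASG's size limits (hypothetical, not the AWS API)."""
    min_size: int
    max_size: int

    def clamp_desired(self, requested: int) -> int:
        # Whatever a policy asks for, capacity stays within [min_size, max_size].
        return max(self.min_size, min(self.max_size, requested))


sizing = AsgSizing(min_size=2, max_size=10)
print(sizing.clamp_desired(1))   # below the minimum, so 2
print(sizing.clamp_desired(6))   # within range, so 6
print(sizing.clamp_desired(25))  # above the maximum, so 10
```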
Question 2: Which scaling policy is best suited for handling predictable, recurring spikes in user traffic?
There are four main scaling policy types; three are critical to remember:
1. Predictive Scaling: This uses machine learning to forecast demand from historical patterns, so it is ideal for predictable, recurring traffic spikes, such as 9-to-5 office hours or nightly batch jobs. The system scales out capacity before the traffic actually arrives.
2. Target Tracking Scaling: This is the simplest and most common policy. It is like setting a thermostat. You set a goal, e.g., keep average CPU utilization at around 50%. If the temperature (CPU) rises above the target, the system automatically adds capacity until the metric returns to it; if it drops well below, capacity is removed.
3. Step Scaling: This scales incrementally based on specific threshold breaches, e.g., add 1 instance if CPU hits 60%; add 2 instances if it hits 80%.
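To make the difference between the thermostat-style policy and the threshold-based one concrete, here is a small Python sketch. Both functions are hypothetical illustrations: `target_tracking_desired` uses a simple proportional estimate that I am assuming for clarity (it is not the service's actual algorithm), and `step_scaling_adjustment` encodes the 60% / 80% thresholds from the step scaling example.

```python
import math


def target_tracking_desired(current_instances: int, metric: float, target: float) -> int:
    """Simplified proportional estimate of the capacity needed to bring
    the metric back to the target (an assumption, not AWS's algorithm)."""
    return max(1, math.ceil(current_instances * metric / target))


def step_scaling_adjustment(cpu_percent: float) -> int:
    """Instances to add: +2 at 80% CPU or above, +1 at 60% or above, else 0."""
    if cpu_percent >= 80:
        return 2
    if cpu_percent >= 60:
        return 1
    return 0


# 4 instances at 75% CPU with a 50% target: 4 * 75 / 50 = 6 instances.
print(target_tracking_desired(4, metric=75.0, target=50.0))
print(step_scaling_adjustment(65.0))  # 1
print(step_scaling_adjustment(85.0))  # 2
```

The target tracking sketch works from a goal and derives the capacity, while the step sketch works from fixed rules and derives an adjustment.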
Question 3: How does the system ensure fault tolerance and automatically replace instances that fail?
Fault tolerance relies heavily on Health Checks, which determine whether an instance is operational. The ASG periodically checks the health of its servers. If a server fails its check (it freezes up or stops responding), the ASG does not try to fix it. It terminates that instance and automatically launches a brand new, healthy replacement.
While the default is the EC2 status check, a common best practice is to also enable the load balancer check (ELB check), so that an instance which is running but not serving traffic correctly is replaced as well (end-to-end resilience).
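The replace-rather-than-repair behaviour, and the difference the ELB check makes, can be simulated in plain Python. Everything here is a hypothetical model (the `new_instance`, `is_healthy`, and `reconcile` helpers and the instance dictionaries are made up for illustration); it does not call AWS.

```python
from itertools import count

_ids = count(1)


def new_instance() -> dict:
    """Create a fresh, healthy instance record with a made-up ID."""
    return {"id": f"i-{next(_ids):04d}", "ec2_ok": True, "elb_ok": True}


def is_healthy(instance: dict, use_elb_check: bool) -> bool:
    # The EC2 status check always applies; the ELB check is optional.
    if not instance["ec2_ok"]:
        return False
    return instance["elb_ok"] if use_elb_check else True


def reconcile(fleet: list, use_elb_check: bool = True) -> list:
    """Replace each unhealthy instance with a new one; never repair in place."""
    return [inst if is_healthy(inst, use_elb_check) else new_instance() for inst in fleet]


fleet = [new_instance() for _ in range(3)]
fleet[1]["elb_ok"] = False  # still running, but failing the load balancer check

ec2_only = reconcile(fleet, use_elb_check=False)
with_elb = reconcile(fleet, use_elb_check=True)
print([i["id"] for i in ec2_only])  # ['i-0001', 'i-0002', 'i-0003']
print([i["id"] for i in with_elb])  # ['i-0001', 'i-0004', 'i-0003']
```

With only the EC2 check, the broken instance stays in the fleet; with the ELB check enabled, it is swapped for a new one.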
Lifecycle Hooks act like a pause button: they hold an instance in a wait state during a launch or termination transition so that everything is handled gracefully. When a new server starts up or an old one shuts down, it might need a minute to run configuration scripts, attach monitoring agents, or deregister cleanly before shutdown. The hook keeps the instance paused until that work is reported complete or the hook times out.
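As a rough sketch of where the pause happens, the transitions can be modelled as a tiny state walk. The state names follow the ASG lifecycle states as I understand them (`Pending:Wait`, `Terminating:Wait`, and so on); the `run_transition` helper and the callback are hypothetical, and real hooks also involve timeouts and a CONTINUE or ABANDON result that this sketch leaves out.

```python
LAUNCH_WITH_HOOK = ["Pending", "Pending:Wait", "Pending:Proceed", "InService"]
TERMINATE_WITH_HOOK = ["Terminating", "Terminating:Wait", "Terminating:Proceed", "Terminated"]


def run_transition(states, hook_action):
    """Walk through the states, running hook_action while paused in a Wait state."""
    history = []
    for state in states:
        history.append(state)
        if state.endswith(":Wait"):
            hook_action()  # e.g. run bootstrap scripts or drain connections
    return history


done = []
launch_history = run_transition(LAUNCH_WITH_HOOK, lambda: done.append("bootstrap script ran"))
stop_history = run_transition(TERMINATE_WITH_HOOK, lambda: done.append("connections drained"))

print(" -> ".join(launch_history))
print(" -> ".join(stop_history))
print(done)
```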
Question 4: What is the required architectural pattern for designing highly available applications?
To keep a service running when individual components fail, redundancy is key. Never put all your eggs in one basket, as the expression goes.
Multi-AZ Architecture: This means you don’t put all your servers in a single Availability Zone (AZ), which is one or more data centers in one isolated location. If you did, a power outage in that location would take the whole application down.
Using a Multi-AZ pattern involves spreading servers across at least two Availability Zones within the same Region, so that if one fails, the others keep running. This is the baseline requirement for high availability. The pattern usually uses an Application Load Balancer (ALB) combined with an ASG spanning the AZs to ensure fault-tolerant load balancing and redundancy.
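A quick way to see why spreading instances matters is to count what survives when one zone disappears. The sketch below is a hypothetical model: `spread_across_azs` and `surviving_capacity` are illustrative helpers, the round-robin placement is an assumption standing in for the ASG's balancing, and the AZ names are just examples.

```python
from collections import Counter


def spread_across_azs(desired: int, azs: list) -> Counter:
    """Assign instances round-robin so per-AZ counts differ by at most one."""
    return Counter(azs[i % len(azs)] for i in range(desired))


def surviving_capacity(placement: Counter, failed_az: str) -> int:
    """Instances still running if one AZ becomes unavailable."""
    return sum(n for az, n in placement.items() if az != failed_az)


single_az = spread_across_azs(6, ["us-east-1a"])
multi_az = spread_across_azs(6, ["us-east-1a", "us-east-1b", "us-east-1c"])

print(dict(multi_az))                                # 2 instances in each AZ
print(surviving_capacity(single_az, "us-east-1a"))   # 0
print(surviving_capacity(multi_az, "us-east-1a"))    # 4
```

In the single-AZ layout one outage removes all capacity; in the three-AZ layout two thirds of the fleet keeps serving while the ASG launches replacements in the remaining zones.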
Multi-Region Active-Passive: In this design, a primary region handles all traffic while a secondary region stands by with minimal resources. Route 53 Failover Routing switches traffic to the secondary region if the primary becomes unhealthy.
Multi-Region Active-Active: In this global strategy, both regions serve traffic simultaneously. It requires global data replication and often uses Route 53 Latency Routing to send users to the region that gives them the lowest latency, which is usually, but not always, the closest one.
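The two multi-region routing decisions can be contrasted in a short sketch. This models only the decision logic, not Route 53 itself: `failover_route` and `latency_route` are hypothetical helpers, and the region names and latency figures are made-up examples.

```python
def failover_route(primary_healthy: bool, primary: str = "us-east-1", secondary: str = "eu-west-1") -> str:
    """Active-passive: all traffic goes to the primary unless its health check fails."""
    return primary if primary_healthy else secondary


def latency_route(latency_ms: dict, healthy: set) -> str:
    """Active-active: choose the healthy region with the lowest measured latency."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    return min(candidates, key=candidates.get)


print(failover_route(primary_healthy=True))   # us-east-1
print(failover_route(primary_healthy=False))  # eu-west-1

# Example latencies as seen by one user (illustrative numbers only).
user_latency = {"us-east-1": 120.0, "eu-west-1": 35.0}
print(latency_route(user_latency, healthy={"us-east-1", "eu-west-1"}))  # eu-west-1
print(latency_route(user_latency, healthy={"us-east-1"}))               # us-east-1
```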
To sum up, here are a few highlights from the lesson:
The Auto Scaling Group (ASG) manages both automatic EC2 scaling and automated replacement of unhealthy instances.
Use Launch Templates over Launch Configurations for modern designs.
Choose Predictive Scaling for managing forecasted or cyclical demand.
For high availability, use a Multi-AZ architecture.
See you tomorrow!
