Cloud Security Fundamentals: Protecting Your Infrastructure
Key Takeaways
- Every major cloud provider publishes a shared responsibility model.
- Identity and Access Management is where cloud security succeeds or fails.
- Publicly exposed S3 buckets have produced some of the most embarrassing data exposures in cloud history.
- Cloud network security is fundamentally different from on-premises network security.
- Hardcoded credentials in source code are the most preventable and most common cloud security failure.
- CloudTrail records every API call made in your AWS account — who made it, from what IP, at what time, with what parameters, and whether it succeeded.
In July 2019, Capital One disclosed that a former AWS employee had exploited a misconfigured web application firewall to access 100 million credit card applications and accounts. The misconfigured WAF allowed Server-Side Request Forgery (SSRF) — the attacker made the WAF query the EC2 instance metadata service at http://169.254.169.254/, retrieved IAM credentials from the metadata endpoint, and used those credentials to list and download objects from 700 S3 buckets. Capital One paid $80 million in regulatory fines. The attacker was convicted and sentenced to time served plus five years of probation.
The breach required no zero-day exploits. No insider access to AWS's infrastructure. No sophisticated persistence mechanisms. One misconfigured WAF, one unprotected metadata endpoint, and overpermissioned IAM roles were sufficient to expose 100 million people's financial data.
This is the recurring pattern in cloud security. AWS, Azure, and GCP provide hardened physical infrastructure, resilient control planes, and every security service you could need — but the configuration of those services, the permissions on your IAM roles, and the network exposure of your resources are entirely your responsibility. Most cloud breaches are not AWS getting hacked. They are your misconfiguration getting exploited.
The Shared Responsibility Model: Where the Line Actually Is
Every major cloud provider publishes a shared responsibility model. Understanding exactly where your responsibility begins is not a compliance checkbox — it is the foundational question that determines your entire security posture.
What the cloud provider owns:
- Physical data center security: guards, cameras, biometrics, cage access
- Hardware integrity: servers, switches, storage arrays
- Hypervisor and virtualization layer
- Global network backbone and DDoS absorption at the network layer
- Managed service underlying infrastructure patches (RDS database engine updates, Lambda runtime patches, EKS control plane updates)
- Hardware Security Module (HSM) tamper resistance for KMS
What you own, no matter which cloud:
- Identity and access management: every IAM user, role, policy, and group
- Data classification and protection: encryption at rest and in transit
- Network architecture: VPCs, subnets, security groups, NACLs, routing
- OS and application patching for anything you manage (EC2, self-managed databases)
- All application-layer security
- Monitoring, alerting, and incident response
- Configuration of every managed service you deploy
- Compliance with regulations applicable to your data
The confusion point is managed services. When you use RDS Multi-AZ, AWS patches the database engine. When you use Lambda, AWS patches the Node.js or Python runtime. But you still own every IAM policy, every database user, every environment variable containing credentials, and every network rule controlling who reaches that service. The managed layer removes operational burden — it does not remove security responsibility.
A useful mental model: the cloud provider gives you a locked building with security guards and CCTV. Inside, you decide who gets a key, which rooms they can enter, whether you leave the filing cabinets unlocked, and whether you are watching the security cameras.
IAM: The Control Plane That Rules Everything Else
Identity and Access Management is where cloud security succeeds or fails. An attacker with an overpermissioned IAM credential can do more damage in a cloud environment than a traditional network intrusion — because the credential is the identity. There is no network perimeter to bypass, no VLAN to pivot across. An IAM principal with s3:* on *, or worse, AdministratorAccess, can exfiltrate every piece of data in your account.
The Principle of Least Privilege in Practice
Least privilege means every IAM principal — human user, service account, EC2 instance profile, Lambda execution role, ECS task role — should have only the specific actions on specific resources required for its function. Not "probably won't hurt" permissions. Not "might need this later" permissions. Only what is demonstrably required for the current function.
Start with deny-all, add explicit allows. AWS IAM denies everything by default. An IAM principal with no policies attached can do nothing. Build up from zero. Resist the temptation to use AWS managed policies like AdministratorAccess or PowerUserAccess for service roles — write custom policies.
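The evaluation order can be sketched in a few lines of Python — a deliberately simplified model (real IAM also evaluates wildcards, conditions, resource policies, and permission boundaries), with illustrative policy data:

```python
def evaluate(statements, action, resource):
    """Toy IAM evaluation: an explicit Deny always wins, then an
    explicit Allow; anything unmatched falls to the implicit
    default of Deny."""
    def matches(stmt):
        return action in stmt["Action"] and resource == stmt["Resource"]

    if any(s["Effect"] == "Deny" and matches(s) for s in statements):
        return "Deny"
    if any(s["Effect"] == "Allow" and matches(s) for s in statements):
        return "Allow"
    return "Deny"  # nothing matched: implicit deny

policy = [{"Effect": "Allow",
           "Action": ["s3:GetObject"],
           "Resource": "arn:aws:s3:::app-logs/*"}]

print(evaluate(policy, "s3:GetObject", "arn:aws:s3:::app-logs/*"))  # Allow
print(evaluate(policy, "s3:PutObject", "arn:aws:s3:::app-logs/*"))  # Deny
```

The asymmetry is the point: a principal with no attached policies lands in the final `return "Deny"` branch for every request.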
This principle was violated in the Capital One case: the WAF's instance role had s3:GetObject on * — permission to read any S3 object in the account. The WAF needed no such access. The permission existed because granting broad access was easier than enumerating the specific buckets the WAF needed.
The Capital One breach pattern reconstructed:
1. Attacker sends SSRF payload to the WAF endpoint
2. WAF makes HTTP request to http://169.254.169.254/latest/meta-data/iam/security-credentials/
3. Metadata service returns the role name as plain text: waf-execution-role
4. A further request to http://169.254.169.254/latest/meta-data/iam/security-credentials/waf-execution-role returns temporary credentials:
{ "AccessKeyId": "ASIA...", "SecretAccessKey": "...", "Token": "..." }
5. Attacker uses credentials to enumerate and download S3 buckets
6. 700 S3 buckets' contents downloaded over six weeks
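The core weakness in steps 2–4 — an SSRF primitive plus a metadata service that answers any bare GET — can be modeled as a toy sketch. Every name, URL response, and credential value here is a mock stand-in:

```python
# Mock of the IMDSv1 metadata service: a bare GET to either path
# returns the data, no authentication of any kind.
MOCK_METADATA = {
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/":
        "waf-execution-role",
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/waf-execution-role":
        '{"AccessKeyId": "ASIAEXAMPLE", "SecretAccessKey": "...", "Token": "..."}',
}

def vulnerable_proxy(url):
    """An SSRF-prone handler: fetches whatever URL the caller supplies,
    with no allow-list — so internal-only endpoints become reachable
    from the outside."""
    return MOCK_METADATA.get(url, "404")

# The attacker's two requests, mirroring steps 2-4:
base = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"
role = vulnerable_proxy(base)
creds = vulnerable_proxy(base + role)
print(role)                    # waf-execution-role
print("AccessKeyId" in creds)  # True
```

The fix is twofold: an allow-list in the proxy, and a metadata service that refuses bare GETs (IMDSv2, covered below).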
The fix that would have stopped this:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "WafLoggingOnly",
"Effect": "Allow",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents"
],
"Resource": "arn:aws:logs:us-east-1:123456789012:log-group:/aws/waf/*"
}
]
}
No S3 permissions. No DescribeInstances. No AssumeRole. Just exactly what the WAF needs to write its own logs.
Identifying Overpermissioned Roles
AWS IAM Access Analyzer and IAM Access Advisor show which services a principal's permissions have actually been used against, and when. Any service-level permission unused for 90 days is a candidate for removal.
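The pruning logic itself is simple; a minimal sketch against data shaped like IAM's ServicesLastAccessed entries (the sample report is fabricated):

```python
from datetime import datetime, timedelta, timezone

def stale_services(services_last_accessed, max_age_days=90):
    """Flag services a role has permission for but has not touched in
    max_age_days — candidates for removal. A LastAuthenticated of None
    means the permission was never used at all."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=max_age_days)
    stale = []
    for svc in services_last_accessed:
        last = svc.get("LastAuthenticated")
        if last is None or last < cutoff:
            stale.append(svc["ServiceNamespace"])
    return stale

report = [
    {"ServiceNamespace": "logs", "LastAuthenticated": datetime.now(timezone.utc)},
    {"ServiceNamespace": "s3",   "LastAuthenticated": None},
]
print(stale_services(report))  # ['s3']
```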
# List all IAM roles in the account
aws iam list-roles --query 'Roles[*].[RoleName, RoleId, Arn]' --output table
# View last-accessed information for a specific role
# Shows which services the role has accessed in the last 90 days
aws iam generate-service-last-accessed-details \
--arn arn:aws:iam::123456789012:role/MyAppRole
# Retrieve the report (poll until Status is COMPLETED)
aws iam get-service-last-accessed-details \
--job-id [job-id-from-above] \
--query 'ServicesLastAccessed[?LastAuthenticated==null].[ServiceName,ServiceNamespace]' \
--output table
# Simulate whether a role has specific permissions
aws iam simulate-principal-policy \
--policy-source-arn arn:aws:iam::123456789012:role/MyAppRole \
--action-names s3:GetObject s3:PutObject s3:DeleteObject \
--resource-arns arn:aws:s3:::my-sensitive-bucket/*
IAM Access Analyzer also detects external access — policies that grant cross-account or public access to resources:
# Create an analyzer for the account
aws accessanalyzer create-analyzer \
--analyzer-name account-analyzer \
--type ACCOUNT
# List findings — these are resources with external access
aws accessanalyzer list-findings \
--analyzer-arn arn:aws:access-analyzer:us-east-1:123456789012:analyzer/account-analyzer \
--filter '{"status": {"eq": ["ACTIVE"]}}' \
--query 'findings[*].{Resource:resource,ResourceType:resourceType,Action:action}' \
--output table
Metadata Service Attack Surface: IMDSv2 Enforcement
The Capital One breach exploited IMDSv1, where any request to 169.254.169.254 from the instance (including SSRF-triggered requests) would return IAM credentials. IMDSv2 requires a two-step process: first obtain a session token, then use it. SSRF attacks typically cannot follow this two-step flow.
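A toy model of why the token requirement matters: credentials are only returned to callers that first completed the PUT-based token handshake, which a bare-GET SSRF primitive cannot do. All values here are mock stand-ins:

```python
class MockIMDSv2:
    """Mock of the IMDSv2 two-step flow: a PUT with a TTL header mints
    a session token; credential reads without a valid token are refused."""

    def __init__(self):
        self.tokens = set()

    def put_token(self, ttl_header_present):
        if not ttl_header_present:
            return None  # PUT without the TTL header is rejected
        token = "AQAEAEXAMPLETOKEN"
        self.tokens.add(token)
        return token

    def get_credentials(self, token=None):
        if token not in self.tokens:
            return "401 Unauthorized"  # the IMDSv1-style bare GET fails
        return '{"AccessKeyId": "ASIAEXAMPLE"}'

imds = MockIMDSv2()
print(imds.get_credentials())        # 401 Unauthorized — SSRF-style request
token = imds.put_token(ttl_header_present=True)
print(imds.get_credentials(token).startswith('{"AccessKeyId"'))  # True
```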
Enforce IMDSv2 for all instances:
# Require IMDSv2 for a running instance
aws ec2 modify-instance-metadata-options \
--instance-id i-1234567890abcdef0 \
--http-tokens required \
--http-endpoint enabled
# Require IMDSv2 on new instances via account-level default setting
aws ec2 modify-instance-metadata-defaults \
--http-tokens required
# Verify enforcement on existing instances
aws ec2 describe-instances \
--query 'Reservations[*].Instances[*].[InstanceId, MetadataOptions.HttpTokens]' \
--output table
Root Account Lockdown
The AWS root account has unrestricted access to everything — including billing, account closure, and IAM policy management. It has to exist, but it should never be used for day-to-day work: lock it down and leave it alone.
# Check if the root account has access keys (it should not)
aws iam get-account-summary --query 'SummaryMap.AccountAccessKeysPresent'
# If this returns 1, you have root access keys — delete them immediately
# Check root account MFA status
aws iam get-account-summary --query 'SummaryMap.AccountMFAEnabled'
# Should return 1 (enabled)
# List all IAM users without MFA (should be zero)
aws iam generate-credential-report && sleep 5
aws iam get-credential-report --query 'Content' --output text | base64 -d | \
awk -F',' 'NR>1 && $8=="false" {print "NO MFA:", $1}'
Enforce MFA for all human users with a policy that denies all API actions except MFA device management when MFA is not active:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DenyAllExceptMFAManagement",
"Effect": "Deny",
"NotAction": [
"iam:CreateVirtualMFADevice",
"iam:EnableMFADevice",
"iam:GetUser",
"iam:ListMFADevices",
"iam:ListVirtualMFADevices",
"iam:ResyncMFADevice",
"sts:GetSessionToken"
],
"Resource": "*",
"Condition": {
"BoolIfExists": {
"aws:MultiFactorAuthPresent": "false"
}
}
}
]
}
Access Key Management
Long-lived access keys are the most dangerous IAM artifact. Unlike session credentials (which expire), access keys persist until manually rotated or deleted. The median time from public exposure of an AWS access key on GitHub to first unauthorized API call is 5 minutes, according to GitGuardian research.
# Find all access keys in the account and their age
aws iam list-users --query 'Users[*].UserName' --output text | tr '\t' '\n' | \
while read username; do
aws iam list-access-keys --user-name "$username" \
--query "AccessKeyMetadata[*].{User:'$username',KeyId:AccessKeyId,Status:Status,Created:CreateDate}" \
--output json
done | jq -r '.[] | select(.Created < "'$(date -d '-90 days' --iso-8601)'") | "\(.User)\t\(.KeyId)\t\(.Status)\t\(.Created)"'
# Rotate an access key (create new, update application, then delete old)
# Step 1: Create new key
aws iam create-access-key --user-name service-account
# Step 2: Update application configuration with new key
# Step 3: Verify application works with new key
# Step 4: Deactivate old key
aws iam update-access-key \
--user-name service-account \
--access-key-id AKIA... \
--status Inactive
# Step 5: After 24 hours without issues, delete old key
aws iam delete-access-key \
--user-name service-account \
--access-key-id AKIA...
For any workload running on AWS (EC2, Lambda, ECS, EKS), replace static access keys with instance/task/execution roles. The role provides short-lived credentials via the metadata service, rotated automatically every few hours.
S3 Misconfiguration: The Data Breach Factory
Publicly exposed S3 buckets have produced some of the most embarrassing data exposures in cloud history. A partial list: Accenture exposed internal credentials and client data (2017), Verizon exposed 14 million customer records (2017), Pentagon exposed NSA data (2017), UpGuard researchers found hundreds of misconfigured buckets from Fortune 500 companies throughout 2018-2022.
The common pattern: a developer creates a bucket, enables public access "temporarily" for testing, and never reverts it. Or a Terraform module defaults to public. Or someone changes a bucket policy and does not test it correctly.
Account-Level Block Public Access
AWS introduced S3 Block Public Access settings that override bucket-level settings. Enable them at the account level:
# Enable S3 Block Public Access for the entire account
# These four settings together ensure no bucket can be made public
aws s3control put-public-access-block \
--account-id $(aws sts get-caller-identity --query Account --output text) \
--public-access-block-configuration \
BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
# Verify it's applied
aws s3control get-public-access-block \
--account-id $(aws sts get-caller-identity --query Account --output text)
These four settings:
- BlockPublicAcls: Reject any request that adds a public ACL
- IgnorePublicAcls: Ignore all existing public ACLs (they are not enforced)
- BlockPublicPolicy: Reject bucket policies that grant public access
- RestrictPublicBuckets: Restrict access to buckets with public policies to only AWS services and authorized users
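How the two read-time settings neutralize existing public grants can be sketched as a simplified model (BlockPublicAcls and BlockPublicPolicy reject new grants at write time, so they do not appear in this read-time check):

```python
def bucket_effectively_public(acl_is_public, policy_is_public,
                              ignore_public_acls, restrict_public_buckets):
    """Simplified model: IgnorePublicAcls masks existing public ACLs,
    RestrictPublicBuckets cuts off anonymous access granted by a
    public bucket policy."""
    via_acl = acl_is_public and not ignore_public_acls
    via_policy = policy_is_public and not restrict_public_buckets
    return via_acl or via_policy

# A bucket with a public ACL, before and after enabling the settings:
print(bucket_effectively_public(True, False, False, False))  # True
print(bucket_effectively_public(True, False, True, True))    # False
```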
Audit existing buckets for compliance:
# Find buckets that do not have block public access enabled at the bucket level
aws s3api list-buckets --query 'Buckets[*].Name' --output text | tr '\t' '\n' | \
while read bucket; do
result=$(aws s3api get-public-access-block --bucket "$bucket" 2>&1)
if echo "$result" | grep -q "NoSuchPublicAccessBlockConfiguration"; then
echo "NOT CONFIGURED: $bucket"
else
echo "$result" | grep -q '"BlockPublicAcls": true' && \
echo "$result" | grep -q '"BlockPublicPolicy": true' || \
echo "PARTIALLY CONFIGURED: $bucket"
fi
done
Mandatory Encryption Enforcement
Every S3 bucket handling sensitive data should enforce server-side encryption and reject unencrypted uploads. Use SSE-KMS for audit trails of key usage:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DenyUnencryptedObjectUploads",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::my-sensitive-bucket/*",
"Condition": {
"StringNotEquals": {
"s3:x-amz-server-side-encryption": "aws:kms"
}
}
},
{
"Sid": "DenyNonHTTPS",
"Effect": "Deny",
"Principal": "*",
"Action": "s3:*",
"Resource": [
"arn:aws:s3:::my-sensitive-bucket",
"arn:aws:s3:::my-sensitive-bucket/*"
],
"Condition": {
"Bool": {
"aws:SecureTransport": "false"
}
}
}
]
}
Enable default encryption at the bucket level as a fallback:
aws s3api put-bucket-encryption \
--bucket my-sensitive-bucket \
--server-side-encryption-configuration '{
"Rules": [{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "aws:kms",
"KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/abcd1234-..."
},
"BucketKeyEnabled": true
}]
}'
Ransomware Defense: Versioning and Object Lock
An attacker or malware with write access to an S3 bucket can delete or overwrite objects. Enabling versioning ensures previous versions are retained:
# Enable versioning on a bucket
aws s3api put-bucket-versioning \
--bucket my-critical-data \
--versioning-configuration Status=Enabled
# Enable MFA delete — requires MFA to delete versions
# (Must be done by root account — this is one legitimate root account use)
aws s3api put-bucket-versioning \
--bucket my-critical-data \
--versioning-configuration Status=Enabled,MFADelete=Enabled \
--mfa "arn:aws:iam::123456789012:mfa/root-account-mfa-device [TOTP-code]"
S3 Object Lock in Compliance mode makes objects immutable for a defined retention period — even an administrator cannot delete them:
# Enable Object Lock when creating a bucket (cannot be added to existing buckets)
aws s3api create-bucket \
--bucket my-immutable-logs \
--region us-east-1 \
--object-lock-enabled-for-bucket
# Set default retention (objects cannot be deleted for 90 days after creation)
aws s3api put-object-lock-configuration \
--bucket my-immutable-logs \
--object-lock-configuration '{
"ObjectLockEnabled": "Enabled",
"Rule": {
"DefaultRetention": {
"Mode": "COMPLIANCE",
"Days": 90
}
}
}'
Network Security: VPCs, Security Groups, and Segmentation
Cloud network security is fundamentally different from on-premises network security. There is no physical network to defend. The "network" is software-defined, and the controls you apply determine what can talk to what — with workload-level granularity if you configure it correctly.
VPC Architecture for Security
A well-designed VPC for a typical web application:
VPC: 10.0.0.0/16
Public Subnets (10.0.1.0/24, 10.0.2.0/24 — multi-AZ)
└── Application Load Balancer
└── NAT Gateway
Private Application Subnets (10.0.10.0/24, 10.0.11.0/24)
└── ECS/EKS tasks / EC2 instances
└── No direct internet access — uses NAT Gateway for egress
Private Database Subnets (10.0.20.0/24, 10.0.21.0/24)
└── RDS Multi-AZ
└── ElastiCache
└── No internet access at all — not even via NAT Gateway
Management Subnet (10.0.100.0/24)
└── Bastion host or Systems Manager endpoints
└── Restricted access to corporate IP ranges only
Create this with Terraform:
# VPC with flow logs enabled
resource "aws_vpc" "main" {
cidr_block = "10.0.0.0/16"
enable_dns_hostnames = true
enable_dns_support = true
tags = { Name = "main-vpc" }
}
# VPC Flow Logs — critical for security visibility
resource "aws_flow_log" "main" {
vpc_id = aws_vpc.main.id
traffic_type = "ALL"
iam_role_arn = aws_iam_role.flow_log.arn
log_destination = aws_cloudwatch_log_group.flow_logs.arn
}
# Database subnet group — no route to internet
resource "aws_subnet" "database" {
count = 2
vpc_id = aws_vpc.main.id
cidr_block = "10.0.${20 + count.index}.0/24"
availability_zone = data.aws_availability_zones.available.names[count.index]
# No map_public_ip_on_launch — this is a private subnet
tags = { Name = "database-${count.index}", Tier = "database" }
}
Security Group Rules: Least Privilege at the Workload Level
Security groups are stateful firewalls attached directly to ENIs (network interfaces). Unlike on-premises firewalls that protect network segments, security groups protect individual workloads. A compromised instance can only reach what its security group allows.
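The world-open audit is easy to express directly against data shaped like describe-security-groups output (the sample group here is fabricated):

```python
def world_open_rules(security_group):
    """Return (from_port, to_port) pairs reachable from the entire
    internet, i.e. ingress rules whose CIDR is 0.0.0.0/0."""
    findings = []
    for perm in security_group.get("IpPermissions", []):
        for ip_range in perm.get("IpRanges", []):
            if ip_range.get("CidrIp") == "0.0.0.0/0":
                findings.append((perm.get("FromPort"), perm.get("ToPort")))
    return findings

sg = {"GroupId": "sg-0abc", "IpPermissions": [
    {"FromPort": 22, "ToPort": 22, "IpProtocol": "tcp",
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},       # SSH open to the world
    {"FromPort": 8080, "ToPort": 8080, "IpProtocol": "tcp",
     "IpRanges": [{"CidrIp": "10.0.1.0/24"}]},     # internal only — fine
]}
print(world_open_rules(sg))  # [(22, 22)]
```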
# Find ALL security groups with 0.0.0.0/0 inbound rules
# These are groups that allow traffic from anywhere on the internet
aws ec2 describe-security-groups \
--query 'SecurityGroups[?IpPermissions[?IpRanges[?CidrIp==`0.0.0.0/0`]]].{
ID:GroupId,
Name:GroupName,
Description:Description,
VPC:VpcId
}' \
--output table
# Find groups allowing unrestricted SSH (port 22) from internet
aws ec2 describe-security-groups \
--filters "Name=ip-permission.from-port,Values=22" \
"Name=ip-permission.to-port,Values=22" \
"Name=ip-permission.cidr,Values=0.0.0.0/0" \
--query 'SecurityGroups[*].[GroupId,GroupName]' \
--output table
The correct approach for internal communication is to reference security groups instead of IP ranges:
# Application security group — accepts traffic from the ALB only
resource "aws_security_group" "app" {
name = "app-sg"
vpc_id = aws_vpc.main.id
ingress {
from_port = 8080
to_port = 8080
protocol = "tcp"
security_groups = [aws_security_group.alb.id] # Only from the load balancer SG
description = "App traffic from ALB only"
}
egress {
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.database.id] # Only to database
description = "Outbound to RDS"
}
egress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"] # HTTPS to internet for external API calls
description = "HTTPS egress for external APIs"
}
}
# Database security group — accepts traffic from app tier only
resource "aws_security_group" "database" {
name = "database-sg"
vpc_id = aws_vpc.main.id
ingress {
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.app.id] # Only from app tier
description = "PostgreSQL from application tier only"
}
# No egress rules needed — database initiates no outbound connections
}
Eliminating SSH Exposure with AWS Systems Manager
SSH on port 22 is one of the most heavily scanned services on the internet. AWS Systems Manager Session Manager provides shell access to EC2 instances without any inbound network rules — it uses an outbound HTTPS connection from the instance to the SSM service.
# Connect to an instance via Session Manager — no SSH, no port 22 needed
aws ssm start-session --target i-1234567890abcdef0
# Port forward to access a private database without SSH tunneling
aws ssm start-session \
--target i-1234567890abcdef0 \
--document-name AWS-StartPortForwardingSessionToRemoteHost \
--parameters "host=my-db.cluster-xyz.us-east-1.rds.amazonaws.com,portNumber=5432,localPortNumber=5432"
The instance needs the SSM Agent (pre-installed on most Amazon Linux and Ubuntu images) and an instance profile with the AmazonSSMManagedInstanceCore policy. No security group inbound rules at all.
Secrets Management: Stopping the $0 to $100M Attack
Hardcoded credentials in source code are the most preventable and most common cloud security failure. The GitGuardian State of Secrets Sprawl report found over 6 million new secrets exposed on GitHub in 2022 alone — a figure that grows annually. Within minutes of exposure, automated scrapers find them and attempt to use them.
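At its core, secret detection is pattern matching. A minimal sketch for one well-known pattern — AWS access key IDs — noting that real scanners such as trufflehog and gitleaks add hundreds of patterns, entropy analysis, and live verification:

```python
import re

# AWS access key IDs are 20 characters: a 4-character prefix (AKIA for
# long-lived keys, ASIA for temporary ones) plus 16 uppercase
# alphanumerics. Non-capturing group so findall returns full matches.
AWS_ACCESS_KEY_RE = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

def find_aws_keys(text):
    """Return AWS access key IDs found in a blob of text."""
    return AWS_ACCESS_KEY_RE.findall(text)

snippet = 'aws_access_key_id = "AKIAIOSFODNN7EXAMPLE"'
print(find_aws_keys(snippet))  # ['AKIAIOSFODNN7EXAMPLE']
print(find_aws_keys("no secrets here"))  # []
```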
The standard fix is a secrets manager:
# Store a database credential in AWS Secrets Manager
# Generate the password first — command substitution does not expand
# inside single quotes, so build the JSON string with double quotes
DB_PASSWORD=$(openssl rand -base64 32)
aws secretsmanager create-secret \
--name prod/myapp/database \
--description "Production PostgreSQL credentials" \
--secret-string "{
\"username\": \"app_user\",
\"password\": \"${DB_PASSWORD}\",
\"host\": \"myapp.cluster-xyz.us-east-1.rds.amazonaws.com\",
\"port\": 5432,
\"dbname\": \"myapp\"
}"
# Enable automatic rotation (requires a Lambda rotation function)
aws secretsmanager rotate-secret \
--secret-id prod/myapp/database \
--rotation-lambda-arn arn:aws:lambda:us-east-1:123456789012:function:SecretsManagerRotation \
--rotation-rules AutomaticallyAfterDays=30
# Application retrieves the secret at runtime
aws secretsmanager get-secret-value \
--secret-id prod/myapp/database \
--query SecretString \
--output text | python3 -c "import sys,json; s=json.load(sys.stdin); print(s['password'])"
In application code (Python with boto3):
import boto3
import json
import psycopg2
from functools import lru_cache
@lru_cache(maxsize=1)
def get_db_credentials():
"""Fetch database credentials from Secrets Manager.
Cached to avoid repeated API calls within the same process lifecycle."""
client = boto3.client('secretsmanager', region_name='us-east-1')
response = client.get_secret_value(SecretId='prod/myapp/database')
return json.loads(response['SecretString'])
def get_db_connection():
creds = get_db_credentials()
return psycopg2.connect(
host=creds['host'],
port=creds['port'],
database=creds['dbname'],
user=creds['username'],
password=creds['password']
)
Audit existing code and containers for exposed secrets:
# Scan git repository history for secrets (not just current HEAD)
# truffleHog scans every commit
docker run --rm -v "$PWD:/pwd" trufflesecurity/trufflehog:latest \
git file:///pwd --only-verified --json | jq '.'
# gitleaks — faster for large repos
gitleaks detect --source . --verbose --report-format json --report-path /tmp/leaks.json
cat /tmp/leaks.json | jq '.[] | {File: .File, Secret: .Secret, Rule: .RuleID}'
# Scan a Docker image for hardcoded secrets
docker run --rm -v /var/run/docker.sock:/var/run/docker.sock \
aquasec/trivy:latest image --scanners secret my-registry/my-app:latest
Enable GitHub's secret scanning at the organization level — it scans every commit and PR, and with push protection enabled it also blocks detected secrets from being pushed:
# Enable secret scanning for all repos in an org (requires GitHub Advanced Security)
gh api \
--method PATCH \
-H "Accept: application/vnd.github+json" \
"/orgs/my-org" \
-f secret_scanning_enabled_for_new_repositories=true
CloudTrail and Logging: The Forensic Foundation
CloudTrail records every API call made in your AWS account — who made it, from what IP, at what time, with what parameters, and whether it succeeded. Without CloudTrail, you cannot investigate incidents, detect compromised credentials, or meet most compliance requirements. It is non-negotiable.
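Pulling those who/what/when/where fields out of a record is straightforward; a sketch using field names from the CloudTrail event schema (the sample event itself is fabricated):

```python
import json

def summarize_event(raw):
    """Extract the forensic essentials from one CloudTrail record:
    who acted, what API they called, when, and from which IP."""
    e = json.loads(raw)
    return {
        "who": e["userIdentity"].get("arn"),
        "what": e["eventSource"] + ":" + e["eventName"],
        "when": e["eventTime"],
        "from_ip": e["sourceIPAddress"],
        "error": e.get("errorCode"),  # present only on failed calls
    }

sample = json.dumps({
    "eventTime": "2019-07-19T21:14:00Z",
    "eventSource": "s3.amazonaws.com",
    "eventName": "ListBuckets",
    "sourceIPAddress": "203.0.113.7",
    "userIdentity": {"arn": "arn:aws:sts::123456789012:assumed-role/waf-execution-role/i-0abc"},
})
print(summarize_event(sample)["what"])  # s3.amazonaws.com:ListBuckets
```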
# Enable CloudTrail for all regions with log file validation
aws cloudtrail create-trail \
--name organization-audit-trail \
--s3-bucket-name company-cloudtrail-logs-$(date +%Y) \
--is-multi-region-trail \
--enable-log-file-validation \
--include-global-service-events \
--is-organization-trail # Covers all accounts in the org
aws cloudtrail start-logging --name organization-audit-trail
# Enable S3 data events (records individual object access)
# Without this, CloudTrail only records S3 bucket-level API calls
aws cloudtrail put-event-selectors \
--trail-name organization-audit-trail \
--event-selectors '[
{
"ReadWriteType": "All",
"IncludeManagementEvents": true,
"DataResources": [{
"Type": "AWS::S3::Object",
"Values": ["arn:aws:s3:::sensitive-data-bucket/"]
}]
}
]'
The CloudTrail log bucket itself must be hardened — an attacker who compromises your account will try to delete logs to cover their tracks:
# Protect the CloudTrail log bucket from deletion
# 1. Enable Object Lock on the log bucket (Compliance mode, 365-day retention)
# 2. Block all public access
# 3. Deny delete actions except from a designated security account
aws s3api put-bucket-policy --bucket company-cloudtrail-logs-2025 --policy '{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "DenyDeleteExceptSecurityAccount",
"Effect": "Deny",
"NotPrincipal": {
"AWS": "arn:aws:iam::SECURITY-ACCOUNT-ID:root"
},
"Action": [
"s3:DeleteObject",
"s3:DeleteObjectVersion",
"s3:DeleteBucket"
],
"Resource": [
"arn:aws:s3:::company-cloudtrail-logs-2025",
"arn:aws:s3:::company-cloudtrail-logs-2025/*"
]
}
]
}'
VPC Flow Logs for Network Visibility
VPC Flow Logs capture IP-level traffic metadata for every network interface. They do not capture payload content, but they record source IP, destination IP, ports, bytes transferred, and whether traffic was accepted or rejected by security groups.
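Each default-format (version 2) record is a single line of fourteen space-separated fields; a minimal parser (the sample record is fabricated):

```python
# Field order of the default VPC Flow Log format, version 2.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]

def parse_flow_log(line):
    """Parse one default-format flow log record into a field dict."""
    return dict(zip(FIELDS, line.split()))

rec = parse_flow_log(
    "2 123456789012 eni-0abc 198.51.100.9 10.0.10.5 44321 22 6 4 240 "
    "1563570000 1563570060 REJECT OK")
print(rec["action"], rec["dstaddr"], rec["dstport"])  # REJECT 10.0.10.5 22
```

The same field order determines how CloudWatch Logs filter patterns must be written for these records, since they are plain space-delimited text rather than JSON.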
# Enable flow logs for all interfaces in the VPC
aws ec2 create-flow-logs \
--resource-type VPC \
--resource-ids vpc-12345678 \
--traffic-type ALL \
--log-destination-type cloud-watch-logs \
--log-destination arn:aws:logs:us-east-1:123456789012:log-group:vpc-flow-logs \
--deliver-logs-permission-arn arn:aws:iam::123456789012:role/FlowLogsRole
Analyzing flow logs for security events:
# Find instances that had connections rejected by security groups
# REJECT status indicates blocked traffic — potential scan or attack
# Default-format flow logs are space-delimited, so use the bracketed
# field syntax (JSON filter patterns will not match these records)
aws logs filter-log-events \
--log-group-name vpc-flow-logs \
--filter-pattern '[version, account, eni, src, dst, srcport, dstport=22, protocol, packets, bytes, start, end, action=REJECT, status]' \
--start-time $(($(date +%s) - 3600))000 \
--query 'events[*].message' \
--output text | awk '{print $4}' | sort | uniq -c | sort -rn | head -20
# Returns: count of source IPs scanning for SSH over the past hour
# Find unusual high-volume flows (data exfiltration indicator)
# Default flow logs carry no direction field — filter on byte count,
# then check whether the source address belongs to your instances
aws logs filter-log-events \
--log-group-name vpc-flow-logs \
--filter-pattern '[version, account, eni, src, dst, srcport, dstport, protocol, packets, bytes>10000000, start, end, action, status]' \
--start-time $(($(date +%s) - 3600))000 \
--query 'events[*].message' --output text
GuardDuty: Automated Threat Detection
AWS GuardDuty is a managed threat detection service that analyzes CloudTrail, VPC Flow Logs, and DNS logs for indicators of compromise. It requires no configuration of rules or thresholds — it uses machine learning trained on AWS-wide patterns to identify anomalies.
What GuardDuty detects that you would miss manually:
- Credentials used from an IP address in a country they have never been used from
- EC2 instances communicating with known cryptocurrency mining pools
- EC2 instances communicating with known C2 infrastructure (Tor, known botnet IPs)
- Unusual API calls for the account's historical pattern (the "first time" signals)
- Attempts to disable CloudTrail or GuardDuty itself
- DNS queries to known malicious domains from your VPC
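GuardDuty findings carry a numeric severity score; a small helper mapping it to the documented label bands (Low 1.0–3.9, Medium 4.0–6.9, High 7.0–8.9), which is why a severity-greater-than-or-equal-to-7 filter on list-findings returns exactly the High band:

```python
def severity_label(score):
    """Map GuardDuty's numeric severity to its label band."""
    if score >= 7.0:
        return "HIGH"
    if score >= 4.0:
        return "MEDIUM"
    return "LOW"

findings = [2.0, 5.3, 8.1]
print([severity_label(s) for s in findings])  # ['LOW', 'MEDIUM', 'HIGH']
```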
# Enable GuardDuty in the current region
DETECTOR_ID=$(aws guardduty create-detector \
--enable \
--finding-publishing-frequency FIFTEEN_MINUTES \
--query 'DetectorId' --output text)
# Optionally enable S3 protection and EKS protection
aws guardduty update-detector \
--detector-id $DETECTOR_ID \
--features '[
{"Name": "S3_DATA_EVENTS", "Status": "ENABLED"},
{"Name": "EKS_AUDIT_LOGS", "Status": "ENABLED"},
{"Name": "EBS_MALWARE_PROTECTION", "Status": "ENABLED"}
]'
# List current HIGH and CRITICAL findings
aws guardduty list-findings \
--detector-id $DETECTOR_ID \
--finding-criteria '{
"Criterion": {
"severity": {"Gte": 7}
}
}' \
--query 'FindingIds' --output text | tr '\t' '\n' | \
xargs -I{} aws guardduty get-findings \
--detector-id $DETECTOR_ID \
--finding-ids {} \
--query 'Findings[*].{Type:Type,Severity:Severity,Resource:Resource.ResourceType}' \
--output table
CSPM: Continuous Misconfiguration Detection
Cloud Security Posture Management (CSPM) tools continuously scan your cloud environment against security benchmarks and flag misconfigurations. The median time from a public S3 bucket appearing to it being indexed by a data harvesting bot is approximately 1-4 hours. CSPM tools catch misconfigurations before they become breaches.
AWS Security Hub
Security Hub aggregates findings from GuardDuty, Inspector, Macie, IAM Access Analyzer, and evaluates your account against the CIS AWS Foundations Benchmark:
# Enable Security Hub with default standards
aws securityhub enable-security-hub \
--enable-default-standards \
--tags '{"Environment": "production"}'
# Check which standards are enabled
aws securityhub describe-standards-subscriptions \
--query 'StandardsSubscriptions[*].[StandardsArn,StandardsStatus]' \
--output table
# Get CRITICAL severity findings
aws securityhub get-findings \
--filters '{
"SeverityLabel": [{"Value": "CRITICAL", "Comparison": "EQUALS"}],
"RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}],
"WorkflowStatus": [{"Value": "NEW", "Comparison": "EQUALS"}]
}' \
--query 'Findings[*].{Title:Title,Resource:Resources[0].Id,Remediation:Remediation.Recommendation.Text}' \
--output table
Prowler: Open-Source CIS Benchmarking
Prowler runs over 300 security checks against AWS, Azure, and GCP, aligned to CIS benchmarks, NIST, and GDPR:
# Install Prowler
pip install prowler
# Run CIS Level 2 checks against AWS
prowler aws \
--region us-east-1 \
--compliance cis_level2_aws_account_v3.0.0 \
--output-formats json html \
--output-directory /tmp/prowler-results
# Run checks for a specific service (IAM in this case)
prowler aws --service iam --region us-east-1
# Run against Azure
prowler azure --compliance cis_azure_foundations_v2.0.0
# Check for specific high-value findings
prowler aws --check s3_bucket_public_access \
--check iam_root_credentials_management \
--check ec2_imdsv2_enabled \
--region us-east-1 --output-formats json
Terraform Security Scanning
If you provision infrastructure with Terraform, scan your configs before terraform apply:
# checkov — scans IaC for misconfigurations
pip install checkov
checkov -d . --framework terraform --output cli --compact
# tfsec — security scanner specifically for Terraform
docker run --rm -v "$(pwd):/src" aquasec/tfsec /src
# trivy — IaC scanning + vulnerability scanning
trivy config . --severity HIGH,CRITICAL
Container Security
Containers add an attack surface that does not exist in traditional VM deployments. Containers share the host kernel — a container escape gives an attacker access to the host. And the container image itself may contain vulnerabilities at the OS or application dependency layer.
Image Vulnerability Scanning
# Scan an image with Trivy — the most comprehensive free scanner
# --exit-code 1 returns non-zero on findings (blocks the CI/CD pipeline)
trivy image \
--severity CRITICAL,HIGH \
--exit-code 1 \
--format table \
my-registry/my-app:latest
# Get detailed JSON output for SIEM ingestion
trivy image --format json --output /tmp/scan-results.json my-registry/my-app:latest
# Scan images in AWS ECR
aws ecr start-image-scan \
--repository-name my-app \
--image-id imageTag=latest
# Wait for scan to complete and retrieve findings
aws ecr describe-image-scan-findings \
--repository-name my-app \
--image-id imageTag=latest \
--query 'imageScanFindings.findingSeverityCounts'
# Configure ECR to scan every image on push
aws ecr put-image-scanning-configuration \
--repository-name my-app \
--image-scanning-configuration scanOnPush=true
Kubernetes Pod Security
Enforce security policies at the Kubernetes admission layer:
# Pod Security Standards — enforce across a namespace
# Add this to the namespace definition
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: v1.28
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
The restricted Pod Security Standard enforces:
- No privileged containers
- No host namespaces (hostNetwork, hostPID, hostIPC)
- No HostPath volumes
- Containers must not run as root
- Must use a non-root user and group
- Read-only root filesystem (recommended but not enforced in restricted)
- Drop all capabilities, add back only what is needed
- Seccomp profile required
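A pod spec that passes the restricted standard looks roughly like this (the names and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: production
spec:
  securityContext:
    runAsNonRoot: true           # required by restricted
    runAsUser: 1000
    seccompProfile:
      type: RuntimeDefault       # required by restricted
  containers:
    - name: my-app
      image: my-registry/my-app:latest
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true   # recommended, not enforced by restricted
        capabilities:
          drop: [ALL]
```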
For workloads that need specific capabilities, use Kyverno policies for fine-grained control:
# Kyverno policy: Require read-only root filesystem for all containers
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-readonly-rootfs
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-readonly-rootfs
      match:
        any:
          - resources:
              kinds: [Pod]
              namespaces: [production, staging]
      validate:
        message: "Root filesystem must be read-only."
        pattern:
          spec:
            containers:
              - securityContext:
                  readOnlyRootFilesystem: true
IAM Roles for Service Accounts (IRSA) in EKS
Do not rely on the node group's instance profile for pod-level AWS permissions. IRSA gives each Kubernetes service account its own IAM role with minimal permissions:
# Create IRSA for an application that needs S3 read access
eksctl create iamserviceaccount \
--cluster production-cluster \
--namespace my-app \
--name my-app-sa \
--role-name my-app-s3-reader \
--attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
--approve
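Behind the scenes, eksctl creates the IAM role with an OIDC trust policy and annotates the Kubernetes service account with the role ARN. The resulting object looks roughly like this (the account ID is a placeholder):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: my-app
  annotations:
    # Pods using this service account receive temporary credentials for this role
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-app-s3-reader
```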
# Verify OIDC federation is configured
aws iam list-open-id-connect-providers \
--query 'OpenIDConnectProviderList[*].Arn'
Incident Response Runbook Template
When a GuardDuty finding fires or a CloudTrail alert triggers, having a documented response procedure reduces the time to containment from hours to minutes.
# INCIDENT RESPONSE: Compromised IAM Credential
# Step 1: Immediately revoke the compromised credential
aws iam update-access-key \
--access-key-id AKIA[COMPROMISED-KEY] \
--status Inactive
# If the compromised principal is an IAM user, attach a deny-all policy immediately
aws iam put-user-policy \
--user-name compromised-user \
--policy-name EmergencyLockout \
--policy-document '{"Version":"2012-10-17","Statement":[{"Effect":"Deny","Action":"*","Resource":"*"}]}'
# Step 2: Identify what the credential accessed
# Search CloudTrail for API calls made by this access key in the last 24 hours.
# The caller's source IP is in the sourceIPAddress field of the raw CloudTrailEvent JSON.
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=AccessKeyId,AttributeValue=AKIA[COMPROMISED-KEY] \
--start-time "$(date -d '24 hours ago' -u +%Y-%m-%dT%H:%M:%SZ)" \
--query 'Events[*].{Time:EventTime,Event:EventName,User:Username,Raw:CloudTrailEvent}' \
--output table
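Note that `date -d '24 hours ago'` is GNU date syntax; on macOS and BSD the equivalent flag is `-v-24H`. A portable sketch for computing the lookup window:

```shell
# Compute an ISO 8601 UTC timestamp 24 hours in the past,
# trying GNU date first and falling back to BSD date.
START_TIME=$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ 2>/dev/null \
  || date -u -v-24H +%Y-%m-%dT%H:%M:%SZ)
echo "$START_TIME"
```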
# Step 3: Check for new IAM users or roles created by the compromised credential.
# lookup-events accepts only ONE lookup attribute per call, so filter on EventName
# and match the access key in the results with a JMESPath filter.
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventName,AttributeValue=CreateUser \
--start-time "$(date -d '24 hours ago' -u +%Y-%m-%dT%H:%M:%SZ)" \
--query 'Events[?AccessKeyId==`AKIA[COMPROMISED-KEY]`]'
# Step 4: Check whether the attacker tried to cover their tracks by deleting the trail
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventName,AttributeValue=DeleteTrail \
--start-time "$(date -d '24 hours ago' -u +%Y-%m-%dT%H:%M:%SZ)"
# Step 5: Verify no unauthorized EC2 instances were launched
aws ec2 describe-instances \
--filters "Name=launch-time,Values=$(date -d '24 hours ago' +%Y-%m-%d)*" \
--query 'Reservations[*].Instances[*].{ID:InstanceId,Type:InstanceType,State:State.Name,Launched:LaunchTime}' \
--output table
Hardening Checklist
Work through this list to establish a strong baseline posture. Each item addresses a known attack pattern with documented real-world incidents.
IAM
- [ ] Root account has no access keys
- [ ] Root account has hardware MFA enabled
- [ ] All human users use IAM Identity Center (SSO) instead of IAM users where possible
- [ ] MFA required for all IAM users via policy condition
- [ ] All EC2, Lambda, ECS, EKS workloads use roles, not static access keys
- [ ] IMDSv2 enforced on all EC2 instances
- [ ] IAM Access Analyzer enabled and findings reviewed
- [ ] Access keys older than 90 days rotated or replaced with roles
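The MFA requirement above is typically enforced with a policy that denies everything when no MFA is present. A minimal sketch of the standard condition, abbreviated from the pattern in AWS's documentation (a production version exempts a few more self-service IAM calls so users can enroll a device):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyAllExceptListedIfNoMFA",
      "Effect": "Deny",
      "NotAction": [
        "iam:CreateVirtualMFADevice",
        "iam:EnableMFADevice",
        "iam:ListMFADevices",
        "iam:ListVirtualMFADevices",
        "iam:ResyncMFADevice",
        "sts:GetSessionToken"
      ],
      "Resource": "*",
      "Condition": {
        "BoolIfExists": { "aws:MultiFactorAuthPresent": "false" }
      }
    }
  ]
}
```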
S3
- [ ] Block Public Access enabled at account level
- [ ] All buckets enforce SSE-KMS encryption via bucket policy
- [ ] All buckets deny HTTP access via bucket policy
- [ ] CloudTrail log bucket has Object Lock in Compliance mode
- [ ] Versioning enabled on all buckets with important data
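The deny-HTTP item maps to a bucket policy statement on the aws:SecureTransport condition key (the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ],
      "Condition": {
        "Bool": { "aws:SecureTransport": "false" }
      }
    }
  ]
}
```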
Network
- [ ] No security groups allow SSH (22) or RDP (3389) from 0.0.0.0/0
- [ ] No database ports (5432, 3306, 27017, 6379) reachable from internet
- [ ] VPC Flow Logs enabled for all VPCs
- [ ] EC2 instances in private subnets accessed via SSM Session Manager
Detection
- [ ] CloudTrail multi-region trail enabled with log validation
- [ ] GuardDuty enabled in all regions with S3 and EKS protection
- [ ] Security Hub enabled with CIS Foundations Benchmark standard
- [ ] CloudTrail logs forwarded to SIEM or central logging
- [ ] Alerts configured for root account usage, IAM policy changes, CloudTrail disabling
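For the root-usage alert, one common approach is an EventBridge rule matching CloudTrail events from the root identity; a sketch of the event pattern:

```json
{
  "detail-type": [
    "AWS API Call via CloudTrail",
    "AWS Console Sign In via CloudTrail"
  ],
  "detail": {
    "userIdentity": {
      "type": ["Root"]
    }
  }
}
```

Route matches to an SNS topic so any root activity pages a human immediately — day-to-day work should never use root.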
Secrets
- [ ] No credentials in source code (verified via truffleHog/gitleaks)
- [ ] No credentials in container images (verified via Trivy)
- [ ] All secrets stored in Secrets Manager or Parameter Store
- [ ] Automated rotation configured for database credentials
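Purpose-built scanners like gitleaks and truffleHog cover hundreds of credential formats, but even a grep for the fixed AWS access key ID prefix catches the most common offender. A runnable sketch against a scratch file (in practice, run the grep over your repo root):

```shell
# AWS access key IDs are 'AKIA' followed by 16 uppercase letters/digits.
# AKIAIOSFODNN7EXAMPLE is AWS's documented example key, not a live credential.
workdir=$(mktemp -d)
printf 'aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n' > "$workdir/settings.py"
grep -rEn 'AKIA[0-9A-Z]{16}' "$workdir"
rm -rf "$workdir"
```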
Containers (if applicable)
- [ ] All images scanned on push with ECR or Trivy
- [ ] Pod Security Standards set to restricted for production namespaces
- [ ] IRSA configured — no overpermissioned node instance profiles
- [ ] No privileged containers in production workloads
The organizations that handle cloud security well treat it as ongoing operational work: automated Prowler scans on a schedule, regular access reviews, tested incident response runbooks, and infrastructure-as-code with security scanning in CI/CD. The organizations that get breached treat it as a one-time configuration exercise. Nearly every breach discussed here, Capital One's included, comes down to that difference.