
🗺️ AWS Services Map

AWS: Key Services for DevOps
Compute: EC2, Lambda, Lightsail, Auto Scaling
Containers: ECR (registry), ECS (tasks), EKS (k8s), Fargate
Storage: S3 (objects), EFS (NFS), EBS (block), Glacier (archive)
Database: RDS (SQL), DynamoDB (NoSQL), ElastiCache (Redis), Aurora
Networking: VPC, Subnets, Security Groups, ALB / NLB
CDN / DNS: CloudFront, Route 53, ACM (certs), WAF (firewall)
Security: IAM, Secrets Manager, KMS (keys), GuardDuty
DevOps: CodePipeline, CodeBuild, CodeDeploy, CloudWatch

🔐 IAM — Identity & Access Management

IAM is the foundation of AWS security. It controls who (users, roles, services) can do what (actions) on which resources.

IAM Hierarchy
User   → a real person; has access keys or a console password
Group  → a collection of users sharing policies (Developers, Ops)
Role   → assumed by AWS services (EC2, ECS, Lambda, EKS pods) or by users and other accounts via sts:AssumeRole
Policy → JSON rules: Allow/Deny + Action + Resource
# Create user
aws iam create-user --user-name alice

# Create access keys for a user
aws iam create-access-key --user-name alice

# Create a role for EC2 instances
aws iam create-role \
  --role-name ec2-app-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "ec2.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'

# Attach a managed policy
aws iam attach-role-policy \
  --role-name ec2-app-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

# Create inline policy from file
aws iam put-role-policy \
  --role-name ec2-app-role \
  --policy-name AppPolicy \
  --policy-document file://policy.json

# Check who you are logged in as
aws sts get-caller-identity

# Assume a role temporarily
aws sts assume-role \
  --role-arn arn:aws:iam::123:role/DeployRole \
  --role-session-name deploy-session
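Example policy.json for the put-role-policy command above (least-privilege access to one bucket and the app's secrets):
{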
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3BucketAccess",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-app-bucket/*"
    },
    {
      "Sid": "S3ListBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::my-app-bucket"
    },
    {
      "Sid": "SecretsRead",
      "Effect": "Allow",
      "Action": ["secretsmanager:GetSecretValue"],
      "Resource": "arn:aws:secretsmanager:us-east-1:123:secret:myapp/*"
    }
  ]
}
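`aws sts assume-role` above only prints temporary credentials; it does not switch your shell to the role. A minimal bash sketch, assuming you ask the CLI for just the three credential fields as tab-separated text (`--query ... --output text`). `export_assumed_role` is a hypothetical helper, not part of the AWS CLI:

```shell
# Load temporary credentials from `aws sts assume-role` into this shell.
# Expects one tab-separated line: AccessKeyId, SecretAccessKey, SessionToken.
# export_assumed_role is a hypothetical helper name.
export_assumed_role() {
  local key secret token
  IFS=$'\t' read -r key secret token || return 1
  if [ -z "$key" ] || [ -z "$secret" ] || [ -z "$token" ]; then
    echo "expected: AccessKeyId<TAB>SecretAccessKey<TAB>SessionToken" >&2
    return 1
  fi
  export AWS_ACCESS_KEY_ID="$key"
  export AWS_SECRET_ACCESS_KEY="$secret"
  export AWS_SESSION_TOKEN="$token"
}

# Usage: use process substitution, not a pipe; a pipe would run the function
# in a subshell and the exports would be lost.
#   export_assumed_role < <(aws sts assume-role \
#     --role-arn arn:aws:iam::123:role/DeployRole \
#     --role-session-name deploy-session \
#     --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
#     --output text)
#   aws sts get-caller-identity   # should now report the assumed-role ARN
#   unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN   # drop the role
```

The credentials expire (after one hour by default for assume-role), so the helper has to be run again for long sessions.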

🌐 VPC — Virtual Private Cloud

A VPC is your private network inside AWS. Compute and data resources such as EC2 instances, ECS/EKS tasks, RDS databases, and ElastiCache clusters run inside a VPC, where you control IP ranges, subnets, routing, and firewalls.

VPC Architecture (standard 3-tier)

Region: us-east-1
┌──────────────────────────────────────────────────────────────┐
│ VPC: 10.0.0.0/16                                             │
│                                                              │
│  ┌──────────────────┐   ┌──────────────────┐                 │
│  │ Public Subnet    │   │ Public Subnet    │  ← AZ-a, AZ-b   │
│  │ 10.0.1.0/24      │   │ 10.0.2.0/24      │                 │
│  │ [ALB] [NAT GW]   │   │ [ALB] [NAT GW]   │                 │
│  └────────┬─────────┘   └────────┬─────────┘                 │
│           │                      │                           │
│  ┌────────▼─────────┐   ┌────────▼─────────┐                 │
│  │ Private Subnet   │   │ Private Subnet   │  ← app layer    │
│  │ 10.0.10.0/24     │   │ 10.0.11.0/24     │                 │
│  │ [ECS/EKS pods]   │   │ [ECS/EKS pods]   │                 │
│  └────────┬─────────┘   └────────┬─────────┘                 │
│           │                      │                           │
│  ┌────────▼─────────┐   ┌────────▼─────────┐                 │
│  │ Data Subnet      │   │ Data Subnet      │  ← DB layer     │
│  │ 10.0.20.0/24     │   │ 10.0.21.0/24     │                 │
│  │ [RDS] [Redis]    │   │ [RDS] [Redis]    │                 │
│  └──────────────────┘   └──────────────────┘                 │
│                                                              │
│ Internet Gateway → public subnets only                       │
│ NAT Gateway → private subnets can reach internet (outbound)  │
└──────────────────────────────────────────────────────────────┘
# Create VPC
aws ec2 create-vpc --cidr-block 10.0.0.0/16 --region us-east-1

# Create subnets (one public, one private per AZ)
aws ec2 create-subnet --vpc-id vpc-xxx --cidr-block 10.0.1.0/24 --availability-zone us-east-1a
aws ec2 create-subnet --vpc-id vpc-xxx --cidr-block 10.0.10.0/24 --availability-zone us-east-1a

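# Create and attach Internet Gateway (for public subnets)
aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway --vpc-id vpc-xxx --internet-gateway-id igw-xxx

# A subnet only becomes "public" once its route table sends 0.0.0.0/0 to the IGW
aws ec2 create-route-table --vpc-id vpc-xxx
aws ec2 create-route --route-table-id rtb-public --destination-cidr-block 0.0.0.0/0 --gateway-id igw-xxx
aws ec2 associate-route-table --route-table-id rtb-public --subnet-id subnet-public

# Create NAT Gateway in a public subnet (private subnets need this for outbound internet)
aws ec2 allocate-address --domain vpc                         # get Elastic IP
aws ec2 create-nat-gateway --subnet-id subnet-public --allocation-id eipalloc-xxx

# Route outbound traffic from the private subnet's route table through the NAT Gateway
aws ec2 create-route --route-table-id rtb-private --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-xxx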

# Security Groups (stateful firewall per resource)
aws ec2 create-security-group \
  --group-name web-sg \
  --description "Web tier" \
  --vpc-id vpc-xxx

# Allow HTTPS inbound from anywhere
aws ec2 authorize-security-group-ingress \
  --group-id sg-xxx \
  --protocol tcp --port 443 --cidr 0.0.0.0/0

# Allow inbound from another security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-db \
  --protocol tcp --port 5432 \
  --source-group sg-app
| Concept | What It Is | Key Rule |
|---|---|---|
| VPC | Your private AWS network | One per environment (dev/prod) |
| Public Subnet | Has a route to the Internet Gateway | Put ALBs and NAT Gateways here |
| Private Subnet | No direct internet route | Put app containers and databases here |
| Security Group | Stateful firewall per resource | Default: deny all inbound, allow all outbound |
| NACL | Stateless firewall per subnet | Use for broad subnet-level rules |
| Internet Gateway | VPC → Internet | One per VPC |
| NAT Gateway | Private subnet → outbound internet | One per AZ (for HA) |
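Every subnet CIDR has to sit inside the VPC's CIDR, and the containment test is just a bitmask comparison. A small bash sketch of that arithmetic, checking the subnet plan from the diagram against 10.0.0.0/16 (the helper names are hypothetical; this illustrates CIDR math and makes no AWS calls):

```shell
# Is an IPv4 address inside a CIDR block? Pure bash arithmetic.
# ip_to_int and ip_in_cidr are hypothetical helper names.
ip_to_int() {  # 10.0.1.5 -> 167772421
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

ip_in_cidr() {  # usage: ip_in_cidr IP CIDR; exit status 0 means inside
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  (( ($(ip_to_int "$ip") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Checking each subnet's network address is enough here because every
# subnet prefix (/24) is longer than the VPC prefix (/16).
for subnet in 10.0.1.0/24 10.0.2.0/24 10.0.10.0/24 10.0.11.0/24 10.0.20.0/24 10.0.21.0/24; do
  if ip_in_cidr "${subnet%/*}" 10.0.0.0/16; then
    echo "$subnet inside 10.0.0.0/16"
  else
    echo "$subnet OUTSIDE the VPC"
  fi
done
```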

🪣 S3 — Object Storage & Static Websites

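S3 stores any file (an object) in a bucket. It is also one of the cheapest and simplest ways to host a static website; S3 website endpoints serve plain HTTP only, so put CloudFront in front of them for HTTPS.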

# Create a bucket
aws s3 mb s3://my-company-assets --region us-east-1

# Upload files
aws s3 cp ./dist s3://my-company-assets/ --recursive
aws s3 sync ./dist s3://my-company-assets/ --delete   # sync + remove old files

# Download a file
aws s3 cp s3://my-company-assets/app.js ./app.js

# List bucket contents
aws s3 ls s3://my-company-assets/
aws s3 ls s3://my-company-assets/ --recursive --human-readable

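# Make a single file public (only works if the bucket allows ACLs;
# new buckets have ACLs disabled by default, so prefer a bucket policy)
aws s3api put-object-acl \
  --bucket my-company-assets \
  --key index.html \
  --acl public-read

# Enable static website hosting
aws s3 website s3://my-site/ \
  --index-document index.html \
  --error-document 404.html

# New buckets block public access by default; lift the block before adding a public policy
aws s3api delete-public-access-block --bucket my-site

# Set bucket policy (allow public read)
aws s3api put-bucket-policy \
  --bucket my-site \
  --policy file://bucket-policy.json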

# Presigned URL — share a private file for 1 hour
aws s3 presign s3://my-bucket/private.pdf --expires-in 3600
bucket-policy.json (public read for the website bucket):
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::my-site/*"
  }]
}
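Hard-coding the bucket name in `bucket-policy.json` means editing the file for every site. A small sketch that prints the same public-read policy for any bucket; `public_read_policy` is a hypothetical helper name:

```shell
# Print the public-read bucket policy for a given bucket name.
# public_read_policy is a hypothetical helper name.
public_read_policy() {
  local bucket=$1
  if [ -z "$bucket" ]; then
    echo "usage: public_read_policy BUCKET" >&2
    return 1
  fi
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::${bucket}/*"
  }]
}
EOF
}

# Usage:
#   public_read_policy my-site > bucket-policy.json
#   aws s3api put-bucket-policy --bucket my-site --policy file://bucket-policy.json
```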

⚡ CloudFront — CDN

CloudFront caches your content at hundreds of edge locations worldwide. Users load assets from the nearest edge instead of a single AWS region, which is usually much faster. It also terminates HTTPS at the edge.

CloudFront Architecture

User (Tokyo) ────► Edge Location (Tokyo) ─── cache hit? ──► return file
                           │
                           │ cache miss (first request)
                           ▼
                 Origin: S3 bucket / ALB / EC2 (us-east-1)

Typical result: ~5 ms latency from the edge vs ~200 ms from a distant origin
# Create distribution (S3 origin)
aws cloudfront create-distribution \
  --distribution-config file://cloudfront-config.json

# Invalidate cached files (force refresh after deploy)
aws cloudfront create-invalidation \
  --distribution-id EDFDVBD6EXAMPLE \
  --paths "/*"                          # all files

aws cloudfront create-invalidation \
  --distribution-id EDFDVBD6EXAMPLE \
  --paths "/index.html" "/app.js"       # specific files

# Get distribution info
aws cloudfront get-distribution --id EDFDVBD6EXAMPLE
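cloudfront-config.json (CallerReference, Comment and DefaultCacheBehavior.TargetOriginId are required; CallerReference must be unique for each create request):
{
  "CallerReference": "my-site-2024-12-15",
  "Comment": "my-site static assets",
  "Origins": {
    "Quantity": 1,
    "Items": [{
      "Id": "S3-my-site",
      "DomainName": "my-site.s3.amazonaws.com",
      "S3OriginConfig": { "OriginAccessIdentity": "" }
    }]
  },
  "DefaultCacheBehavior": {
    "TargetOriginId": "S3-my-site",
    "ViewerProtocolPolicy": "redirect-to-https",
    "CachePolicyId": "658327ea-f89d-4fab-a63d-7e88639e58f6",
    "Compress": true
  },
  "DefaultRootObject": "index.html",
  "HttpVersion": "http2",
  "Enabled": true
}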
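The hit/miss flow in the CloudFront diagram, plus TTL expiry and what `create-invalidation` does, can be illustrated with a toy model of one edge cache. This is a simplified sketch, not CloudFront's implementation: a single edge, a fixed assumed TTL of 24 hours, and a hypothetical `fetch_from_origin` standing in for S3 or the ALB (bash 4+ for associative arrays):

```shell
# Toy model of ONE edge location's cache: hit/miss, TTL expiry, invalidation.
# All names are hypothetical; this is not how CloudFront is implemented.
declare -A EDGE_BODY=()      # path -> cached content
declare -A EDGE_EXPIRES=()   # path -> expiry time (epoch seconds)
EDGE_TTL=86400               # assumed TTL: 24 hours
EDGE_RESULT=""               # HIT or MISS for the last request

fetch_from_origin() {        # stand-in for the S3 bucket / ALB origin
  printf 'origin:%s' "$1"
}

edge_get() {                 # usage: edge_get PATH [NOW_EPOCH_SECONDS]
  local path=$1 now=${2:-$(date +%s)}
  local expires=${EDGE_EXPIRES[$path]:-0}
  if (( now < expires )); then
    EDGE_RESULT=HIT          # served from the edge, origin not contacted
  else
    EDGE_BODY[$path]=$(fetch_from_origin "$path")
    EDGE_EXPIRES[$path]=$(( now + EDGE_TTL ))
    EDGE_RESULT=MISS         # first request, or the cached copy expired
  fi
}

edge_invalidate() {          # roughly what create-invalidation does per path
  unset "EDGE_BODY[$1]" "EDGE_EXPIRES[$1]"
}

edge_get /index.html 1000; echo "$EDGE_RESULT"   # MISS
edge_get /index.html 2000; echo "$EDGE_RESULT"   # HIT
edge_invalidate /index.html
edge_get /index.html 3000; echo "$EDGE_RESULT"   # MISS again
```

`edge_get` sets a global instead of printing its result so that it can be called without `$(...)`, which would run it in a subshell and discard the cache updates.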

🌍 Route 53 — DNS

Route 53 is AWS's managed DNS service. Register domains, create DNS records, route traffic with health checks, and do geo-based routing.

# List hosted zones
aws route53 list-hosted-zones

# Create DNS record (A record → ALB)
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890 \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "app.example.com",
        "Type": "A",
        "AliasTarget": {
          "HostedZoneId": "Z35SXDOTRQ7X7K",
          "DNSName": "my-alb-1234.us-east-1.elb.amazonaws.com",
          "EvaluateTargetHealth": true
        }
      }
    }]
  }'

# Create CNAME record → CloudFront
aws route53 change-resource-record-sets \
  --hosted-zone-id Z1234567890 \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "www.example.com",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [{"Value": "d1234.cloudfront.net"}]
      }
    }]
  }'
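| Record Type | Points To | Use Case |
|---|---|---|
| A | IPv4 address, or an AWS resource via alias | Root domain, ALB, CloudFront |
| CNAME | Another hostname | www → root, subdomains (not allowed at the zone apex) |
| Alias (on A/AAAA) | AWS resource (ALB, CloudFront, S3) | Like CNAME but works at the apex; queries to AWS targets are free |
| MX | Mail servers | Email routing (Google Workspace etc.) |
| TXT | Text string | Domain verification, SPF, DKIM |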
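Hand-editing change-batch JSON for every record is error-prone. One option is a small function that prints the same CNAME UPSERT batch as the example above so that only the name, target, and TTL vary. A sketch with a hypothetical helper name:

```shell
# Print a Route 53 change batch that UPSERTs one CNAME record.
# cname_change_batch is a hypothetical helper name.
cname_change_batch() {  # usage: cname_change_batch NAME TARGET [TTL]
  local name=$1 target=$2 ttl=${3:-300}
  case $ttl in ''|*[!0-9]*) echo "TTL must be a number" >&2; return 1 ;; esac
  if [ -z "$name" ] || [ -z "$target" ]; then
    echo "usage: cname_change_batch NAME TARGET [TTL]" >&2
    return 1
  fi
  printf '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"CNAME","TTL":%s,"ResourceRecords":[{"Value":"%s"}]}}]}\n' \
    "$name" "$ttl" "$target"
}

# Usage:
#   aws route53 change-resource-record-sets \
#     --hosted-zone-id Z1234567890 \
#     --change-batch "$(cname_change_batch www.example.com d1234.cloudfront.net)"
```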

🔒 ACM — SSL/TLS Certificates

AWS Certificate Manager provides free SSL certs that auto-renew. Use with ALB and CloudFront for HTTPS.

# Request a certificate (DNS validation is easiest)
aws acm request-certificate \
  --domain-name example.com \
  --subject-alternative-names "*.example.com" \
  --validation-method DNS \
  --region us-east-1                 # CloudFront requires us-east-1

# List certificates
aws acm list-certificates --region us-east-1

# Get the DNS validation records (add to Route 53)
aws acm describe-certificate \
  --certificate-arn arn:aws:acm:us-east-1:123:certificate/abc \
  --query 'Certificate.DomainValidationOptions'
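💡 ACM certificates are free and auto-renew

Public ACM certificates used with ALB or CloudFront cost nothing, and ACM starts renewing them automatically up to 60 days before expiry, so there is no Let's Encrypt cron job to maintain. Renewal of a DNS-validated certificate depends on the validation CNAME record staying in your DNS zone. For CloudFront, always request the certificate in us-east-1, regardless of where your origin is.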
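Auto-renewal can still stall, for example if the validation record is deleted. A quick check is how many days remain before the certificate's `NotAfter` date. A minimal sketch assuming GNU `date` (on macOS, `gdate` from Homebrew coreutils); `cert_days_left` and the 30-day threshold in the usage note are assumptions:

```shell
# Whole days left before an ACM certificate's NotAfter timestamp.
# Assumes GNU date; pass NOW as epoch seconds to make it testable.
# cert_days_left is a hypothetical helper name.
cert_days_left() {
  local not_after=$1 now=${2:-$(date +%s)} expires
  expires=$(date -d "$not_after" +%s) || return 1
  echo $(( (expires - now) / 86400 ))
}

# Usage:
#   not_after=$(aws acm describe-certificate \
#     --certificate-arn arn:aws:acm:us-east-1:123:certificate/abc \
#     --query 'Certificate.NotAfter' --output text)
#   days=$(cert_days_left "$not_after")
#   [ "$days" -lt 30 ] && echo "renewal may be stuck: $days days left"
```

Because ACM begins renewing well before expiry, a count that keeps shrinking far below 60 days suggests the renewal has not gone through.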

🗄️ RDS — Managed Databases

RDS manages the database server for you — backups, patching, failover, replicas. Supports PostgreSQL, MySQL, MariaDB, Oracle, SQL Server, and Aurora.

# Create a PostgreSQL RDS instance
#   --manage-master-user-password  RDS generates the password and keeps it in Secrets Manager
#   --multi-az                     standby in another AZ (automatic failover)
#   --backup-retention-period 7    7-day automated backups
#   --deletion-protection          can't be deleted until this is turned off
#   --no-publicly-accessible       reachable only from inside the VPC
aws rds create-db-instance \
  --db-instance-identifier myapp-db \
  --db-instance-class db.t3.micro \
  --engine postgres \
  --engine-version "15.4" \
  --master-username appuser \
  --manage-master-user-password \
  --db-name myappdb \
  --allocated-storage 20 \
  --storage-type gp3 \
  --vpc-security-group-ids sg-xxx \
  --db-subnet-group-name my-db-subnet-group \
  --multi-az \
  --backup-retention-period 7 \
  --deletion-protection \
  --no-publicly-accessible

# Create a read replica
aws rds create-db-instance-read-replica \
  --db-instance-identifier myapp-db-replica \
  --source-db-instance-identifier myapp-db

# Take a manual snapshot
aws rds create-db-snapshot \
  --db-instance-identifier myapp-db \
  --db-snapshot-identifier myapp-db-snap-$(date +%Y%m%d)

# Restore from snapshot
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier myapp-db-restored \
  --db-snapshot-identifier myapp-db-snap-20241215

# List instances
aws rds describe-db-instances \
  --query 'DBInstances[*].[DBInstanceIdentifier,DBInstanceStatus,Endpoint.Address]'
| Instance Class | vCPU | RAM | Use For |
|---|---|---|---|
| db.t3.micro | 2 | 1 GB | Dev / testing |
| db.t3.medium | 2 | 4 GB | Small production |
| db.r6g.large | 2 | 16 GB | Memory-heavy workloads |
| db.r6g.4xlarge | 16 | 128 GB | Large production DB |
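Applications usually connect with a URL such as `postgresql://user:password@host:5432/db`, and a password containing URL-reserved characters like `@`, `:` or `/` breaks that URL unless it is percent-encoded. A minimal bash sketch (ASCII input only; `urlencode` and `pg_url` are hypothetical helpers):

```shell
# Percent-encode a string for a URL. ASCII input only; RFC 3986
# unreserved characters are kept as-is.
urlencode() {
  local s=$1 out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9._~-]) out+=$c ;;
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;   # '@' -> %40
    esac
  done
  printf '%s\n' "$out"
}

# Build a PostgreSQL connection URL for an RDS endpoint.
pg_url() {  # usage: pg_url USER PASSWORD HOST [PORT] [DBNAME]
  local user=$1 password=$2 host=$3 port=${4:-5432} db=${5:-postgres}
  printf 'postgresql://%s:%s@%s:%s/%s\n' \
    "$(urlencode "$user")" "$(urlencode "$password")" "$host" "$port" "$db"
}

# Usage (DB_PASSWORD and the endpoint are placeholders):
#   psql "$(pg_url appuser "$DB_PASSWORD" myapp-db.xxxx.us-east-1.rds.amazonaws.com 5432 myappdb)"
```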

⚡ ElastiCache — Redis / Memcached

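Managed in-memory cache. Use Redis for sessions, rate limiting, pub/sub, and caching. Because reads are served from memory, they typically return in well under a millisecond, which is usually much faster than repeating a disk-backed RDS query for the same data.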

# Create a Redis replication group (--num-cache-clusters 2 = primary + 1 replica)
aws elasticache create-replication-group \
  --replication-group-id myapp-redis \
  --replication-group-description "App cache" \
  --engine redis \
  --engine-version "7.0" \
  --cache-node-type cache.t3.micro \
  --num-cache-clusters 2 \
  --cache-subnet-group-name my-cache-subnet \
  --security-group-ids sg-redis \
  --at-rest-encryption-enabled \
  --transit-encryption-enabled

# Connect from inside the VPC (--tls is needed because transit encryption is enabled)
redis-cli --tls -h myapp-redis.xxxxx.ng.0001.use1.cache.amazonaws.com -p 6379

# Describe clusters
aws elasticache describe-replication-groups \
  --replication-group-id myapp-redis \
  --query 'ReplicationGroups[*].NodeGroups[*].PrimaryEndpoint'

🔑 Secrets Manager & Parameter Store

# ── Secrets Manager (for sensitive values) ─────────────
# Store a secret
aws secretsmanager create-secret \
  --name /myapp/production/db-password \
  --secret-string "SuperSecret@123"

# Store JSON secret (multiple values)
aws secretsmanager create-secret \
  --name /myapp/production/db \
  --secret-string '{"host":"db.xxx.us-east-1.rds.amazonaws.com","port":"5432","user":"app","password":"Secret@1"}'

# Read a secret
aws secretsmanager get-secret-value \
  --secret-id /myapp/production/db \
  --query SecretString --output text | jq .

# Rotate secret (rotation must already be configured for the secret, e.g. with a rotation Lambda)
aws secretsmanager rotate-secret \
  --secret-id /myapp/production/db-password

# ── Parameter Store (for config + non-sensitive values) ──
# Store a parameter
aws ssm put-parameter \
  --name /myapp/production/log-level \
  --value "info" \
  --type String

# Store encrypted parameter (SecureString)
aws ssm put-parameter \
  --name /myapp/production/api-key \
  --value "sk-live-xxxxx" \
  --type SecureString

# Get parameter
aws ssm get-parameter \
  --name /myapp/production/api-key \
  --with-decryption \
  --query Parameter.Value --output text

# Get all parameters under a path
aws ssm get-parameters-by-path \
  --path /myapp/production/ \
  --with-decryption
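Applications often expect configuration as environment variables rather than Parameter Store paths. A sketch that maps each parameter's last path segment to an upper-case variable name, assuming the CLI is asked for tab-separated `Name`/`Value` pairs; `export_params` and the naming rule are assumptions, not an AWS convention:

```shell
# Turn "NAME<TAB>VALUE" lines from Parameter Store into exported variables,
# e.g. /myapp/production/log-level -> LOG_LEVEL. Needs bash 4+ for ${var^^}.
# export_params is a hypothetical helper name.
export_params() {
  local name value var
  while IFS=$'\t' read -r name value; do
    [ -n "$name" ] || continue
    var=${name##*/}                  # last path segment: log-level
    var=${var//[^A-Za-z0-9_]/_}      # log_level
    var=${var^^}                     # LOG_LEVEL
    export "$var=$value"
  done
}

# Usage (process substitution keeps the exports in the current shell):
#   export_params < <(aws ssm get-parameters-by-path \
#     --path /myapp/production/ --with-decryption \
#     --query 'Parameters[*].[Name,Value]' --output text)
```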

🗄️ ECR — Container Registry

# Authenticate Docker to ECR
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS \
    --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# Create repository
aws ecr create-repository \
  --repository-name myapp \
  --image-scanning-configuration scanOnPush=true \
  --encryption-configuration encryptionType=AES256

# Build, tag, push
IMAGE=123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp
docker build -t "$IMAGE:latest" -t "$IMAGE:$(git rev-parse --short HEAD)" .
docker push --all-tags "$IMAGE"

# Set lifecycle policy (auto-expire old images)
aws ecr put-lifecycle-policy \
  --repository-name myapp \
  --lifecycle-policy-text file://ecr-lifecycle.json
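The `ecr-lifecycle.json` file referenced above is not shown. Here is one example, written with a shell heredoc: expire untagged images 7 days after push and keep at most 30 images overall. The numbers are placeholders, not recommendations. ECR requires a rule with `"tagStatus": "any"` to have the highest `rulePriority` value, which is why it comes second:

```shell
# Write an example lifecycle policy for `file://ecr-lifecycle.json`.
cat > ecr-lifecycle.json <<'EOF'
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images 7 days after push",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 7
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "Keep only the 30 most recent images",
      "selection": {
        "tagStatus": "any",
        "countType": "imageCountMoreThan",
        "countNumber": 30
      },
      "action": { "type": "expire" }
    }
  ]
}
EOF

# Optional dry run before applying:
#   aws ecr start-lifecycle-policy-preview \
#     --repository-name myapp \
#     --lifecycle-policy-text file://ecr-lifecycle.json
```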

🚢 ECS — Elastic Container Service

# Create cluster
aws ecs create-cluster \
  --cluster-name production \
  --capacity-providers FARGATE FARGATE_SPOT

# Register task definition
aws ecs register-task-definition --cli-input-json file://task-def.json

# Create service (--enable-execute-command is required for `aws ecs execute-command` below)
aws ecs create-service \
  --cluster production \
  --service-name myapp \
  --task-definition myapp:1 \
  --desired-count 3 \
  --launch-type FARGATE \
  --enable-execute-command \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:...,containerName=myapp,containerPort=8080" \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-a,subnet-b],securityGroups=[sg-app],assignPublicIp=DISABLED}"

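# Deploy new image version (restarts tasks from the current task definition,
# so a mutable tag such as :latest is pulled again)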
aws ecs update-service \
  --cluster production \
  --service myapp \
  --force-new-deployment

# Wait for service to stabilize
aws ecs wait services-stable --cluster production --services myapp

# Shell into running Fargate task
TASK=$(aws ecs list-tasks --cluster production --service-name myapp --query 'taskArns[0]' --output text)
aws ecs execute-command \
  --cluster production \
  --task $TASK \
  --container myapp \
  --interactive --command "/bin/sh"
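The `file://task-def.json` registered above is not shown. A minimal Fargate sketch that lines up with the other commands in this section: family `myapp` (so the first registration becomes `myapp:1`), container `myapp` on port 8080, and logs in `/ecs/myapp` with prefix `ecs` (giving stream names like `ecs/myapp/<task-id>`). The account ID, `ecsTaskExecutionRole`, and `myapp-task-role` are placeholders; ECS Exec additionally needs the task role to allow the `ssmmessages` channel actions:

```shell
# Write a minimal Fargate task definition for `file://task-def.json`.
# ACCOUNT_ID and both role names are placeholders (assumptions).
ACCOUNT_ID=123456789012
REGION=us-east-1
cat > task-def.json <<EOF
{
  "family": "myapp",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "executionRoleArn": "arn:aws:iam::${ACCOUNT_ID}:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::${ACCOUNT_ID}:role/myapp-task-role",
  "containerDefinitions": [{
    "name": "myapp",
    "image": "${ACCOUNT_ID}.dkr.ecr.${REGION}.amazonaws.com/myapp:latest",
    "essential": true,
    "portMappings": [{ "containerPort": 8080, "protocol": "tcp" }],
    "logConfiguration": {
      "logDriver": "awslogs",
      "options": {
        "awslogs-group": "/ecs/myapp",
        "awslogs-region": "${REGION}",
        "awslogs-stream-prefix": "ecs"
      }
    }
  }]
}
EOF

# The log group must exist before tasks start:
#   aws logs create-log-group --log-group-name /ecs/myapp
```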

☸ EKS — Elastic Kubernetes Service

# Install eksctl (Linux x86_64; the project now lives at eksctl-io/eksctl)
curl -sLO "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz"
tar -xzf "eksctl_$(uname -s)_amd64.tar.gz" -C /tmp && sudo mv /tmp/eksctl /usr/local/bin/

# Create cluster with managed node group (--with-oidc enables IRSA)
eksctl create cluster \
  --name production \
  --region us-east-1 \
  --nodegroup-name standard-workers \
  --node-type t3.medium \
  --nodes 3 --nodes-min 2 --nodes-max 10 \
  --managed \
  --with-oidc \
  --ssh-access --ssh-public-key my-key

# Connect kubectl
aws eks update-kubeconfig --region us-east-1 --name production

# Add Fargate profile (serverless pods)
eksctl create fargateprofile \
  --cluster production \
  --name fp-myapp \
  --namespace myapp

# Enable IAM role for a service account (IRSA)
eksctl create iamserviceaccount \
  --name myapp-sa --namespace myapp \
  --cluster production \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
  --approve

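# Install the AWS Load Balancer Controller (formerly "ALB Ingress Controller").
# Assumes an IRSA service account named aws-load-balancer-controller, with the
# controller's IAM policy attached, already exists in kube-system.
helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=production \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller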
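With IRSA, the EKS pod identity webhook injects `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` into pods whose service account carries the `eks.amazonaws.com/role-arn` annotation (which `eksctl create iamserviceaccount` adds), and AWS SDKs pick these up automatically. A small POSIX `sh` sketch for checking this from inside a pod; `irsa_check` is a hypothetical helper, and the pod must run under the annotated service account (e.g. `myapp-sa`):

```shell
# Run inside a pod (POSIX sh) to see whether IRSA credentials were injected.
# irsa_check is a hypothetical helper name.
irsa_check() {
  if [ -z "${AWS_ROLE_ARN:-}" ] || [ -z "${AWS_WEB_IDENTITY_TOKEN_FILE:-}" ]; then
    echo "IRSA not configured: AWS_ROLE_ARN or AWS_WEB_IDENTITY_TOKEN_FILE unset"
    return 1
  fi
  if [ ! -r "$AWS_WEB_IDENTITY_TOKEN_FILE" ]; then
    echo "token file not readable: $AWS_WEB_IDENTITY_TOKEN_FILE"
    return 1
  fi
  echo "IRSA role: $AWS_ROLE_ARN"
}

# Usage: open a shell with `kubectl exec -it -n myapp <pod> -- sh`,
# paste the function, then run `irsa_check`.
```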

📊 CloudWatch — Monitoring & Logs

# View logs from ECS container
aws logs tail /ecs/myapp --follow --format short

# Get logs from a specific start time (epoch milliseconds; GNU date syntax)
aws logs get-log-events \
  --log-group-name /ecs/myapp \
  --log-stream-name ecs/myapp/task-id \
  --start-time $(date -d '1 hour ago' +%s000)

# Create a CloudWatch alarm
aws cloudwatch put-metric-alarm \
  --alarm-name high-cpu \
  --metric-name CPUUtilization \
  --namespace AWS/ECS \
  --dimensions Name=ClusterName,Value=production Name=ServiceName,Value=myapp \
  --statistic Average \
  --period 60 \
  --evaluation-periods 3 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123:alert-topic

# Get a metric (GNU date syntax for the time range; on macOS use gdate)
aws cloudwatch get-metric-statistics \
  --namespace AWS/ECS \
  --metric-name CPUUtilization \
  --dimensions Name=ServiceName,Value=myapp Name=ClusterName,Value=production \
  --start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 300 --statistics Average
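The `high-cpu` alarm above goes to ALARM when average CPU is greater than 80% for 3 consecutive 60-second periods. To make `--evaluation-periods` and `GreaterThanThreshold` concrete, here is a toy evaluator. It is a simplification (hypothetical helper, integer values, no `--datapoints-to-alarm` or missing-data handling), not CloudWatch's actual algorithm:

```shell
# Toy model of the high-cpu alarm: ALARM when each of the last PERIODS
# datapoints is strictly greater than THRESHOLD.
# alarm_state is a hypothetical helper name.
alarm_state() {  # usage: alarm_state THRESHOLD PERIODS VALUE...
  local threshold=$1 periods=$2
  shift 2
  local values=("$@") n=$# i
  if (( n < periods )); then
    echo INSUFFICIENT_DATA
    return
  fi
  for (( i = n - periods; i < n; i++ )); do
    if (( values[i] <= threshold )); then
      echo OK          # one datapoint at or below the threshold breaks the run
      return
    fi
  done
  echo ALARM
}

alarm_state 80 3 50 85 90 95   # ALARM: last three are 85, 90, 95
alarm_state 80 3 85 90 70 95   # OK: 70 is not above 80
```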

🔁 Full CI/CD Pipeline on AWS

AWS Production Pipeline

Developer → git push → GitHub
    │
    ▼ trigger
GitHub Actions
    1. Run tests
    2. docker build
    3. docker push → ECR
    4. Update image tag in infra git repo
    │
    ▼ ArgoCD detects change
EKS Cluster
    Rolling update: new pods → health checks → old pods removed
    │
    ▼
ALB routes HTTPS traffic
    Route 53: app.example.com → ALB
    CloudFront: static.example.com → S3
    │
    ▼
Users reach the app
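Step 4 of the pipeline (updating the image tag in the infra repo) is the commit ArgoCD reacts to. A minimal sketch of that step using `sed`; `set_image_tag`, the manifest path in the usage note, and the 7-character short SHA are assumptions, and dots in the image name act as regex wildcards here, which is harmless for this purpose:

```shell
# Point a Kubernetes manifest at a new image tag, keeping the image name.
# set_image_tag is a hypothetical helper name.
set_image_tag() {  # usage: set_image_tag FILE IMAGE TAG
  local file=$1 image=$2 tag=$3
  # Keep "image: <IMAGE>" and replace only the tag after the colon.
  # -i.bak works with both GNU and BSD sed; the backup is removed afterwards.
  sed -i.bak -E "s#(image:[[:space:]]*${image}):[^[:space:]]+#\1:${tag}#" "$file" &&
    rm -f "$file.bak"
}

# Usage in a CI job (hypothetical path; GITHUB_SHA is set by GitHub Actions):
#   set_image_tag k8s/myapp/deployment.yaml \
#     123456789012.dkr.ecr.us-east-1.amazonaws.com/myapp "${GITHUB_SHA::7}"
#   git commit -am "myapp: deploy ${GITHUB_SHA::7}" && git push
```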

📌 AWS CLI Cheat Sheet

| Task | Command |
|---|---|
| Who am I | aws sts get-caller-identity |
| Switch profile | export AWS_PROFILE=myprofile |
| ECR login | aws ecr get-login-password \| docker login --username AWS --password-stdin ACCOUNT.dkr.ecr.REGION.amazonaws.com |
| List S3 buckets | aws s3 ls |
| Sync to S3 | aws s3 sync ./dist s3://bucket/ --delete |
| Invalidate CloudFront | aws cloudfront create-invalidation --distribution-id ID --paths "/*" |
| List ECS clusters | aws ecs list-clusters |
| Deploy ECS | aws ecs update-service --cluster C --service S --force-new-deployment |
| Connect to EKS | aws eks update-kubeconfig --name CLUSTER --region REGION |
| Stream logs | aws logs tail /ecs/myapp --follow |
| Get secret | aws secretsmanager get-secret-value --secret-id /app/db --query SecretString --output text |
| Get parameter | aws ssm get-parameter --name /app/key --with-decryption |
| Describe RDS | aws rds describe-db-instances --query 'DBInstances[*].Endpoint.Address' |