TL;DR: A practical guide to running Docker Compose in production. Covers multi-service setup, health checks, resource limits, networking, volumes, secrets management, and production-hardened YAML configurations.
Docker Compose is not just for local development. With the right configuration — health checks, resource limits, restart policies, and proper networking — Compose runs production workloads reliably on a single server or small cluster. It is the simplest path from "works on my machine" to "works in production" for teams that do not need Kubernetes-level orchestration.
This guide covers the configuration patterns that separate a development Compose file from a production-ready one, with complete YAML examples you can adapt.
Production vs Development Configuration
The first rule: never use your development docker-compose.yml in production. Development configs mount source code, expose debug ports, run in development mode, and skip security hardening.
Use Compose override files to layer production configuration on top of shared base config.
```text
docker-compose.yml            # Shared base configuration
docker-compose.override.yml   # Development overrides (auto-loaded locally)
docker-compose.prod.yml       # Production overrides
```

```bash
# Development (auto-loads docker-compose.override.yml)
docker compose up

# Production (explicit file selection)
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
```
A Complete Multi-Service Stack
Here is a production-ready stack with a Next.js frontend, a Node.js API, PostgreSQL, Redis, and Nginx as a reverse proxy.
Base Configuration
```yaml
# docker-compose.yml
name: myapp

services:
  web:
    build:
      context: ./apps/web
      dockerfile: Dockerfile
    depends_on:
      api:
        condition: service_healthy
    environment:
      - NEXT_PUBLIC_API_URL=http://api:4000

  api:
    build:
      context: ./apps/api
      dockerfile: Dockerfile
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgresql://app:${DB_PASSWORD}@postgres:5432/myapp
      - REDIS_URL=redis://redis:6379

  postgres:
    image: postgres:16-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init-db:/docker-entrypoint-initdb.d
    environment:
      - POSTGRES_DB=myapp
      - POSTGRES_USER=app
      - POSTGRES_PASSWORD=${DB_PASSWORD}

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}

volumes:
  postgres_data:
  redis_data:
```

One caveat: condition: service_healthy only works if the dependency actually defines a healthcheck. In this layout the health checks live in the production override, so either duplicate them in docker-compose.override.yml for local development or move them into this base file.
Production Overrides
```yaml
# docker-compose.prod.yml
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
      - nginx_cache:/var/cache/nginx
    depends_on:
      web:
        condition: service_healthy
    restart: always
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 256M

  web:
    restart: always
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 256M
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  api:
    restart: always
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: "1.0"
          memory: 1G
        reservations:
          cpus: "0.5"
          memory: 512M
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:4000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s

  postgres:
    restart: always
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 2G
        reservations:
          cpus: "1.0"
          memory: 1G
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d myapp"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    command:
      - "postgres"
      - "-c"
      - "shared_buffers=512MB"
      - "-c"
      - "effective_cache_size=1536MB"
      - "-c"
      - "maintenance_work_mem=128MB"
      - "-c"
      - "random_page_cost=1.1"
      - "-c"
      - "effective_io_concurrency=200"
      - "-c"
      - "max_connections=100"
      - "-c"
      - "log_min_duration_statement=200"

  redis:
    restart: always
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 256M
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  nginx_cache:
```
Health Checks
Health checks are the most important piece of production configuration. Without them, Compose cannot determine whether a service is actually ready, and depends_on conditions such as service_healthy have nothing to wait on. Note that plain Compose restarts crashed containers (via restart policies) but does not restart containers that are merely unhealthy; if you need that behavior, pair health checks with external monitoring or an autoheal-style sidecar.
Health Check Patterns
```yaml
services:
  # HTTP health check
  web:
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  # TCP health check
  api:
    healthcheck:
      test: ["CMD-SHELL", "nc -z localhost 4000 || exit 1"]
      interval: 15s
      timeout: 5s
      retries: 3

  # Database health check
  postgres:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  # Redis health check
  redis:
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
```
Health Check Parameters
| Parameter | Purpose | Recommended |
|-----------|---------|-------------|
| interval | Time between checks | 10–30s |
| timeout | Max time per check | 5–10s |
| retries | Failures before unhealthy | 3–5 |
| start_period | Grace period on startup | App-dependent (20–60s) |
Application Health Endpoints
Your health check endpoint should verify that the application and its dependencies are functional.
```typescript
// apps/api/src/routes/health.ts
import { Router } from "express";
import { db } from "../lib/db";
import { redis } from "../lib/redis";

const router = Router();

router.get("/health", async (req, res) => {
  const checks: Record<string, string> = {};

  try {
    await db.$queryRaw`SELECT 1`;
    checks.database = "ok";
  } catch {
    checks.database = "error";
  }

  try {
    await redis.ping();
    checks.redis = "ok";
  } catch {
    checks.redis = "error";
  }

  const isHealthy = Object.values(checks).every((v) => v === "ok");

  res.status(isHealthy ? 200 : 503).json({
    status: isHealthy ? "healthy" : "degraded",
    checks,
    uptime: process.uptime(),
    timestamp: new Date().toISOString(),
  });
});

export { router as healthRouter };
```
Resource Limits
Without resource limits, a single misbehaving container can consume all available memory and crash the host, taking every other service with it.
```yaml
services:
  api:
    deploy:
      resources:
        limits:
          cpus: "1.0"    # Max 1 CPU core
          memory: 1G     # Max 1GB RAM (OOM-killed if exceeded)
        reservations:
          cpus: "0.5"    # Guaranteed 0.5 CPU cores
          memory: 512M   # Guaranteed 512MB RAM
```
Resource Planning
| Service Type | CPU Limit | Memory Limit | Notes |
|--------------|-----------|--------------|-------|
| Next.js frontend | 0.5–1.0 | 512M–1G | Memory depends on page count |
| Node.js API | 0.5–2.0 | 512M–2G | Depends on concurrency |
| PostgreSQL | 1.0–4.0 | 1G–4G | Memory = cache performance |
| Redis | 0.25–0.5 | 256M–1G | Size of cached dataset |
| Nginx | 0.25–0.5 | 128M–256M | Minimal resource needs |
Monitor actual usage with docker stats before setting final limits. Over-constraining wastes resources; under-constraining risks OOM kills.
```bash
# Monitor real-time resource usage
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}"
```
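A quick way to confirm the limits fit the machine: sum every memory limit and compare the total against host RAM. The TypeScript sketch below is illustrative; parseMemory, the hard-coded limits list, and the 8 GiB host size are assumptions, not values read from Compose.

```typescript
// Sanity-check that configured memory limits fit on the host.
// parseMemory converts Compose-style limits ("512M", "1G") to bytes.
function parseMemory(limit: string): number {
  const units: Record<string, number> = { K: 1024, M: 1024 ** 2, G: 1024 ** 3 };
  const match = limit.trim().match(/^(\d+(?:\.\d+)?)([KMG])?B?$/i);
  if (!match) throw new Error(`Unparseable memory limit: ${limit}`);
  const [, value, unit] = match;
  return Number(value) * (unit ? units[unit.toUpperCase()] : 1);
}

// Limits copied by hand from docker-compose.prod.yml
const limits = ["256M", "512M", "1G", "2G", "512M"];
const totalBytes = limits.map(parseMemory).reduce((a, b) => a + b, 0);
const hostRamBytes = 8 * 1024 ** 3; // adjust to your host

console.log(`Total configured limits: ${(totalBytes / 1024 ** 3).toFixed(2)} GiB`);
if (totalBytes > hostRamBytes * 0.7) {
  console.warn("Warning: limits exceed 70% of host RAM; risk of OOM cascade");
}
```

Keeping the sum under roughly 70% of host RAM leaves headroom for the kernel, Docker itself, and page cache.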
Networking
Docker Compose creates a default network for each project. Services communicate by name. For production, define explicit networks to control traffic flow.
```yaml
services:
  nginx:
    networks:
      - frontend
  web:
    networks:
      - frontend
      - backend
  api:
    networks:
      - backend
      - data
  postgres:
    networks:
      - data
  redis:
    networks:
      - data

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
  data:
    driver: bridge
    internal: true # No external access
```
The internal: true flag on the data network means PostgreSQL and Redis are only accessible from services on that network — not from the host or the internet.
Nginx Reverse Proxy Configuration
```nginx
# nginx/nginx.conf
worker_processes auto;

events {
    worker_connections 1024;
}

http {
    upstream web_backend {
        server web:3000;
    }

    upstream api_backend {
        server api:4000;
    }

    server {
        listen 80;
        server_name example.com;
        return 301 https://$host$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name example.com;

        ssl_certificate /etc/nginx/ssl/cert.pem;
        ssl_certificate_key /etc/nginx/ssl/key.pem;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers HIGH:!aNULL:!MD5;

        gzip on;
        gzip_types text/plain text/css application/json application/javascript;

        location / {
            proxy_pass http://web_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        location /api/ {
            proxy_pass http://api_backend/;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_connect_timeout 10s;
            proxy_read_timeout 30s;
            proxy_send_timeout 30s;
        }
    }
}
```
Volumes and Data Persistence
Named volumes persist data across container restarts and rebuilds. For production, understand the backup and performance implications.
```yaml
volumes:
  postgres_data:
    driver: local
    driver_opts:
      type: none
      device: /data/postgres
      o: bind
  redis_data:
    driver: local
  nginx_cache:
    driver: local
```
Backup Strategy
```bash
#!/bin/bash
# backup.sh - Automated PostgreSQL backup
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups/postgres"
mkdir -p "$BACKUP_DIR"

docker compose exec -T postgres pg_dump \
  -U app \
  -d myapp \
  --format=custom \
  --compress=9 \
  > "$BACKUP_DIR/myapp_$TIMESTAMP.dump"

# Keep last 30 days of backups
find "$BACKUP_DIR" -name "*.dump" -mtime +30 -delete

echo "Backup completed: myapp_$TIMESTAMP.dump"
```

```bash
# Restore from backup
docker compose exec -T postgres pg_restore \
  -U app \
  -d myapp \
  --clean \
  --if-exists \
  < /backups/postgres/myapp_20260314_120000.dump
```
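To run the backup on a schedule, invoke the script from host-level cron. The paths below are placeholder assumptions; adjust them to wherever you install the script and keep logs:

```text
# /etc/cron.d/myapp-backup -- nightly at 03:00 as the deploy user
0 3 * * * deploy /opt/myapp/backup.sh >> /var/log/myapp-backup.log 2>&1
```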
Secrets Management
Never put secrets in your Compose file or commit .env files to version control. Docker Compose supports multiple approaches.
Environment Files
```bash
# .env.production (never committed, deployed separately)
DB_PASSWORD=generated-secure-password-here
REDIS_PASSWORD=another-secure-password
JWT_SECRET=your-jwt-signing-key
API_KEY=external-service-api-key
```

```yaml
# docker-compose.prod.yml
services:
  api:
    env_file:
      - .env.production
```
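Two cheap safeguards for the env file, sketched below: a .gitignore rule so it can never be committed, and file permissions restricted to the owner. The .env.example template name is a convention, not a requirement:

```shell
# Keep env files out of version control, but allow a committed template
cat >> .gitignore <<'EOF'
.env
.env.*
!.env.example
EOF

# Restrict the production file to its owner (create it first if needed)
touch .env.production
chmod 600 .env.production
```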
Docker Secrets (Swarm Mode)
For Docker Swarm deployments, use native secrets that are mounted as files in the container.
```yaml
services:
  api:
    secrets:
      - db_password
      - jwt_secret
    environment:
      - DB_PASSWORD_FILE=/run/secrets/db_password
      - JWT_SECRET_FILE=/run/secrets/jwt_secret

secrets:
  db_password:
    external: true
  jwt_secret:
    external: true
```

```typescript
// Reading secrets from files in the application
import { readFileSync } from "fs";

function getSecret(name: string): string {
  const filePath = process.env[`${name}_FILE`];
  if (filePath) {
    return readFileSync(filePath, "utf-8").trim();
  }
  return process.env[name] ?? "";
}

const dbPassword = getSecret("DB_PASSWORD");
```
Deployment and Updates
Zero-Downtime Deployments
```bash
#!/bin/bash
# deploy.sh - Zero-downtime deployment
set -e

echo "Pulling latest images..."
docker compose -f docker-compose.yml -f docker-compose.prod.yml pull

echo "Building application images..."
docker compose -f docker-compose.yml -f docker-compose.prod.yml build

echo "Rolling update..."
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d \
  --no-deps \
  --build \
  api

echo "Waiting for health check..."
sleep 10

docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d \
  --no-deps \
  --build \
  web

echo "Cleaning up old images..."
docker image prune -f

echo "Deployment complete."
docker compose -f docker-compose.yml -f docker-compose.prod.yml ps
```
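A fixed sleep 10 is the weak point of a script like this: it neither confirms health nor fails fast. A sketch of an explicit wait on Docker's health status; the container name myapp-api-1 assumes Compose's default project-service-index naming with a single replica, so adjust it for your project:

```bash
# Poll the api container's health status for up to 60 seconds
status="starting"
for i in $(seq 1 12); do
  status=$(docker inspect --format '{{.State.Health.Status}}' myapp-api-1 2>/dev/null || echo "starting")
  [ "$status" = "healthy" ] && break
  sleep 5
done

if [ "$status" != "healthy" ]; then
  echo "api failed its health check; aborting deploy" >&2
  exit 1
fi
```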
Logging Configuration
```yaml
services:
  api:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"
        tag: "{{.Name}}"
```
Without log rotation, container logs will fill your disk. The json-file driver with max-size and max-file prevents this.
```bash
# View logs with timestamps
docker compose logs -f --timestamps api

# View last 100 lines
docker compose logs --tail=100 api
```
Monitoring
Basic Monitoring with cAdvisor
```yaml
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    deploy:
      resources:
        limits:
          cpus: "0.25"
          memory: 256M
```
For more comprehensive monitoring, add Prometheus and Grafana to your stack. But cAdvisor alone gives you real-time visibility into CPU, memory, network, and disk I/O per container.
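If you do add Prometheus, the scrape job for cAdvisor is small. A sketch, assuming Prometheus runs on the same Compose network and the service is named cadvisor as above:

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: "cadvisor"
    scrape_interval: 30s
    static_configs:
      - targets: ["cadvisor:8080"]
```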
Production Failure Modes Specific to Compose
Compose is simple, which is its main virtue, but a handful of failure modes repeat across teams. Plan for each before you hit them in production.
| Failure mode | Symptom | Root cause | Mitigation |
|--------------|---------|------------|------------|
| Thundering restart | Container restarts in a loop, CPU pegged | restart: always without restart_policy back-off | Switch to deploy.restart_policy with delay: 10s and max_attempts: 5 |
| Orphan volumes filling disk | Free disk drops 10-50GB over weeks, no obvious file | docker volume orphans from old stack names or failed builds | Weekly docker volume prune -f in a cron; monitor with du -sh /var/lib/docker/volumes/* |
| Silent bind-mount data loss | Database looks empty after host reboot | Named volume redefined to a different path, or docker compose down -v run by mistake | Back up before down -v; document which volumes are bind-mounted |
| OOM cascade | One container OOMs and triggers host-level memory pressure | No mem_limit set, or limits sum exceeds host RAM | Sum all memory limits and ensure they are < 70% of host RAM |
| Zombie networks | docker network ls shows 30+ networks from failed deploys | Unclean compose down without --remove-orphans | Use docker compose down --remove-orphans in deploy scripts; weekly docker network prune |
| Log-driver disk fill | /var/lib/docker/containers/*/*-json.log grows to fill disk | Default json-file driver with no max-size | Set max-size: 10m and max-file: 5 on every service's logging block |
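The back-off mitigation from the first row translates to YAML like the following. One caveat worth stating plainly: Docker Swarm honors all four fields, while plain docker compose primarily maps the condition field, so verify delay and max_attempts behavior on your Compose version before relying on them:

```yaml
services:
  api:
    deploy:
      restart_policy:
        condition: on-failure
        delay: 10s
        max_attempts: 5
        window: 60s
```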
Observability Baseline
At minimum, every production Compose host should expose:
- docker stats snapshotted to a time-series database (Prometheus node-exporter + cAdvisor) on 30-second intervals
- Per-service log aggregation to an external system (Loki, CloudWatch Logs, or Papertrail) — local log files are lost on host failure
- A liveness endpoint on each app service surfaced through the reverse proxy, so external uptime monitoring (UptimeRobot, Better Uptime, Pingdom) can page you on failure
- Weekly backup-restore rehearsals — not just pg_dump success, but actually restoring to a scratch container and running SELECT count(*) on key tables
When to Outgrow Compose
Docker Compose works well for:
- Single-server deployments
- Small teams (2–10 services)
- Predictable traffic patterns
- Applications that can tolerate brief downtime during deploys
Consider moving to Kubernetes or a managed container platform when you need:
- Multi-node horizontal scaling
- Automatic failover across hosts
- Advanced traffic management (canary, blue/green)
- Multi-region deployment
Getting Started
Docker Compose closes the gap between development and production with minimal complexity. The patterns in this guide (health checks, resource limits, proper networking, and secrets management) turn a development convenience into a dependable production deployment tool.
If you need help containerizing your application or setting up production infrastructure, reach out to our team. We build and deploy containerized applications with Docker, Compose, and cloud-native infrastructure — optimized for reliability, security, and maintainability.
Containerize it. Configure it properly. Ship it.
Frequently Asked Questions
Is Docker Compose production-grade or should we use Kubernetes?
Docker Compose is fine for single-host deployments serving up to moderate traffic — thousands of requests per minute on a well-sized VPS or bare-metal host. Move to Kubernetes, ECS, or similar orchestrators once you need multi-host scaling, zero-downtime deploys, auto-recovery, or multi-region failover. In our experience, a large share of production apps never outgrow Compose, because single-host is simpler and cheaper to operate.
How do we handle zero-downtime deploys with Docker Compose?
Use a reverse proxy like Traefik or Caddy with two rolling service instances, or wire up blue-green deployment scripts that launch the new version behind the proxy, health-check, then swap. Out of the box, docker compose up takes 5-30 seconds of downtime per service during restart. Teams that need true zero-downtime usually graduate to Kubernetes eventually, but short restart windows are acceptable for most internal and B2B workloads.
What is the right way to manage secrets in production Compose?
Never bake secrets into images and avoid plaintext in .env files committed to source. Use Docker secrets (Swarm mode) or an external secret manager like HashiCorp Vault, AWS Secrets Manager, or Doppler and inject at container startup. For simple deployments, an encrypted .env file with sops or git-crypt is a pragmatic middle ground.
What is the biggest gotcha running Compose in production?
Relying on restart: always as a reliability strategy. Compose restarts crashed containers but does not handle node failures, disk pressure, or subtle memory leaks gracefully. Pair Compose with external monitoring (Uptime Kuma, Datadog, Grafana Cloud), automated backups, and a documented runbook for node-level incidents. Without these, small operational issues snowball into extended outages.
Should we use Docker Compose v1 or v2 in 2026?
Use Compose v2 — it is the default in modern Docker Desktop and Docker Engine installs, invoked as docker compose (space, not hyphen). Compose v1 (docker-compose with a hyphen) was officially deprecated by Docker in 2023 and no longer receives updates. v2 is written in Go and lives inside the Docker CLI plugin system, which means better startup performance, native profile support, and consistent behavior across Linux, macOS, and Windows. If a production host still ships only v1, install the docker-compose-plugin package before rolling out new manifests so every environment speaks the same version.
How should cron jobs and scheduled tasks run alongside Compose services?
Run scheduled tasks as sidecar containers with their own service entry, or use a dedicated scheduler like Ofelia that reads labels off your other Compose services. Do not install cron inside your main application image — it blurs responsibility, complicates logs, and makes rollbacks harder. For heavier workloads such as nightly ETL or report generation, trigger short-lived docker compose run --rm invocations from a host-level cron or systemd timer, so the scheduler and the workload stay cleanly separated and both survive a container restart.
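A label-driven Ofelia setup, sketched from its documented conventions, looks roughly like this. The job name cleanup and its command are placeholders; check the Ofelia README for the exact label schema your version supports:

```yaml
services:
  scheduler:
    image: mcuadros/ofelia:latest
    command: daemon --docker
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro

  api:
    labels:
      ofelia.enabled: "true"
      ofelia.job-exec.cleanup.schedule: "@daily"
      ofelia.job-exec.cleanup.command: "node scripts/cleanup.js"
```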