TL;DR: A practical guide to running Docker Compose in production. Covers multi-service setup, health checks, resource limits, networking, volumes, secrets management, and production-hardened YAML configurations.
Docker Compose is not just for local development. With the right configuration — health checks, resource limits, restart policies, and proper networking — Compose runs production workloads reliably on a single server or small cluster. It is the simplest path from "works on my machine" to "works in production" for teams that do not need Kubernetes-level orchestration.
This guide covers the configuration patterns that separate a development Compose file from a production-ready one, with complete YAML examples you can adapt.
Production vs Development Configuration
The first rule: never use your development docker-compose.yml in production. Development configs mount source code, expose debug ports, run in development mode, and skip security hardening.
Use Compose override files to layer production configuration on top of shared base config.
```text
docker-compose.yml            # Shared base configuration
docker-compose.override.yml   # Development overrides (auto-loaded locally)
docker-compose.prod.yml       # Production overrides
```

```bash
# Development (auto-loads docker-compose.override.yml)
docker compose up

# Production (explicit file selection)
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
```
A Complete Multi-Service Stack
Here is a production-ready stack with a Next.js frontend, a Node.js API, PostgreSQL, Redis, and Nginx as a reverse proxy.
Base Configuration
```yaml
# docker-compose.yml
name: myapp

services:
  web:
    build:
      context: ./apps/web
      dockerfile: Dockerfile
    depends_on:
      api:
        condition: service_healthy
    environment:
      - NEXT_PUBLIC_API_URL=http://api:4000

  api:
    build:
      context: ./apps/api
      dockerfile: Dockerfile
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgresql://app:${DB_PASSWORD}@postgres:5432/myapp
      - REDIS_URL=redis://redis:6379

  postgres:
    image: postgres:16-alpine
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init-db:/docker-entrypoint-initdb.d
    environment:
      - POSTGRES_DB=myapp
      - POSTGRES_USER=app
      - POSTGRES_PASSWORD=${DB_PASSWORD}

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}

volumes:
  postgres_data:
  redis_data:
```

One caveat: condition: service_healthy only works if the dependency actually defines a healthcheck. In this layout the health checks live in the production override, so either duplicate them in docker-compose.override.yml for local development or move them into this base file.
Production Overrides
```yaml
# docker-compose.prod.yml
services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf:ro
      - ./nginx/ssl:/etc/nginx/ssl:ro
      - nginx_cache:/var/cache/nginx
    depends_on:
      web:
        condition: service_healthy
    restart: always
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 256M

  web:
    restart: always
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 256M
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  api:
    restart: always
    deploy:
      replicas: 2
      resources:
        limits:
          cpus: "1.0"
          memory: 1G
        reservations:
          cpus: "0.5"
          memory: 512M
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:4000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s

  postgres:
    restart: always
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 2G
        reservations:
          cpus: "1.0"
          memory: 1G
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d myapp"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    command:
      - "postgres"
      - "-c"
      - "shared_buffers=512MB"
      - "-c"
      - "effective_cache_size=1536MB"
      - "-c"
      - "maintenance_work_mem=128MB"
      - "-c"
      - "random_page_cost=1.1"
      - "-c"
      - "effective_io_concurrency=200"
      - "-c"
      - "max_connections=100"
      - "-c"
      - "log_min_duration_statement=200"

  redis:
    restart: always
    deploy:
      resources:
        limits:
          cpus: "0.5"
          memory: 512M
        reservations:
          cpus: "0.25"
          memory: 256M
    healthcheck:
      test: ["CMD", "redis-cli", "-a", "${REDIS_PASSWORD}", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  nginx_cache:
```
Health Checks
Health checks are the most important piece of production configuration. Without them, Compose cannot determine whether a service is actually ready, and depends_on conditions such as service_healthy have nothing to wait on. Note that plain Compose restarts crashed containers (via restart policies) but does not restart containers that are merely unhealthy; if you need that behavior, pair health checks with external monitoring or an autoheal-style sidecar.
Health Check Patterns
```yaml
services:
  # HTTP health check
  web:
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:3000/api/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  # TCP health check
  api:
    healthcheck:
      test: ["CMD-SHELL", "nc -z localhost 4000 || exit 1"]
      interval: 15s
      timeout: 5s
      retries: 3

  # Database health check
  postgres:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER} -d ${POSTGRES_DB}"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  # Redis health check
  redis:
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 5
```
Health Check Parameters
| Parameter | Purpose | Recommended |
|-----------|---------|-------------|
| interval | Time between checks | 10–30s |
| timeout | Max time per check | 5–10s |
| retries | Failures before unhealthy | 3–5 |
| start_period | Grace period on startup | App-dependent (20–60s) |
Application Health Endpoints
Your health check endpoint should verify that the application and its dependencies are functional.
```typescript
// apps/api/src/routes/health.ts
import { Router } from "express";
import { db } from "../lib/db";
import { redis } from "../lib/redis";

const router = Router();

router.get("/health", async (req, res) => {
  const checks: Record<string, string> = {};

  try {
    await db.$queryRaw`SELECT 1`;
    checks.database = "ok";
  } catch {
    checks.database = "error";
  }

  try {
    await redis.ping();
    checks.redis = "ok";
  } catch {
    checks.redis = "error";
  }

  const isHealthy = Object.values(checks).every((v) => v === "ok");

  res.status(isHealthy ? 200 : 503).json({
    status: isHealthy ? "healthy" : "degraded",
    checks,
    uptime: process.uptime(),
    timestamp: new Date().toISOString(),
  });
});

export { router as healthRouter };
```
Resource Limits
Without resource limits, a single misbehaving container can consume all available memory and crash the host, taking every other service with it.
```yaml
services:
  api:
    deploy:
      resources:
        limits:
          cpus: "1.0"    # Max 1 CPU core
          memory: 1G     # Max 1GB RAM (OOM-killed if exceeded)
        reservations:
          cpus: "0.5"    # Guaranteed 0.5 CPU cores
          memory: 512M   # Guaranteed 512MB RAM
```
Resource Planning
| Service Type | CPU Limit | Memory Limit | Notes |
|--------------|-----------|--------------|-------|
| Next.js frontend | 0.5–1.0 | 512M–1G | Memory depends on page count |
| Node.js API | 0.5–2.0 | 512M–2G | Depends on concurrency |
| PostgreSQL | 1.0–4.0 | 1G–4G | Memory = cache performance |
| Redis | 0.25–0.5 | 256M–1G | Size of cached dataset |
| Nginx | 0.25–0.5 | 128M–256M | Minimal resource needs |
Monitor actual usage with docker stats before setting final limits. Over-constraining wastes resources; under-constraining risks OOM kills.
```bash
# Monitor real-time resource usage
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}"
```
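A quick way to confirm the limits fit the machine: sum every memory limit and compare the total against host RAM. The TypeScript sketch below is illustrative; parseMemory, the hard-coded limits list, and the 8 GiB host size are assumptions, not values read from Compose.

```typescript
// Sanity-check that configured memory limits fit on the host.
// parseMemory converts Compose-style limits ("512M", "1G") to bytes.
function parseMemory(limit: string): number {
  const units: Record<string, number> = { K: 1024, M: 1024 ** 2, G: 1024 ** 3 };
  const match = limit.trim().match(/^(\d+(?:\.\d+)?)([KMG])?B?$/i);
  if (!match) throw new Error(`Unparseable memory limit: ${limit}`);
  const [, value, unit] = match;
  return Number(value) * (unit ? units[unit.toUpperCase()] : 1);
}

// Limits copied by hand from docker-compose.prod.yml
const limits = ["256M", "512M", "1G", "2G", "512M"];
const totalBytes = limits.map(parseMemory).reduce((a, b) => a + b, 0);
const hostRamBytes = 8 * 1024 ** 3; // adjust to your host

console.log(`Total configured limits: ${(totalBytes / 1024 ** 3).toFixed(2)} GiB`);
if (totalBytes > hostRamBytes * 0.7) {
  console.warn("Warning: limits exceed 70% of host RAM; risk of OOM cascade");
}
```

Keeping the sum under roughly 70% of host RAM leaves headroom for the kernel, Docker itself, and page cache.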
Networking
Docker Compose creates a default network for each project. Services communicate by name. For production, define explicit networks to control traffic flow.
```yaml
services:
  nginx:
    networks:
      - frontend
  web:
    networks:
      - frontend
      - backend
  api:
    networks:
      - backend
      - data
  postgres:
    networks:
      - data
  redis:
    networks:
      - data

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge
  data:
    driver: bridge
    internal: true # No external access
```
The internal: true flag on the data network means PostgreSQL and Redis are only accessible from services on that network — not from the host or the internet.
Nginx Reverse Proxy Configuration
```nginx
# nginx/nginx.conf
worker_processes auto;

events {
    worker_connections 1024;
}

http {
    upstream web_backend {
        server web:3000;
    }

    upstream api_backend {
        server api:4000;
    }

    server {
        listen 80;
        server_name example.com;
        return 301 https://$host$request_uri;
    }

    server {
        listen 443 ssl http2;
        server_name example.com;

        ssl_certificate /etc/nginx/ssl/cert.pem;
        ssl_certificate_key /etc/nginx/ssl/key.pem;
        ssl_protocols TLSv1.2 TLSv1.3;
        ssl_ciphers HIGH:!aNULL:!MD5;

        gzip on;
        gzip_types text/plain text/css application/json application/javascript;

        location / {
            proxy_pass http://web_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
        }

        location /api/ {
            proxy_pass http://api_backend/;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_connect_timeout 10s;
            proxy_read_timeout 30s;
            proxy_send_timeout 30s;
        }
    }
}
```
Volumes and Data Persistence
Named volumes persist data across container restarts and rebuilds. For production, understand the backup and performance implications.
```yaml
volumes:
  postgres_data:
    driver: local
    driver_opts:
      type: none
      device: /data/postgres
      o: bind
  redis_data:
    driver: local
  nginx_cache:
    driver: local
```
Backup Strategy
```bash
#!/bin/bash
# backup.sh - Automated PostgreSQL backup
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backups/postgres"
mkdir -p "$BACKUP_DIR"

docker compose exec -T postgres pg_dump \
  -U app \
  -d myapp \
  --format=custom \
  --compress=9 \
  > "$BACKUP_DIR/myapp_$TIMESTAMP.dump"

# Keep last 30 days of backups
find "$BACKUP_DIR" -name "*.dump" -mtime +30 -delete

echo "Backup completed: myapp_$TIMESTAMP.dump"
```

```bash
# Restore from backup
docker compose exec -T postgres pg_restore \
  -U app \
  -d myapp \
  --clean \
  --if-exists \
  < /backups/postgres/myapp_20260314_120000.dump
```
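To run the backup on a schedule, invoke the script from host-level cron. The paths below are placeholder assumptions; adjust them to wherever you install the script and keep logs:

```text
# /etc/cron.d/myapp-backup -- nightly at 03:00 as the deploy user
0 3 * * * deploy /opt/myapp/backup.sh >> /var/log/myapp-backup.log 2>&1
```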
Secrets Management
Never put secrets in your Compose file or commit .env files to version control. Docker Compose supports multiple approaches.
Environment Files
```bash
# .env.production (never committed, deployed separately)
DB_PASSWORD=generated-secure-password-here
REDIS_PASSWORD=another-secure-password
JWT_SECRET=your-jwt-signing-key
API_KEY=external-service-api-key
```

```yaml
# docker-compose.prod.yml
services:
  api:
    env_file:
      - .env.production
```
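Two cheap safeguards for the env file, sketched below: a .gitignore rule so it can never be committed, and file permissions restricted to the owner. The .env.example template name is a convention, not a requirement:

```shell
# Keep env files out of version control, but allow a committed template
cat >> .gitignore <<'EOF'
.env
.env.*
!.env.example
EOF

# Restrict the production file to its owner (create it first if needed)
touch .env.production
chmod 600 .env.production
```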
Docker Secrets (Swarm Mode)
For Docker Swarm deployments, use native secrets that are mounted as files in the container.
```yaml
services:
  api:
    secrets:
      - db_password
      - jwt_secret
    environment:
      - DB_PASSWORD_FILE=/run/secrets/db_password
      - JWT_SECRET_FILE=/run/secrets/jwt_secret

secrets:
  db_password:
    external: true
  jwt_secret:
    external: true
```

```typescript
// Reading secrets from files in the application
import { readFileSync } from "fs";

function getSecret(name: string): string {
  const filePath = process.env[`${name}_FILE`];
  if (filePath) {
    return readFileSync(filePath, "utf-8").trim();
  }
  return process.env[name] ?? "";
}

const dbPassword = getSecret("DB_PASSWORD");
```
Deployment and Updates
Zero-Downtime Deployments
```bash
#!/bin/bash
# deploy.sh - Zero-downtime deployment
set -e

echo "Pulling latest images..."
docker compose -f docker-compose.yml -f docker-compose.prod.yml pull

echo "Building application images..."
docker compose -f docker-compose.yml -f docker-compose.prod.yml build

echo "Rolling update..."
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d \
  --no-deps \
  --build \
  api

echo "Waiting for health check..."
sleep 10

docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d \
  --no-deps \
  --build \
  web

echo "Cleaning up old images..."
docker image prune -f

echo "Deployment complete."
docker compose -f docker-compose.yml -f docker-compose.prod.yml ps
```
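A fixed sleep 10 is the weak point of a script like this: it neither confirms health nor fails fast. A sketch of an explicit wait on Docker's health status; the container name myapp-api-1 assumes Compose's default project-service-index naming with a single replica, so adjust it for your project:

```bash
# Poll the api container's health status for up to 60 seconds
status="starting"
for i in $(seq 1 12); do
  status=$(docker inspect --format '{{.State.Health.Status}}' myapp-api-1 2>/dev/null || echo "starting")
  [ "$status" = "healthy" ] && break
  sleep 5
done

if [ "$status" != "healthy" ]; then
  echo "api failed its health check; aborting deploy" >&2
  exit 1
fi
```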
Logging Configuration
```yaml
services:
  api:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"
        tag: "{{.Name}}"
```
Without log rotation, container logs will fill your disk. The json-file driver with max-size and max-file prevents this.
```bash
# View logs with timestamps
docker compose logs -f --timestamps api

# View last 100 lines
docker compose logs --tail=100 api
```
Monitoring
Basic Monitoring with cAdvisor
```yaml
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    deploy:
      resources:
        limits:
          cpus: "0.25"
          memory: 256M
```
For more comprehensive monitoring, add Prometheus and Grafana to your stack. But cAdvisor alone gives you real-time visibility into CPU, memory, network, and disk I/O per container.
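If you do add Prometheus, the scrape job for cAdvisor is small. A sketch, assuming Prometheus runs on the same Compose network and the service is named cadvisor as above:

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: "cadvisor"
    scrape_interval: 30s
    static_configs:
      - targets: ["cadvisor:8080"]
```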
Production Failure Modes Specific to Compose
Compose is simple, which is its main virtue, but a handful of failure modes repeat across teams. Plan for each before you hit them in production.
| Failure mode | Symptom | Root cause | Mitigation |
|--------------|---------|------------|------------|
| Thundering restart | Container restarts in a loop, CPU pegged | restart: always without restart_policy back-off | Switch to deploy.restart_policy with delay: 10s and max_attempts: 5 |
| Orphan volumes filling disk | Free disk drops 10-50GB over weeks, no obvious file | docker volume orphans from old stack names or failed builds | Weekly docker volume prune -f in a cron; monitor with du -sh /var/lib/docker/volumes/* |
| Silent bind-mount data loss | Database looks empty after host reboot | Named volume redefined to a different path, or docker compose down -v run by mistake | Back up before down -v; document which volumes are bind-mounted |
| OOM cascade | One container OOMs and triggers host-level memory pressure | No mem_limit set, or limits sum exceeds host RAM | Sum all memory limits and ensure they are < 70% of host RAM |
| Zombie networks | docker network ls shows 30+ networks from failed deploys | Unclean compose down without --remove-orphans | Use docker compose down --remove-orphans in deploy scripts; weekly docker network prune |
| Log-driver disk fill | /var/lib/docker/containers/*/*-json.log grows to fill disk | Default json-file driver with no max-size | Set max-size: 10m and max-file: 5 on every service's logging block |
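The back-off mitigation from the first row translates to YAML like the following. One caveat worth stating plainly: Docker Swarm honors all four fields, while plain docker compose primarily maps the condition field, so verify delay and max_attempts behavior on your Compose version before relying on them:

```yaml
services:
  api:
    deploy:
      restart_policy:
        condition: on-failure
        delay: 10s
        max_attempts: 5
        window: 60s
```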
Observability Baseline
At minimum, every production Compose host should expose:
- docker stats snapshotted to a time-series database (Prometheus node-exporter + cAdvisor) on 30-second intervals
- Per-service log aggregation to an external system (Loki, CloudWatch Logs, or Papertrail) — local log files are lost on host failure
- A liveness endpoint on each app service surfaced through the reverse proxy, so external uptime monitoring (UptimeRobot, Better Uptime, Pingdom) can page you on failure
- Weekly backup-restore rehearsals — not just pg_dump success, but actually restoring to a scratch container and running SELECT count(*) on key tables
When to Outgrow Compose
Docker Compose works well for:
- Single-server deployments
- Small teams (2–10 services)
- Predictable traffic patterns
- Applications that can tolerate brief downtime during deploys
Consider moving to Kubernetes or a managed container platform when you need:
- Multi-node horizontal scaling
- Automatic failover across hosts
- Advanced traffic management (canary, blue/green)
- Multi-region deployment
Getting Started
Docker Compose closes the gap between development and production with minimal complexity. The patterns in this guide (health checks, resource limits, proper networking, and secrets management) turn a development convenience into a dependable production deployment tool.
If you need help containerizing your application or setting up production infrastructure, reach out to our team. We build and deploy containerized applications with Docker, Compose, and cloud-native infrastructure — optimized for reliability, security, and maintainability.
Containerize it. Configure it properly. Ship it.
Frequently Asked Questions
Is Docker Compose production-grade or should we use Kubernetes?
Docker Compose is fine for single-host deployments serving up to moderate traffic — thousands of requests per minute on a well-sized VPS or bare-metal host. Move to Kubernetes, ECS, or similar orchestrators once you need multi-host scaling, zero-downtime deploys, auto-recovery, or multi-region failover. In our experience, a large share of production apps never outgrow Compose, because single-host is simpler and cheaper to operate.
How do we handle zero-downtime deploys with Docker Compose?
Use a reverse proxy like Traefik or Caddy with two rolling service instances, or wire up blue-green deployment scripts that launch the new version behind the proxy, health-check, then swap. Out of the box, docker compose up takes 5-30 seconds of downtime per service during restart. Teams that need true zero-downtime usually graduate to Kubernetes eventually, but short restart windows are acceptable for most internal and B2B workloads.
What is the right way to manage secrets in production Compose?
Never bake secrets into images and avoid plaintext in .env files committed to source. Use Docker secrets (Swarm mode) or an external secret manager like HashiCorp Vault, AWS Secrets Manager, or Doppler and inject at container startup. For simple deployments, an encrypted .env file with sops or git-crypt is a pragmatic middle ground.
What is the biggest gotcha running Compose in production?
Relying on restart: always as a reliability strategy. Compose restarts crashed containers but does not handle node failures, disk pressure, or subtle memory leaks gracefully. Pair Compose with external monitoring (Uptime Kuma, Datadog, Grafana Cloud), automated backups, and a documented runbook for node-level incidents. Without these, small operational issues snowball into extended outages.
Should we use Docker Compose v1 or v2 in 2026?
Use Compose v2 — it is the default in modern Docker Desktop and Docker Engine installs, invoked as docker compose (space, not hyphen). Compose v1 (docker-compose with a hyphen) was officially deprecated by Docker in 2023 and no longer receives updates. v2 is written in Go and lives inside the Docker CLI plugin system, which means better startup performance, native profile support, and consistent behavior across Linux, macOS, and Windows. If a production host still ships only v1, install the docker-compose-plugin package before rolling out new manifests so every environment speaks the same version.
How should cron jobs and scheduled tasks run alongside Compose services?
Run scheduled tasks as sidecar containers with their own service entry, or use a dedicated scheduler like Ofelia that reads labels off your other Compose services. Do not install cron inside your main application image — it blurs responsibility, complicates logs, and makes rollbacks harder. For heavier workloads such as nightly ETL or report generation, trigger short-lived docker compose run --rm invocations from a host-level cron or systemd timer, so the scheduler and the workload stay cleanly separated and both survive a container restart.
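A label-driven Ofelia setup, sketched from its documented conventions, looks roughly like this. The job name cleanup and its command are placeholders; check the Ofelia README for the exact label schema your version supports:

```yaml
services:
  scheduler:
    image: mcuadros/ofelia:latest
    command: daemon --docker
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro

  api:
    labels:
      ofelia.enabled: "true"
      ofelia.job-exec.cleanup.schedule: "@daily"
      ofelia.job-exec.cleanup.command: "node scripts/cleanup.js"
```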