Database Scaling Strategies: From Single Server to Global Scale
Author: ZTABS Team
Every successful application eventually hits a database bottleneck. The first sign is usually a slow dashboard query. Then API response times creep up. Then you start seeing connection timeouts during peak traffic. What got you to 10,000 users will not get you to 1,000,000.
Database scaling is not a single decision — it is a sequence of strategies, each appropriate at a different stage of growth. This guide walks through them in order, from the simplest wins to global-scale architectures.
Stage 1: Vertical Scaling and Query Optimization
Before adding infrastructure complexity, extract every bit of performance from your existing setup.
Upgrade the Hardware
Vertical scaling means giving your database server more CPU, RAM, and faster storage. On managed cloud services like AWS RDS or Google Cloud SQL, this is a configuration change.
| Resource | Impact | When to Upgrade |
|----------|--------|-----------------|
| RAM | More data cached in memory, fewer disk reads | Buffer pool hit rate below 95% |
| CPU | Faster query execution, more concurrent connections | CPU consistently above 70% |
| Storage IOPS | Faster reads and writes for disk-bound workloads | I/O wait times dominating slow queries |
| Network | Faster replication, faster client responses | Replication lag increasing |
Vertical scaling is limited by the largest available machine, but it buys significant runway with zero application changes.
Optimize Your Queries
The highest-ROI work is almost always query optimization. A single missing index can make a query 1000x slower.
```sql
-- Find slow queries (PostgreSQL)
SELECT
  calls,
  round(total_exec_time::numeric, 2) AS total_ms,
  round(mean_exec_time::numeric, 2) AS avg_ms,
  round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percent_total,
  query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```
Common fixes that yield immediate improvement:
```sql
-- Add a composite index for a common query pattern
CREATE INDEX CONCURRENTLY idx_orders_user_created
  ON orders (user_id, created_at DESC);

-- Partial index for frequently filtered subsets
CREATE INDEX CONCURRENTLY idx_orders_pending
  ON orders (created_at DESC)
  WHERE status = 'pending';

-- Cover the query entirely to avoid table lookups
CREATE INDEX CONCURRENTLY idx_orders_covering
  ON orders (user_id, created_at DESC)
  INCLUDE (total_amount, status);
```
Connection Pooling
Database connections are expensive. Each PostgreSQL connection consumes roughly 10MB of RAM, and creating a new connection takes 20-50ms. Without pooling, a burst of traffic can exhaust connection limits.
```ini
; PgBouncer configuration (pgbouncer.ini)
; Sits between your application and PostgreSQL

[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
```
Pool modes matter:
| Mode | Behavior | Best For |
|------|----------|----------|
| Session | One server connection per client session | Apps using prepared statements, LISTEN/NOTIFY |
| Transaction | Connection returned after each transaction | Most web applications (recommended default) |
| Statement | Connection returned after each statement | Simple read-heavy workloads |
Transaction mode gives the best connection reuse for typical web applications. It lets 1,000 concurrent application connections share 25 actual database connections.
Stage 2: Read Replicas
Most applications are read-heavy. Dashboards, search, reporting, and content delivery all read far more than they write. Read replicas distribute this load across multiple database instances.
How Read Replicas Work
```
                   ┌───────────────┐
  Writes ─────────▶│  Primary DB   │
                   │ (read/write)  │
                   └───────┬───────┘
                           │ Replication
                  ┌────────┴────────┐
                  │                 │
          ┌───────▼─────┐   ┌───────▼─────┐
 Reads ──▶│  Replica 1  │   │  Replica 2  │
          │ (read-only) │   │ (read-only) │
          └─────────────┘   └─────────────┘
```
The primary handles all writes. Replicas receive a continuous stream of changes via replication and serve read queries. This effectively multiplies your read capacity by the number of replicas.
Implementing Read/Write Splitting
```typescript
import { Pool } from "pg";

const primaryPool = new Pool({
  host: "db-primary.internal",
  port: 5432,
  database: "myapp",
  max: 20,
});

const replicaPool = new Pool({
  host: "db-replica.internal",
  port: 5432,
  database: "myapp",
  max: 50,
});

async function query(sql: string, params: unknown[], options?: { readonly?: boolean }) {
  const pool = options?.readonly ? replicaPool : primaryPool;
  return pool.query(sql, params);
}

// Usage
const user = await query("SELECT * FROM users WHERE id = $1", [userId], { readonly: true });
const result = await query("INSERT INTO orders (user_id, total) VALUES ($1, $2) RETURNING *", [userId, total]);
```
Replication Lag
Replicas are eventually consistent. There is always some delay between a write hitting the primary and appearing on replicas. This lag is typically under 100ms but can spike during heavy write loads.
```sql
-- Check replication lag (PostgreSQL, run on the primary)
SELECT
  client_addr,
  state,
  sent_lsn,
  write_lsn,
  replay_lsn,
  round(extract(epoch from replay_lag)::numeric, 3) AS replay_lag_seconds
FROM pg_stat_replication;
```
Design for eventual consistency: after a user creates an order, read the order back from the primary, not a replica.
```typescript
async function createOrder(userId: string, items: OrderItem[]) {
  const order = await query(
    "INSERT INTO orders (user_id, total) VALUES ($1, $2) RETURNING *",
    [userId, calculateTotal(items)]
  );
  // RETURNING hands back the row as part of the write on the primary: no stale data
  return order.rows[0];
}

async function listOrders(userId: string) {
  // Read from a replica: eventual consistency is acceptable for listings
  return query(
    "SELECT * FROM orders WHERE user_id = $1 ORDER BY created_at DESC",
    [userId],
    { readonly: true }
  );
}
```
Stage 3: Caching Layers
A cache sits between your application and database, serving frequently accessed data from memory instead of disk.
Caching Strategies
| Strategy | Description | Best For |
|----------|-------------|----------|
| Cache-Aside | App checks cache first, falls back to DB, populates cache | General-purpose, most common |
| Write-Through | App writes to cache and DB simultaneously | Data that is read immediately after write |
| Write-Behind | App writes to cache, cache async writes to DB | High-write throughput, eventual consistency acceptable |
| Read-Through | Cache automatically loads from DB on miss | Simplified application code |
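Cache-aside gets a full Redis example below. For contrast, write-through can be sketched in a few lines; here an in-memory `Map` stands in for Redis and `saveToDb` is a stub for the database write (both are assumptions, not real infrastructure):

```typescript
// Write-through sketch: every write goes to the database first, then the
// cache, so the cache never holds data the database lost.
// The Map and saveToDb stub stand in for Redis and the primary.

const cache = new Map<string, string>();

async function saveToDb(key: string, value: string): Promise<void> {
  // stand-in for an INSERT/UPDATE against the primary database
}

async function writeThrough(key: string, value: string): Promise<void> {
  await saveToDb(key, value); // DB first: a failed write never pollutes the cache
  cache.set(key, value);      // cache is warm immediately, so the next read is a hit
}
```

The ordering is the point: writing the database before the cache means a database failure leaves the cache merely stale, never wrong.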
Cache-Aside with Redis
```typescript
import Redis from "ioredis";

const redis = new Redis({ host: "redis.internal", port: 6379 });

async function getUserProfile(userId: string): Promise<UserProfile | undefined> {
  const cacheKey = `user:${userId}:profile`;
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  const result = await query(
    "SELECT id, name, email, avatar, role FROM users WHERE id = $1",
    [userId],
    { readonly: true }
  );
  const profile = result.rows[0];
  // Only cache a hit: JSON.stringify(undefined) is not a valid Redis value
  if (profile) {
    await redis.set(cacheKey, JSON.stringify(profile), "EX", 300);
  }
  return profile;
}

async function updateUserProfile(userId: string, updates: Partial<UserProfile>) {
  await query(
    "UPDATE users SET name = COALESCE($2, name), avatar = COALESCE($3, avatar) WHERE id = $1",
    [userId, updates.name, updates.avatar]
  );
  // Invalidate the cache on write
  await redis.del(`user:${userId}:profile`);
}
```
Cache Invalidation Patterns
Cache invalidation is genuinely hard. Common approaches:
- TTL-based expiry — set a time-to-live on every cache entry. Simple but allows stale data within the TTL window.
- Event-driven invalidation — publish cache invalidation events when data changes. More complex but ensures fresher data.
- Version-based keys — include a version number in the cache key. Bump the version to invalidate all entries.
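The version-based approach fits in two small helpers. The entity names and key layout here are illustrative, and the Redis calls (shown in comments) assume the ioredis client introduced earlier:

```typescript
// Version-based cache keys (sketch). The current version for an entity type
// lives in its own key; bumping it orphans every entry built from the old
// version, which then ages out naturally via TTL.

function versionKey(entity: string): string {
  return `${entity}:version`; // e.g. "user:version"
}

function cacheKey(entity: string, id: string, version: number): string {
  return `${entity}:v${version}:${id}`; // e.g. "user:v3:42"
}

// Usage with the ioredis client from earlier (assumed):
//   const v = Number((await redis.get(versionKey("user"))) ?? 1);
//   const cached = await redis.get(cacheKey("user", userId, v));
//   // on any write to users:
//   await redis.incr(versionKey("user")); // all old user:* entries are now orphaned
```

The tradeoff is one extra Redis round trip per read to fetch the current version, in exchange for one-command invalidation of an entire entity class.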
```typescript
// Event-driven invalidation via PostgreSQL LISTEN/NOTIFY.
// A trigger on the users table calls pg_notify('user_changes', payload);
// this listener clears the matching cache entry.
async function setupCacheInvalidation() {
  const pgListener = new Pool({ host: "db-primary.internal" });
  const client = await pgListener.connect();
  await client.query("LISTEN user_changes");
  client.on("notification", async (msg) => {
    if (msg.channel === "user_changes" && msg.payload) {
      const { user_id } = JSON.parse(msg.payload);
      await redis.del(`user:${user_id}:profile`);
    }
  });
}
```
Stage 4: Table Partitioning
When individual tables grow into the hundreds of millions or billions of rows, even well-indexed queries slow down. Partitioning splits a table into smaller physical segments while maintaining a single logical table.
Range Partitioning
The most common pattern partitions by date, keeping recent data in hot partitions and historical data in cold partitions.
```sql
CREATE TABLE events (
  id bigint GENERATED ALWAYS AS IDENTITY,
  user_id uuid NOT NULL,
  event_type text NOT NULL,
  payload jsonb,
  created_at timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_q1 PARTITION OF events
  FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');
CREATE TABLE events_2026_q2 PARTITION OF events
  FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');
CREATE TABLE events_2026_q3 PARTITION OF events
  FOR VALUES FROM ('2026-07-01') TO ('2026-10-01');
CREATE TABLE events_2026_q4 PARTITION OF events
  FOR VALUES FROM ('2026-10-01') TO ('2027-01-01');

-- Queries that include created_at in the WHERE clause
-- automatically scan only the relevant partition
SELECT * FROM events
WHERE created_at >= '2026-03-01' AND created_at < '2026-04-01'
  AND user_id = 'abc-123';
```
Hash Partitioning
When there is no natural range key, hash partitioning distributes rows evenly across partitions.
```sql
CREATE TABLE sessions (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id uuid NOT NULL,
  data jsonb,
  expires_at timestamptz NOT NULL
) PARTITION BY HASH (id);

CREATE TABLE sessions_p0 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_p1 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sessions_p2 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sessions_p3 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 3);
```
Partition Maintenance
Automate partition creation and cleanup. Old partitions can be detached and archived instead of running expensive DELETE operations.
```sql
-- Drop old data instantly by detaching the partition
ALTER TABLE events DETACH PARTITION events_2024_q1;
DROP TABLE events_2024_q1;
```
This is orders of magnitude faster than `DELETE FROM events WHERE created_at < '2024-04-01'`, which would generate enormous amounts of WAL and leave the table bloated with dead tuples until vacuumed.
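Creation can be automated the same way. A sketch of a helper a scheduled job might run to generate the next quarter's DDL, matching the `events_YYYY_qN` naming used above (the helper itself is hypothetical; tools like pg_partman automate the same pattern):

```typescript
// Sketch: build the CREATE TABLE ... PARTITION OF statement for a given
// quarter of the events table. A cron job can run this ahead of each
// quarter boundary and execute the result against the primary.
function quarterlyPartitionDdl(year: number, quarter: 1 | 2 | 3 | 4): string {
  const startMonth = (quarter - 1) * 3 + 1;        // 1, 4, 7, 10
  const endMonth = startMonth + 3;                 // exclusive upper bound
  const endYear = endMonth > 12 ? year + 1 : year; // Q4 rolls into the next year
  const boundMonth = endMonth > 12 ? endMonth - 12 : endMonth;
  const pad = (m: number) => String(m).padStart(2, "0");
  return (
    `CREATE TABLE IF NOT EXISTS events_${year}_q${quarter} PARTITION OF events\n` +
    `  FOR VALUES FROM ('${year}-${pad(startMonth)}-01') TO ('${endYear}-${pad(boundMonth)}-01');`
  );
}
```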
Stage 5: Sharding
Sharding distributes data across multiple independent database instances. Each shard holds a subset of the data, and the application routes queries to the correct shard.
When Sharding Becomes Necessary
Sharding is appropriate when:
- Write volume exceeds what a single primary can handle
- Dataset size exceeds what fits in a single instance's storage
- You need data locality for compliance (e.g., EU data stays in EU)
Shard Key Selection
The shard key determines which shard holds a given row. This is the most consequential decision in a sharding architecture.
```typescript
import { createHash } from "node:crypto";

// shardPools (one pg Pool per shard) and TOTAL_SHARDS are assumed to come
// from application configuration
function getShardId(tenantId: string, totalShards: number): number {
  const hash = createHash("sha256").update(tenantId).digest();
  return hash.readUInt32BE(0) % totalShards;
}

function getShardConnection(tenantId: string): Pool {
  const shardId = getShardId(tenantId, TOTAL_SHARDS);
  return shardPools[shardId];
}

// All queries for a tenant go to the same shard
async function getTenantOrders(tenantId: string) {
  const pool = getShardConnection(tenantId);
  return pool.query("SELECT * FROM orders WHERE tenant_id = $1 ORDER BY created_at DESC", [tenantId]);
}
```
| Shard Key Strategy | Pros | Cons |
|--------------------|------|------|
| Tenant ID | All tenant data co-located, simple queries | Hot tenants create hot shards |
| Geographic region | Data locality for compliance | Uneven distribution |
| Hash of primary key | Even distribution | Cross-shard queries for related data |
| Time-based | Natural archival, partition-like benefits | Recent data always hits same shard |
Cross-Shard Queries
The fundamental tradeoff of sharding is that queries spanning multiple shards are expensive and complex. Avoid them by design.
```typescript
// Bad: requires querying every shard and aggregating in the application
async function getGlobalLeaderboard() {
  const results = await Promise.all(
    shardPools.map((pool) =>
      pool.query("SELECT user_id, score FROM leaderboard ORDER BY score DESC LIMIT 100")
    )
  );
  return results
    .flatMap((r) => r.rows)
    .sort((a, b) => b.score - a.score)
    .slice(0, 100);
}
```

```typescript
// Better: maintain a denormalized global leaderboard in a separate database
// (globalDb: a connection to that dedicated database)
async function getGlobalLeaderboard() {
  return globalDb.query("SELECT user_id, score FROM global_leaderboard ORDER BY score DESC LIMIT 100");
}
```
Scaling Decision Framework
| Traffic Stage | Strategy | Complexity | Typical Scale |
|---------------|----------|------------|---------------|
| 0 – 10K users | Query optimization, indexing | Low | Single server |
| 10K – 100K | Connection pooling, read replicas | Low-Medium | 1 primary + 2 replicas |
| 100K – 1M | Caching layers (Redis), partitioning | Medium | Replicas + cache cluster |
| 1M – 10M | Application-level sharding | High | Multiple shard clusters |
| 10M+ | Global distribution, CDC, specialized stores | Very High | Multi-region architecture |
Getting Started
Database scaling is a progression, not a destination. Start with the simplest strategy that solves your current bottleneck, measure the impact, and move to the next stage only when needed. Premature optimization — especially premature sharding — creates permanent complexity that slows down every future feature.
If your database is hitting its limits or you need to design a data architecture that will scale with your growth, talk to our team. We design and implement database scaling strategies — from query optimization and caching to read replicas and sharding — so your data layer grows with your business.
Measure first. Scale deliberately. Keep it simple as long as you can.