Database Scaling Strategies: From Single Server to Global Scale
Author: ZTABS Team
Every successful application eventually hits a database bottleneck. The first sign is usually a slow dashboard query. Then API response times creep up. Then you start seeing connection timeouts during peak traffic. What got you to 10,000 users will not get you to 1,000,000.
Database scaling is not a single decision — it is a sequence of strategies, each appropriate at a different stage of growth. This guide walks through them in order, from the simplest wins to global-scale architectures.
Stage 1: Vertical Scaling and Query Optimization
Before adding infrastructure complexity, extract every bit of performance from your existing setup.
Upgrade the Hardware
Vertical scaling means giving your database server more CPU, RAM, and faster storage. On managed cloud services like AWS RDS or Google Cloud SQL, this is a configuration change.
| Resource | Impact | When to Upgrade |
|----------|--------|-----------------|
| RAM | More data cached in memory, fewer disk reads | Buffer pool hit rate below 95% |
| CPU | Faster query execution, more concurrent connections | CPU consistently above 70% |
| Storage IOPS | Faster reads and writes for disk-bound workloads | I/O wait times dominating slow queries |
| Network | Faster replication, faster client responses | Replication lag increasing |
Vertical scaling is limited by the largest available machine, but it buys significant runway with zero application changes.
Optimize Your Queries
The highest-ROI work is almost always query optimization. A single missing index can make a query 1000x slower.
```sql
-- Find slow queries (PostgreSQL)
SELECT
  calls,
  round(total_exec_time::numeric, 2) AS total_ms,
  round(mean_exec_time::numeric, 2) AS avg_ms,
  round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percent_total,
  query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
```
Common fixes that yield immediate improvement:
```sql
-- Add a composite index for a common query pattern
CREATE INDEX CONCURRENTLY idx_orders_user_created
  ON orders (user_id, created_at DESC);

-- Partial index for frequently filtered subsets
CREATE INDEX CONCURRENTLY idx_orders_pending
  ON orders (created_at DESC)
  WHERE status = 'pending';

-- Cover the query entirely to avoid table lookups
CREATE INDEX CONCURRENTLY idx_orders_covering
  ON orders (user_id, created_at DESC)
  INCLUDE (total_amount, status);
```
Connection Pooling
Database connections are expensive. Each PostgreSQL connection consumes roughly 10MB of RAM, and creating a new connection takes 20-50ms. Without pooling, a burst of traffic can exhaust connection limits.
```ini
; PgBouncer configuration (pgbouncer.ini)
; Sits between your application and PostgreSQL

[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
reserve_pool_size = 5
```
Pool modes matter:
| Mode | Behavior | Best For |
|------|----------|----------|
| Session | One server connection per client session | Apps using prepared statements, LISTEN/NOTIFY |
| Transaction | Connection returned after each transaction | Most web applications (recommended default) |
| Statement | Connection returned after each statement | Simple read-heavy workloads |
Transaction mode gives the best connection reuse for typical web applications. It lets 1,000 concurrent application connections share 25 actual database connections.
Stage 2: Read Replicas
Most applications are read-heavy. Dashboards, search, reporting, and content delivery all read far more than they write. Read replicas distribute this load across multiple database instances.
How Read Replicas Work
```
                   ┌───────────────┐
  Writes ─────────▶│  Primary DB   │
                   │ (read/write)  │
                   └───────┬───────┘
                           │ Replication
                  ┌────────┴────────┐
                  │                 │
          ┌───────▼─────┐   ┌───────▼─────┐
 Reads ──▶│  Replica 1  │   │  Replica 2  │
          │ (read-only) │   │ (read-only) │
          └─────────────┘   └─────────────┘
```
The primary handles all writes. Replicas receive a continuous stream of changes via replication and serve read queries. This effectively multiplies your read capacity by the number of replicas.
Implementing Read/Write Splitting
```typescript
import { Pool } from "pg";

const primaryPool = new Pool({
  host: "db-primary.internal",
  port: 5432,
  database: "myapp",
  max: 20,
});

const replicaPool = new Pool({
  host: "db-replica.internal",
  port: 5432,
  database: "myapp",
  max: 50,
});

async function query(sql: string, params: unknown[], options?: { readonly?: boolean }) {
  const pool = options?.readonly ? replicaPool : primaryPool;
  return pool.query(sql, params);
}

// Usage
const user = await query("SELECT * FROM users WHERE id = $1", [userId], { readonly: true });
const result = await query("INSERT INTO orders (user_id, total) VALUES ($1, $2) RETURNING *", [userId, total]);
```
Replication Lag
Replicas are eventually consistent. There is always some delay between a write hitting the primary and appearing on replicas. This lag is typically under 100ms but can spike during heavy write loads.
```sql
-- Check replication lag (PostgreSQL, run on the primary)
SELECT
  client_addr,
  state,
  sent_lsn,
  write_lsn,
  replay_lsn,
  round(extract(epoch from replay_lag)::numeric, 3) AS replay_lag_seconds
FROM pg_stat_replication;
```
Design for eventual consistency: after a user creates an order, read the order back from the primary, not a replica.
```typescript
async function createOrder(userId: string, items: OrderItem[]) {
  const order = await query(
    "INSERT INTO orders (user_id, total) VALUES ($1, $2) RETURNING *",
    [userId, calculateTotal(items)]
  );
  // RETURNING hands back the row as part of the write on the primary: no stale data
  return order.rows[0];
}

async function listOrders(userId: string) {
  // Read from a replica: eventual consistency is acceptable for listings
  return query(
    "SELECT * FROM orders WHERE user_id = $1 ORDER BY created_at DESC",
    [userId],
    { readonly: true }
  );
}
```
Stage 3: Caching Layers
A cache sits between your application and database, serving frequently accessed data from memory instead of disk.
Caching Strategies
| Strategy | Description | Best For |
|----------|-------------|----------|
| Cache-Aside | App checks cache first, falls back to DB, populates cache | General-purpose, most common |
| Write-Through | App writes to cache and DB simultaneously | Data that is read immediately after write |
| Write-Behind | App writes to cache, cache async writes to DB | High-write throughput, eventual consistency acceptable |
| Read-Through | Cache automatically loads from DB on miss | Simplified application code |
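Cache-aside gets a full Redis example below. For contrast, write-through can be sketched in a few lines; here an in-memory `Map` stands in for Redis and `saveToDb` is a stub for the database write (both are assumptions, not real infrastructure):

```typescript
// Write-through sketch: every write goes to the database first, then the
// cache, so the cache never holds data the database lost.
// The Map and saveToDb stub stand in for Redis and the primary.

const cache = new Map<string, string>();

async function saveToDb(key: string, value: string): Promise<void> {
  // stand-in for an INSERT/UPDATE against the primary database
}

async function writeThrough(key: string, value: string): Promise<void> {
  await saveToDb(key, value); // DB first: a failed write never pollutes the cache
  cache.set(key, value);      // cache is warm immediately, so the next read is a hit
}
```

The ordering is the point: writing the database before the cache means a database failure leaves the cache merely stale, never wrong.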
Cache-Aside with Redis
```typescript
import Redis from "ioredis";

const redis = new Redis({ host: "redis.internal", port: 6379 });

async function getUserProfile(userId: string): Promise<UserProfile | undefined> {
  const cacheKey = `user:${userId}:profile`;
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  const result = await query(
    "SELECT id, name, email, avatar, role FROM users WHERE id = $1",
    [userId],
    { readonly: true }
  );
  const profile = result.rows[0];
  // Only cache a hit: JSON.stringify(undefined) is not a valid Redis value
  if (profile) {
    await redis.set(cacheKey, JSON.stringify(profile), "EX", 300);
  }
  return profile;
}

async function updateUserProfile(userId: string, updates: Partial<UserProfile>) {
  await query(
    "UPDATE users SET name = COALESCE($2, name), avatar = COALESCE($3, avatar) WHERE id = $1",
    [userId, updates.name, updates.avatar]
  );
  // Invalidate the cache on write
  await redis.del(`user:${userId}:profile`);
}
```
Cache Invalidation Patterns
Cache invalidation is genuinely hard. Common approaches:
- TTL-based expiry — set a time-to-live on every cache entry. Simple but allows stale data within the TTL window.
- Event-driven invalidation — publish cache invalidation events when data changes. More complex but ensures fresher data.
- Version-based keys — include a version number in the cache key. Bump the version to invalidate all entries.
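The version-based approach fits in two small helpers. The entity names and key layout here are illustrative, and the Redis calls (shown in comments) assume the ioredis client introduced earlier:

```typescript
// Version-based cache keys (sketch). The current version for an entity type
// lives in its own key; bumping it orphans every entry built from the old
// version, which then ages out naturally via TTL.

function versionKey(entity: string): string {
  return `${entity}:version`; // e.g. "user:version"
}

function cacheKey(entity: string, id: string, version: number): string {
  return `${entity}:v${version}:${id}`; // e.g. "user:v3:42"
}

// Usage with the ioredis client from earlier (assumed):
//   const v = Number((await redis.get(versionKey("user"))) ?? 1);
//   const cached = await redis.get(cacheKey("user", userId, v));
//   // on any write to users:
//   await redis.incr(versionKey("user")); // all old user:* entries are now orphaned
```

The tradeoff is one extra Redis round trip per read to fetch the current version, in exchange for one-command invalidation of an entire entity class.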
```typescript
// Event-driven invalidation via PostgreSQL LISTEN/NOTIFY.
// A trigger on the users table calls pg_notify('user_changes', payload);
// this listener clears the matching cache entry.
async function setupCacheInvalidation() {
  const pgListener = new Pool({ host: "db-primary.internal" });
  const client = await pgListener.connect();
  await client.query("LISTEN user_changes");
  client.on("notification", async (msg) => {
    if (msg.channel === "user_changes" && msg.payload) {
      const { user_id } = JSON.parse(msg.payload);
      await redis.del(`user:${user_id}:profile`);
    }
  });
}
```
Stage 4: Table Partitioning
When individual tables grow into the hundreds of millions or billions of rows, even well-indexed queries slow down. Partitioning splits a table into smaller physical segments while maintaining a single logical table.
Range Partitioning
The most common pattern partitions by date, keeping recent data in hot partitions and historical data in cold partitions.
```sql
CREATE TABLE events (
  id bigint GENERATED ALWAYS AS IDENTITY,
  user_id uuid NOT NULL,
  event_type text NOT NULL,
  payload jsonb,
  created_at timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_q1 PARTITION OF events
  FOR VALUES FROM ('2026-01-01') TO ('2026-04-01');
CREATE TABLE events_2026_q2 PARTITION OF events
  FOR VALUES FROM ('2026-04-01') TO ('2026-07-01');
CREATE TABLE events_2026_q3 PARTITION OF events
  FOR VALUES FROM ('2026-07-01') TO ('2026-10-01');
CREATE TABLE events_2026_q4 PARTITION OF events
  FOR VALUES FROM ('2026-10-01') TO ('2027-01-01');

-- Queries that include created_at in the WHERE clause
-- automatically scan only the relevant partition
SELECT * FROM events
WHERE created_at >= '2026-03-01' AND created_at < '2026-04-01'
  AND user_id = 'abc-123';
```
Hash Partitioning
When there is no natural range key, hash partitioning distributes rows evenly across partitions.
```sql
CREATE TABLE sessions (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id uuid NOT NULL,
  data jsonb,
  expires_at timestamptz NOT NULL
) PARTITION BY HASH (id);

CREATE TABLE sessions_p0 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_p1 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sessions_p2 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sessions_p3 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 3);
```
Partition Maintenance
Automate partition creation and cleanup. Old partitions can be detached and archived instead of running expensive DELETE operations.
```sql
-- Drop old data instantly by detaching the partition
ALTER TABLE events DETACH PARTITION events_2024_q1;
DROP TABLE events_2024_q1;
```
This is orders of magnitude faster than `DELETE FROM events WHERE created_at < '2024-04-01'`, which would generate enormous amounts of WAL and leave the table bloated with dead tuples until vacuumed.
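Creation can be automated the same way. A sketch of a helper a scheduled job might run to generate the next quarter's DDL, matching the `events_YYYY_qN` naming used above (the helper itself is hypothetical; tools like pg_partman automate the same pattern):

```typescript
// Sketch: build the CREATE TABLE ... PARTITION OF statement for a given
// quarter of the events table. A cron job can run this ahead of each
// quarter boundary and execute the result against the primary.
function quarterlyPartitionDdl(year: number, quarter: 1 | 2 | 3 | 4): string {
  const startMonth = (quarter - 1) * 3 + 1;        // 1, 4, 7, 10
  const endMonth = startMonth + 3;                 // exclusive upper bound
  const endYear = endMonth > 12 ? year + 1 : year; // Q4 rolls into the next year
  const boundMonth = endMonth > 12 ? endMonth - 12 : endMonth;
  const pad = (m: number) => String(m).padStart(2, "0");
  return (
    `CREATE TABLE IF NOT EXISTS events_${year}_q${quarter} PARTITION OF events\n` +
    `  FOR VALUES FROM ('${year}-${pad(startMonth)}-01') TO ('${endYear}-${pad(boundMonth)}-01');`
  );
}
```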
Stage 5: Sharding
Sharding distributes data across multiple independent database instances. Each shard holds a subset of the data, and the application routes queries to the correct shard.
When Sharding Becomes Necessary
Sharding is appropriate when:
- Write volume exceeds what a single primary can handle
- Dataset size exceeds what fits in a single instance's storage
- You need data locality for compliance (e.g., EU data stays in EU)
Shard Key Selection
The shard key determines which shard holds a given row. This is the most consequential decision in a sharding architecture.
```typescript
import { createHash } from "node:crypto";

// shardPools (one pg Pool per shard) and TOTAL_SHARDS are assumed to come
// from application configuration
function getShardId(tenantId: string, totalShards: number): number {
  const hash = createHash("sha256").update(tenantId).digest();
  return hash.readUInt32BE(0) % totalShards;
}

function getShardConnection(tenantId: string): Pool {
  const shardId = getShardId(tenantId, TOTAL_SHARDS);
  return shardPools[shardId];
}

// All queries for a tenant go to the same shard
async function getTenantOrders(tenantId: string) {
  const pool = getShardConnection(tenantId);
  return pool.query("SELECT * FROM orders WHERE tenant_id = $1 ORDER BY created_at DESC", [tenantId]);
}
```
| Shard Key Strategy | Pros | Cons |
|--------------------|------|------|
| Tenant ID | All tenant data co-located, simple queries | Hot tenants create hot shards |
| Geographic region | Data locality for compliance | Uneven distribution |
| Hash of primary key | Even distribution | Cross-shard queries for related data |
| Time-based | Natural archival, partition-like benefits | Recent data always hits same shard |
Cross-Shard Queries
The fundamental tradeoff of sharding is that queries spanning multiple shards are expensive and complex. Avoid them by design.
```typescript
// Bad: requires querying every shard and aggregating in the application
async function getGlobalLeaderboard() {
  const results = await Promise.all(
    shardPools.map((pool) =>
      pool.query("SELECT user_id, score FROM leaderboard ORDER BY score DESC LIMIT 100")
    )
  );
  return results
    .flatMap((r) => r.rows)
    .sort((a, b) => b.score - a.score)
    .slice(0, 100);
}
```

```typescript
// Better: maintain a denormalized global leaderboard in a separate database
// (globalDb: a connection to that dedicated database)
async function getGlobalLeaderboard() {
  return globalDb.query("SELECT user_id, score FROM global_leaderboard ORDER BY score DESC LIMIT 100");
}
```
Scaling Decision Framework
| Traffic Stage | Strategy | Complexity | Typical Scale |
|---------------|----------|------------|---------------|
| 0 – 10K users | Query optimization, indexing | Low | Single server |
| 10K – 100K | Connection pooling, read replicas | Low-Medium | 1 primary + 2 replicas |
| 100K – 1M | Caching layers (Redis), partitioning | Medium | Replicas + cache cluster |
| 1M – 10M | Application-level sharding | High | Multiple shard clusters |
| 10M+ | Global distribution, CDC, specialized stores | Very High | Multi-region architecture |
Getting Started
Database scaling is a progression, not a destination. Start with the simplest strategy that solves your current bottleneck, measure the impact, and move to the next stage only when needed. Premature optimization — especially premature sharding — creates permanent complexity that slows down every future feature.
If your database is hitting its limits or you need to design a data architecture that will scale with your growth, talk to our team. We design and implement database scaling strategies — from query optimization and caching to read replicas and sharding — so your data layer grows with your business.
Measure first. Scale deliberately. Keep it simple as long as you can.