Initial commit: add all skills files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
706
fullstack-dev/references/db-schema.md
Normal file
706
fullstack-dev/references/db-schema.md
Normal file
@@ -0,0 +1,706 @@
|
||||
---
|
||||
name: fullstack-dev-db-schema
|
||||
description: "Database schema design and migrations. Use when creating tables, defining ORM models, adding indexes, or designing relationships. Covers zero-downtime migrations and multi-tenancy."
|
||||
license: MIT
|
||||
metadata:
|
||||
version: "1.0.0"
|
||||
sources:
|
||||
- PostgreSQL official documentation
|
||||
- Use The Index, Luke (use-the-index-luke.com)
|
||||
- Designing Data-Intensive Applications (Martin Kleppmann)
|
||||
- Database Reliability Engineering (Laine Campbell & Charity Majors)
|
||||
---
|
||||
|
||||
# Database Schema Design
|
||||
|
||||
ORM-agnostic guide for relational database schema design. Covers data modeling, normalization, indexing, migrations, multi-tenancy, and common application patterns. Primarily PostgreSQL-focused but principles apply to MySQL/MariaDB.
|
||||
|
||||
## Scope
|
||||
|
||||
**USE this skill when:**
|
||||
- Designing a schema for a new project or feature
|
||||
- Deciding between normalization and denormalization
|
||||
- Choosing which indexes to create
|
||||
- Planning a zero-downtime migration on a live database
|
||||
- Implementing multi-tenant data isolation
|
||||
- Adding audit trails, soft delete, or versioning
|
||||
- Diagnosing slow queries caused by schema problems
|
||||
|
||||
**NOT for:**
|
||||
- Choosing which database technology to use (→ `technology-selection`)
|
||||
- PostgreSQL-specific query tuning (use PostgreSQL performance docs)
|
||||
- ORM-specific configuration (→ `django-best-practices` or your ORM's docs)
|
||||
- Application-layer caching (→ `fullstack-dev-practices`)
|
||||
|
||||
## Context Required
|
||||
|
||||
| Required | Optional |
|
||||
|----------|----------|
|
||||
| Database engine (PostgreSQL / MySQL) | Expected data volume (rows, growth rate) |
|
||||
| Domain entities and relationships | Read/write ratio |
|
||||
| Key access patterns (queries) | Multi-tenant requirements |
|
||||
|
||||
---
|
||||
|
||||
## Quick Start Checklist
|
||||
|
||||
Designing a new schema:
|
||||
|
||||
- [ ] **Domain entities identified** — map 1 entity = 1 table (not 1 class = 1 table)
|
||||
- [ ] **Primary keys**: UUID for public IDs, serial/bigserial for internal-only
|
||||
- [ ] **Foreign keys** with explicit `ON DELETE` behavior
|
||||
- [ ] **NOT NULL** by default — nullable only when business logic requires it
|
||||
- [ ] **Timestamps**: `created_at` + `updated_at` on every table
|
||||
- [ ] **Indexes** created for every WHERE, JOIN, ORDER BY column
|
||||
- [ ] **No premature denormalization** — start normalized, denormalize when measured
|
||||
- [ ] **Naming convention** consistent: `snake_case`, plural table names
|
||||
|
||||
---
|
||||
|
||||
## Quick Navigation
|
||||
|
||||
| Need to… | Jump to |
|
||||
|----------|---------|
|
||||
| Model entities and relationships | [1. Data Modeling](#1-data-modeling-critical) |
|
||||
| Decide normalize vs denormalize | [2. Normalization](#2-normalization-vs-denormalization-critical) |
|
||||
| Choose the right index | [3. Indexing](#3-indexing-strategy-critical) |
|
||||
| Run migrations safely on live DB | [4. Migrations](#4-zero-downtime-migrations-high) |
|
||||
| Design multi-tenant schema | [5. Multi-Tenancy](#5-multi-tenant-design-high) |
|
||||
| Add soft delete / audit trails | [6. Common Patterns](#6-common-schema-patterns-medium) |
|
||||
| Partition large tables | [7. Partitioning](#7-table-partitioning-medium) |
|
||||
| See anti-patterns | [Anti-Patterns](#anti-patterns) |
|
||||
|
||||
---
|
||||
|
||||
## Core Principles (7 Rules)
|
||||
|
||||
```
|
||||
1. ✅ Start normalized (3NF) — denormalize only when you have measured evidence
|
||||
2. ✅ Every table has a primary key, created_at, updated_at
|
||||
3. ✅ UUID for public-facing IDs, serial for internal join keys
|
||||
4. ✅ NOT NULL by default — null is a business decision, not a lazy default
|
||||
5. ✅ Index every column used in WHERE, JOIN, ORDER BY
|
||||
6. ✅ Foreign keys enforced in database (not just application code)
|
||||
7. ✅ Migrations are additive — never drop/rename in production without a multi-step plan
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 1. Data Modeling (CRITICAL)
|
||||
|
||||
### Table Naming
|
||||
|
||||
```sql
|
||||
-- ✅ Plural, snake_case
|
||||
CREATE TABLE orders (...);
|
||||
CREATE TABLE order_items (...);
|
||||
CREATE TABLE user_profiles (...);
|
||||
|
||||
-- ❌ Singular, mixed case
|
||||
CREATE TABLE Order (...);
|
||||
CREATE TABLE OrderItem (...);
|
||||
CREATE TABLE tbl_usr_prof (...); -- cryptic abbreviation
|
||||
```
|
||||
|
||||
### Primary Keys
|
||||
|
||||
| Strategy | When | Pros | Cons |
|
||||
|----------|------|------|------|
|
||||
| `bigserial` (auto-increment) | Internal tables, FK joins | Compact, fast joins | Enumerable, not safe for public IDs |
|
||||
| `uuid` (v4 random) | Public-facing resources | Non-guessable, globally unique | Larger (16 bytes), random I/O on B-Tree |
|
||||
| `uuid` v7 (time-sorted) | Public + needs ordering | Non-guessable + insert-friendly | Newer, less ecosystem support |
|
||||
| `text` slug | URL-friendly resources | Human-readable | Must enforce uniqueness, updates expensive |
|
||||
|
||||
**Recommended default:**
|
||||
|
||||
```sql
|
||||
CREATE TABLE orders (
|
||||
id bigserial PRIMARY KEY, -- internal FK target
|
||||
public_id uuid NOT NULL DEFAULT gen_random_uuid() UNIQUE, -- API-facing
|
||||
-- ...
|
||||
created_at timestamptz NOT NULL DEFAULT now(),
|
||||
updated_at timestamptz NOT NULL DEFAULT now()
|
||||
);
|
||||
```
|
||||
|
||||
### Relationships
|
||||
|
||||
```sql
|
||||
-- One-to-Many: user → orders
|
||||
CREATE TABLE orders (
|
||||
id bigserial PRIMARY KEY,
|
||||
user_id bigint NOT NULL REFERENCES users(id) ON DELETE CASCADE,
|
||||
-- ...
|
||||
);
|
||||
CREATE INDEX idx_orders_user_id ON orders(user_id);
|
||||
|
||||
-- Many-to-Many: orders ↔ products (via junction table)
|
||||
CREATE TABLE order_items (
|
||||
id bigserial PRIMARY KEY,
|
||||
order_id bigint NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
|
||||
product_id bigint NOT NULL REFERENCES products(id) ON DELETE RESTRICT,
|
||||
quantity int NOT NULL CHECK (quantity > 0),
|
||||
unit_price numeric(10,2) NOT NULL,
|
||||
UNIQUE (order_id, product_id) -- prevent duplicate line items
|
||||
);
|
||||
|
||||
-- One-to-One: user → profile
|
||||
CREATE TABLE user_profiles (
|
||||
user_id bigint PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
|
||||
bio text,
|
||||
avatar_url text,
|
||||
-- ...
|
||||
);
|
||||
```
|
||||
|
||||
### ON DELETE Behavior
|
||||
|
||||
| Behavior | When | Example |
|
||||
|----------|------|---------|
|
||||
| `CASCADE` | Child meaningless without parent | order_items when order deleted |
|
||||
| `RESTRICT` | Prevent accidental deletion | products referenced by order_items |
|
||||
| `SET NULL` | Preserve child, clear reference | orders.assigned_to when employee leaves |
|
||||
| `SET DEFAULT` | Fallback to default value | Rare, for status columns |
|
||||
|
||||
---
|
||||
|
||||
## 2. Normalization vs Denormalization (CRITICAL)
|
||||
|
||||
### Start Normalized (3NF)
|
||||
|
||||
**Normal forms in practice:**
|
||||
|
||||
| Form | Rule | Example Violation |
|
||||
|------|------|-------------------|
|
||||
| 1NF | No repeating groups, atomic values | `tags = "go,python,rust"` in one column |
|
||||
| 2NF | No partial dependencies (composite keys) | `order_items.product_name` depends on `product_id` alone |
|
||||
| 3NF | No transitive dependencies | `orders.customer_city` depends on `customer_id`, not `order_id` |
|
||||
|
||||
**1NF violation fix:**
|
||||
```sql
|
||||
-- ❌ Tags as comma-separated string
|
||||
CREATE TABLE posts (id serial, tags text); -- tags = "go,python"
|
||||
|
||||
-- ✅ Separate table (or array/JSONB if simple)
|
||||
CREATE TABLE post_tags (
|
||||
post_id bigint REFERENCES posts(id) ON DELETE CASCADE,
|
||||
tag_id bigint REFERENCES tags(id) ON DELETE CASCADE,
|
||||
PRIMARY KEY (post_id, tag_id)
|
||||
);
|
||||
|
||||
-- ✅ Alternative: PostgreSQL array (if tags are just strings, no metadata)
|
||||
CREATE TABLE posts (id serial, tags text[] NOT NULL DEFAULT '{}');
|
||||
CREATE INDEX idx_posts_tags ON posts USING GIN(tags);
|
||||
```
|
||||
|
||||
### When to Denormalize
|
||||
|
||||
**Denormalize ONLY when:**
|
||||
1. You have **measured** a performance problem (EXPLAIN ANALYZE, not "I think it's slow")
|
||||
2. The denormalized data is **read-heavy** (read:write ratio > 100:1)
|
||||
3. You accept the **consistency maintenance cost** (triggers, application logic, or materialized views)
|
||||
|
||||
**Safe denormalization patterns:**
|
||||
|
||||
```sql
|
||||
-- Pattern 1: Materialized view (computed, refreshable)
|
||||
CREATE MATERIALIZED VIEW order_summary AS
|
||||
SELECT o.id, o.user_id, o.total,
|
||||
COUNT(oi.id) AS item_count,
|
||||
u.email AS user_email
|
||||
FROM orders o
|
||||
JOIN order_items oi ON oi.order_id = o.id
|
||||
JOIN users u ON u.id = o.user_id
|
||||
GROUP BY o.id, u.email;
|
||||
|
||||
REFRESH MATERIALIZED VIEW CONCURRENTLY order_summary; -- non-blocking
|
||||
|
||||
-- Pattern 2: Cached aggregate column (application-maintained)
|
||||
ALTER TABLE orders ADD COLUMN item_count int NOT NULL DEFAULT 0;
|
||||
-- Update via trigger or application code on order_item insert/delete
|
||||
|
||||
-- Pattern 3: JSONB snapshot (freeze-at-write-time)
|
||||
-- Store a copy of the product details at the time of purchase
|
||||
CREATE TABLE order_items (
|
||||
id bigserial PRIMARY KEY,
|
||||
order_id bigint NOT NULL REFERENCES orders(id),
|
||||
product_id bigint REFERENCES products(id),
|
||||
quantity int NOT NULL,
|
||||
unit_price numeric(10,2) NOT NULL, -- frozen price
|
||||
product_snapshot jsonb NOT NULL -- frozen name, description, image
|
||||
);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 3. Indexing Strategy (CRITICAL)
|
||||
|
||||
### Index Types (PostgreSQL)
|
||||
|
||||
| Type | When | Example |
|
||||
|------|------|---------|
|
||||
| **B-Tree** (default) | Equality, range, ORDER BY | `WHERE status = 'active'`, `WHERE created_at > '2025-01-01'` |
|
||||
| **Hash** | Equality only (rare, B-Tree usually better) | `WHERE id = 123` (large tables, Postgres 10+) |
|
||||
| **GIN** | Arrays, JSONB, full-text search | `WHERE tags @> '{go}'`, `WHERE data->>'key' = 'val'` |
|
||||
| **GiST** | Geometry, ranges, nearest-neighbor | PostGIS, tsrange, ltree |
|
||||
| **BRIN** | Very large tables with natural ordering | Time-series data sorted by timestamp |
|
||||
|
||||
### Index Decision Rules
|
||||
|
||||
```
|
||||
Rule 1: Index every column in WHERE clauses
|
||||
Rule 2: Index every column used in JOIN ON conditions
|
||||
Rule 3: Index every column in ORDER BY (if queried with LIMIT)
|
||||
Rule 4: Composite index for multi-column WHERE (leftmost prefix rule)
|
||||
Rule 5: Partial index when filtering a subset (e.g., only active records)
|
||||
Rule 6: Covering index (INCLUDE) to avoid table lookup
|
||||
Rule 7: DON'T index low-cardinality columns alone (e.g., boolean)
|
||||
```
|
||||
|
||||
### Composite Index: Column Order Matters
|
||||
|
||||
```sql
|
||||
-- Query: WHERE user_id = ? AND status = ? ORDER BY created_at DESC
|
||||
-- ✅ Optimal: matches query pattern left-to-right
|
||||
CREATE INDEX idx_orders_user_status_created
|
||||
ON orders(user_id, status, created_at DESC);
|
||||
|
||||
-- ❌ Wrong order: can't use for this query efficiently
|
||||
CREATE INDEX idx_orders_created_user_status
|
||||
ON orders(created_at DESC, user_id, status);
|
||||
```
|
||||
|
||||
**Leftmost prefix rule:** Index on `(A, B, C)` supports queries on `(A)`, `(A, B)`, `(A, B, C)` but NOT `(B)`, `(C)`, or `(B, C)`.
|
||||
|
||||
### Partial Index (Index Only What Matters)
|
||||
|
||||
```sql
|
||||
-- Only 5% of orders are 'pending', but queried frequently
|
||||
CREATE INDEX idx_orders_pending
|
||||
ON orders(created_at DESC)
|
||||
WHERE status = 'pending';
|
||||
|
||||
-- Only active users matter for login
|
||||
CREATE INDEX idx_users_active_email
|
||||
ON users(email)
|
||||
WHERE is_active = true;
|
||||
```
|
||||
|
||||
### Covering Index (Avoid Table Lookup)
|
||||
|
||||
```sql
|
||||
-- Query only needs id and status, no need to read the table row
|
||||
CREATE INDEX idx_orders_user_covering
|
||||
ON orders(user_id) INCLUDE (status, total);
|
||||
|
||||
-- Now this query is index-only:
|
||||
SELECT status, total FROM orders WHERE user_id = 123;
|
||||
```
|
||||
|
||||
### When NOT to Index
|
||||
|
||||
```
|
||||
❌ Columns rarely used in WHERE/JOIN/ORDER BY
|
||||
❌ Tables with < 1,000 rows (sequential scan is faster)
|
||||
❌ Columns with very low cardinality alone (e.g., boolean is_active)
|
||||
❌ Write-heavy tables where index maintenance cost > read benefit
|
||||
❌ Duplicate indexes (check pg_stat_user_indexes for unused indexes)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Zero-Downtime Migrations (HIGH)
|
||||
|
||||
### The Golden Rule
|
||||
|
||||
```
|
||||
NEVER make destructive changes in one step.
|
||||
Always: ADD → MIGRATE DATA → REMOVE OLD (in separate deploys).
|
||||
```
|
||||
|
||||
### Safe Migration Patterns
|
||||
|
||||
**Rename a column (3 deploys):**
|
||||
|
||||
```
|
||||
Deploy 1: Add new column
|
||||
ALTER TABLE users ADD COLUMN full_name text;
|
||||
UPDATE users SET full_name = name; -- backfill
|
||||
-- App writes to BOTH name and full_name
|
||||
|
||||
Deploy 2: Switch reads to new column
|
||||
-- App reads from full_name, still writes to both
|
||||
|
||||
Deploy 3: Drop old column
|
||||
ALTER TABLE users DROP COLUMN name;
|
||||
-- App only uses full_name
|
||||
```
|
||||
|
||||
**Add a NOT NULL column (2 deploys):**
|
||||
|
||||
```sql
|
||||
-- Deploy 1: Add nullable column, backfill
|
||||
ALTER TABLE orders ADD COLUMN currency text; -- nullable first
|
||||
UPDATE orders SET currency = 'USD' WHERE currency IS NULL; -- backfill
|
||||
|
||||
-- Deploy 2: Add constraint (after all rows backfilled)
|
||||
ALTER TABLE orders ALTER COLUMN currency SET NOT NULL;
|
||||
ALTER TABLE orders ALTER COLUMN currency SET DEFAULT 'USD';
|
||||
```
|
||||
|
||||
**Add an index without locking:**
|
||||
|
||||
```sql
|
||||
-- ✅ CONCURRENTLY: no table lock, can run on live DB
|
||||
CREATE INDEX CONCURRENTLY idx_orders_status ON orders(status);
|
||||
|
||||
-- ❌ Without CONCURRENTLY: locks table for writes during build
|
||||
CREATE INDEX idx_orders_status ON orders(status);
|
||||
```
|
||||
|
||||
### Migration Safety Checklist
|
||||
|
||||
```
|
||||
✅ Migration runs in < 30 seconds on production data size
|
||||
✅ No exclusive table locks (use CONCURRENTLY for indexes)
|
||||
✅ Rollback plan documented and tested
|
||||
✅ Backfill runs in batches (not one giant UPDATE)
|
||||
✅ New column added as nullable first, constraint added later
|
||||
✅ Old column kept until all code references removed
|
||||
|
||||
❌ Never rename/drop columns in one deploy
|
||||
❌ Never ALTER TYPE on large tables without testing timing
|
||||
❌ Never run data backfill in a transaction (OOM on large tables)
|
||||
```
|
||||
|
||||
### Batch Backfill Template
|
||||
|
||||
```sql
|
||||
-- Backfill in batches of 10,000 (avoids long-running transactions)
|
||||
DO $$
|
||||
DECLARE
|
||||
batch_size int := 10000;
|
||||
affected int;
|
||||
BEGIN
|
||||
LOOP
|
||||
UPDATE orders
|
||||
SET currency = 'USD'
|
||||
WHERE id IN (
|
||||
SELECT id FROM orders WHERE currency IS NULL LIMIT batch_size
|
||||
);
|
||||
GET DIAGNOSTICS affected = ROW_COUNT;
|
||||
RAISE NOTICE 'Updated % rows', affected;
|
||||
EXIT WHEN affected = 0;
|
||||
PERFORM pg_sleep(0.1); -- brief pause to reduce load
|
||||
END LOOP;
|
||||
END $$;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Multi-Tenant Design (HIGH)
|
||||
|
||||
### Three Approaches
|
||||
|
||||
| Approach | Isolation | Complexity | When |
|
||||
|----------|-----------|------------|------|
|
||||
| **Row-level** (shared tables + `tenant_id`) | Low | Low | SaaS MVP, < 1,000 tenants |
|
||||
| **Schema-per-tenant** | Medium | Medium | Regulated industries, moderate scale |
|
||||
| **Database-per-tenant** | High | High | Enterprise, strict data isolation |
|
||||
|
||||
### Row-Level Tenancy (Most Common)
|
||||
|
||||
```sql
|
||||
-- Every table has tenant_id
|
||||
CREATE TABLE orders (
|
||||
id bigserial PRIMARY KEY,
|
||||
tenant_id bigint NOT NULL REFERENCES tenants(id),
|
||||
user_id bigint NOT NULL REFERENCES users(id),
|
||||
total numeric(10,2) NOT NULL,
|
||||
-- ...
|
||||
);
|
||||
|
||||
-- Composite index: tenant first (most queries filter by tenant)
|
||||
CREATE INDEX idx_orders_tenant_user ON orders(tenant_id, user_id);
|
||||
CREATE INDEX idx_orders_tenant_status ON orders(tenant_id, status);
|
||||
|
||||
-- Row-Level Security (PostgreSQL)
|
||||
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
|
||||
CREATE POLICY tenant_isolation ON orders
|
||||
USING (tenant_id = current_setting('app.tenant_id')::bigint);
|
||||
```
|
||||
|
||||
**Application-level enforcement:**
|
||||
|
||||
```typescript
|
||||
// Middleware: set tenant context on every request
|
||||
app.use((req, res, next) => {
|
||||
const tenantId = req.headers['x-tenant-id'];
|
||||
if (!tenantId) return res.status(400).json({ error: 'Missing tenant' });
|
||||
req.tenantId = tenantId;
|
||||
next();
|
||||
});
|
||||
|
||||
// Repository: ALWAYS filter by tenant
|
||||
async findOrders(tenantId: string, userId: string) {
|
||||
return db.order.findMany({
|
||||
where: { tenantId, userId }, // ← tenant_id in EVERY query
|
||||
});
|
||||
}
|
||||
```
|
||||
|
||||
### Rules
|
||||
|
||||
```
|
||||
✅ tenant_id in EVERY table that holds tenant data
|
||||
✅ tenant_id as FIRST column in every composite index
|
||||
✅ Application middleware enforces tenant context
|
||||
✅ Use RLS (PostgreSQL) as defense-in-depth, not sole protection
|
||||
✅ Test with 2+ tenants to verify isolation
|
||||
|
||||
❌ Never allow cross-tenant queries in application code
|
||||
❌ Never skip tenant_id in WHERE clauses (even in admin tools)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Common Schema Patterns (MEDIUM)
|
||||
|
||||
### Soft Delete
|
||||
|
||||
```sql
|
||||
ALTER TABLE orders ADD COLUMN deleted_at timestamptz;
|
||||
|
||||
-- All queries filter deleted records
|
||||
CREATE VIEW active_orders AS
|
||||
SELECT * FROM orders WHERE deleted_at IS NULL;
|
||||
|
||||
-- Partial index: only index non-deleted rows
|
||||
CREATE INDEX idx_orders_active_status
|
||||
ON orders(status, created_at DESC)
|
||||
WHERE deleted_at IS NULL;
|
||||
```
|
||||
|
||||
**ORM integration:**
|
||||
|
||||
```typescript
|
||||
// Prisma middleware: auto-filter soft-deleted records
|
||||
prisma.$use(async (params, next) => {
|
||||
if (params.action === 'findMany' || params.action === 'findFirst') {
|
||||
params.args.where = { ...params.args.where, deletedAt: null };
|
||||
}
|
||||
return next(params);
|
||||
});
|
||||
```
|
||||
|
||||
### Audit Trail
|
||||
|
||||
```sql
|
||||
-- Option A: Audit columns on every table
|
||||
ALTER TABLE orders ADD COLUMN created_by bigint REFERENCES users(id);
|
||||
ALTER TABLE orders ADD COLUMN updated_by bigint REFERENCES users(id);
|
||||
|
||||
-- Option B: Separate audit log table (more detail)
|
||||
CREATE TABLE audit_log (
|
||||
id bigserial PRIMARY KEY,
|
||||
table_name text NOT NULL,
|
||||
record_id bigint NOT NULL,
|
||||
action text NOT NULL CHECK (action IN ('INSERT', 'UPDATE', 'DELETE')),
|
||||
old_data jsonb,
|
||||
new_data jsonb,
|
||||
changed_by bigint REFERENCES users(id),
|
||||
changed_at timestamptz NOT NULL DEFAULT now()
|
||||
);
|
||||
CREATE INDEX idx_audit_table_record ON audit_log(table_name, record_id);
|
||||
CREATE INDEX idx_audit_changed_at ON audit_log(changed_at DESC);
|
||||
```
|
||||
|
||||
### Enum Columns
|
||||
|
||||
```sql
|
||||
-- Option A: PostgreSQL enum type (strict, but ALTER TYPE is painful)
|
||||
CREATE TYPE order_status AS ENUM ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled');
|
||||
ALTER TABLE orders ADD COLUMN status order_status NOT NULL DEFAULT 'pending';
|
||||
|
||||
-- Option B: Text + CHECK constraint (easier to migrate)
|
||||
ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'pending'
|
||||
CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled'));
|
||||
|
||||
-- Option C: Lookup table (most flexible, best for UI-driven lists)
|
||||
CREATE TABLE order_statuses (
|
||||
id serial PRIMARY KEY,
|
||||
name text UNIQUE NOT NULL,
|
||||
label text NOT NULL -- display name
|
||||
);
|
||||
```
|
||||
|
||||
**Recommendation:** Option B (text + CHECK) for most cases. Option C if statuses are managed by non-developers.
|
||||
|
||||
### Polymorphic Associations
|
||||
|
||||
```sql
|
||||
-- ❌ Anti-pattern: polymorphic FK (no referential integrity)
|
||||
CREATE TABLE comments (
|
||||
id bigserial PRIMARY KEY,
|
||||
commentable_type text, -- 'Post' or 'Photo'
|
||||
commentable_id bigint, -- no FK constraint possible!
|
||||
body text
|
||||
);
|
||||
|
||||
-- ✅ Pattern A: Separate FK columns (nullable)
|
||||
CREATE TABLE comments (
|
||||
id bigserial PRIMARY KEY,
|
||||
post_id bigint REFERENCES posts(id) ON DELETE CASCADE,
|
||||
photo_id bigint REFERENCES photos(id) ON DELETE CASCADE,
|
||||
body text NOT NULL,
|
||||
CHECK (
|
||||
(post_id IS NOT NULL AND photo_id IS NULL) OR
|
||||
(post_id IS NULL AND photo_id IS NOT NULL)
|
||||
)
|
||||
);
|
||||
|
||||
-- ✅ Pattern B: Separate tables (cleanest, best for different schemas)
|
||||
CREATE TABLE post_comments (..., post_id bigint REFERENCES posts(id));
|
||||
CREATE TABLE photo_comments (..., photo_id bigint REFERENCES photos(id));
|
||||
```
|
||||
|
||||
### JSONB Columns (Semi-Structured Data)
|
||||
|
||||
```sql
|
||||
-- Good uses: metadata, settings, flexible attributes
|
||||
CREATE TABLE products (
|
||||
id bigserial PRIMARY KEY,
|
||||
name text NOT NULL,
|
||||
price numeric(10,2) NOT NULL,
|
||||
attributes jsonb NOT NULL DEFAULT '{}' -- color, size, weight...
|
||||
);
|
||||
|
||||
-- Index for JSONB queries
|
||||
CREATE INDEX idx_products_attrs ON products USING GIN(attributes);
|
||||
|
||||
-- Query
|
||||
SELECT * FROM products WHERE attributes->>'color' = 'red';
|
||||
SELECT * FROM products WHERE attributes @> '{"size": "XL"}';
|
||||
```
|
||||
|
||||
```
|
||||
✅ Use JSONB for truly flexible/optional data (metadata, settings, preferences)
|
||||
✅ Index JSONB columns with GIN when queried
|
||||
|
||||
❌ Never use JSONB for data that should be columns (email, status, price)
|
||||
❌ Never use JSONB to avoid schema design (it's not MongoDB-in-Postgres)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Table Partitioning (MEDIUM)
|
||||
|
||||
### When to Partition
|
||||
|
||||
```
|
||||
✅ Table > 100M rows AND growing
|
||||
✅ Most queries filter on the partition key (date range, tenant)
|
||||
✅ Old data can be dropped/archived by partition (efficient DELETE)
|
||||
|
||||
❌ Table < 10M rows (overhead not worth it)
|
||||
❌ Queries don't filter on partition key (scans all partitions)
|
||||
```
|
||||
|
||||
### Range Partitioning (Time-Series)
|
||||
|
||||
```sql
|
||||
CREATE TABLE events (
|
||||
id bigserial,
|
||||
tenant_id bigint NOT NULL,
|
||||
event_type text NOT NULL,
|
||||
payload jsonb,
|
||||
created_at timestamptz NOT NULL DEFAULT now()
|
||||
) PARTITION BY RANGE (created_at);
|
||||
|
||||
-- Monthly partitions
|
||||
CREATE TABLE events_2025_01 PARTITION OF events
|
||||
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
|
||||
CREATE TABLE events_2025_02 PARTITION OF events
|
||||
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
|
||||
|
||||
-- Automate partition creation with pg_partman or cron
|
||||
```
|
||||
|
||||
### List Partitioning (Multi-Tenant)
|
||||
|
||||
```sql
|
||||
CREATE TABLE orders (
|
||||
id bigserial,
|
||||
tenant_id bigint NOT NULL,
|
||||
total numeric(10,2)
|
||||
) PARTITION BY LIST (tenant_id);
|
||||
|
||||
CREATE TABLE orders_tenant_1 PARTITION OF orders FOR VALUES IN (1);
|
||||
CREATE TABLE orders_tenant_2 PARTITION OF orders FOR VALUES IN (2);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Anti-Patterns
|
||||
|
||||
| # | ❌ Don't | ✅ Do Instead |
|
||||
|---|---------|--------------|
|
||||
| 1 | Premature denormalization | Start 3NF, denormalize when measured |
|
||||
| 2 | Auto-increment IDs as public API identifiers | UUID for public, serial for internal |
|
||||
| 3 | No foreign key constraints | FK enforced in database, always |
|
||||
| 4 | Nullable by default | NOT NULL by default, nullable when required |
|
||||
| 5 | No indexes on FK columns | Index every FK column |
|
||||
| 6 | Single-step destructive migration | ADD → MIGRATE → REMOVE in separate deploys |
|
||||
| 7 | `CREATE INDEX` without `CONCURRENTLY` | Always `CONCURRENTLY` on live tables |
|
||||
| 8 | Polymorphic FK (`commentable_type + commentable_id`) | Separate FK columns or separate tables |
|
||||
| 9 | JSONB for everything | JSONB for flexible data only, columns for structured |
|
||||
| 10 | No `created_at` / `updated_at` | Timestamp pair on every table |
|
||||
| 11 | Comma-separated values in one column | Separate table or PostgreSQL array |
|
||||
| 12 | `text` without length validation | CHECK constraint or application validation |
|
||||
|
||||
---
|
||||
|
||||
## Common Issues
|
||||
|
||||
### Issue 1: "Query is slow but I already have an index"
|
||||
|
||||
**Symptom:** `EXPLAIN ANALYZE` shows Sequential Scan despite existing index.
|
||||
|
||||
**Causes:**
|
||||
1. **Wrong index column order** — composite index `(A, B)` won't help `WHERE B = ?`
|
||||
2. **Low selectivity** — index on boolean column (50% of rows match), planner prefers seq scan
|
||||
3. **Stale statistics** — run `ANALYZE table_name;`
|
||||
4. **Type mismatch** — comparing `varchar` column with `integer` parameter → no index use
|
||||
|
||||
**Fix:** Check `EXPLAIN (ANALYZE, BUFFERS)`, verify index matches query pattern, run `ANALYZE`.
|
||||
|
||||
### Issue 2: "Migration locks the table for minutes"
|
||||
|
||||
**Symptom:** `ALTER TABLE` blocks all writes during execution.
|
||||
|
||||
**Cause:** Adding NOT NULL constraint, changing column type, or creating index without `CONCURRENTLY`.
|
||||
|
||||
**Fix:**
|
||||
```sql
|
||||
-- Add index without lock
|
||||
CREATE INDEX CONCURRENTLY idx_name ON table(col);
|
||||
|
||||
-- Add NOT NULL constraint without lock (Postgres 12+)
|
||||
ALTER TABLE t ADD CONSTRAINT t_col_nn CHECK (col IS NOT NULL) NOT VALID;
|
||||
ALTER TABLE t VALIDATE CONSTRAINT t_col_nn; -- non-blocking validation
|
||||
```
|
||||
|
||||
### Issue 3: "How many indexes is too many?"
|
||||
|
||||
**Rule of thumb:**
|
||||
- Read-heavy table (reports, product catalog): 5-10 indexes is fine
|
||||
- Write-heavy table (events, logs): 2-3 indexes max
|
||||
- Monitor with `pg_stat_user_indexes` — drop indexes with `idx_scan = 0`
|
||||
|
||||
```sql
|
||||
-- Find unused indexes
|
||||
SELECT schemaname, relname, indexrelname, idx_scan
|
||||
FROM pg_stat_user_indexes
|
||||
WHERE idx_scan = 0 AND indexrelname NOT LIKE '%pkey%'
|
||||
ORDER BY pg_relation_size(indexrelid) DESC;
|
||||
```
|
||||
Reference in New Issue
Block a user