Initial commit: add all skills files

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 16:52:49 +08:00
commit 6487becf60
396 changed files with 108871 additions and 0 deletions

@@ -0,0 +1,444 @@
---
name: fullstack-dev-api-design
description: "API design patterns and best practices. Use when creating endpoints, choosing methods/status codes, implementing pagination, or writing OpenAPI specs. Prevents common REST/GraphQL/gRPC mistakes."
license: MIT
metadata:
version: "2.0.0"
sources:
- Microsoft REST API Guidelines
- Google API Design Guide
- Zalando RESTful API Guidelines
- JSON:API Specification
- RFC 9457 (Problem Details for HTTP APIs)
- RFC 9110 (HTTP Semantics)
---
# API Design Guidelines
Framework-agnostic API design guide for backend and full-stack engineers. 50+ rules across 10 categories, prioritized by impact. Covers REST, GraphQL, and gRPC.
## Scope
**USE this skill when:**
- Designing a new API or adding endpoints
- Reviewing API pull requests
- Choosing between REST / GraphQL / gRPC
- Writing OpenAPI specifications
- Migrating or versioning an existing API
**NOT for:**
- Framework-specific implementation details (use your framework's own skill/docs)
- Frontend data fetching patterns (use React Query / SWR docs)
- Authentication implementation details (use your auth library's docs)
- Database schema design (→ `database-schema-design`)
## Context Required
Before applying this skill, gather:
| Required | Optional |
|----------|----------|
| Target consumers (browser, mobile, service) | Existing API conventions in the project |
| Expected request volume (RPS estimate) | Current OpenAPI / Swagger spec |
| Authentication method (JWT, API key, OAuth) | Rate limiting requirements |
| Data model / domain entities | Caching strategy |
---
## Quick Start Checklist
New API endpoint? Run through this before writing code:
- [ ] Resource named as **plural noun** (`/orders`, not `/getOrders`)
- [ ] URL in **kebab-case**, body fields in **camelCase**
- [ ] Correct **HTTP method** (GET=read, POST=create, PUT=replace, PATCH=partial, DELETE=remove)
- [ ] Correct **status code** (201 Created, 422 Validation, 404 Not Found…)
- [ ] Error response follows **RFC 9457** envelope
- [ ] **Pagination** on all list endpoints (default 20, max 100)
- [ ] **Authentication** required (Bearer token, not query param)
- [ ] **Request ID** in response header (`X-Request-Id`)
- [ ] **Rate limit** headers included
- [ ] Endpoint documented in **OpenAPI spec**
---
## Quick Navigation
| Need to… | Jump to |
|----------|---------|
| Name a resource URL | [1. Resource Modeling](#1-resource-modeling-critical) |
| Pick HTTP method + status code | [3. HTTP Methods & Status Codes](#3-http-methods--status-codes-critical) |
| Format error responses | [4. Error Handling](#4-error-handling-high) |
| Add pagination or filtering | [6. Pagination & Filtering](#6-pagination--filtering-high) |
| Choose API style (REST vs GraphQL vs gRPC) | [10. API Style Decision](#10-api-style-decision-tree) |
| Version an existing API | [7. Versioning](#7-versioning-medium-high) |
| Avoid common mistakes | [Anti-Patterns](#anti-patterns-checklist) |
---
## 1. Resource Modeling (CRITICAL)
### Core Rules
```
✅ /users — plural noun
✅ /users/{id}/orders — 1 level nesting
✅ /reviews?orderId={oid} — flatten deep nesting with query params
❌ /getUsers — verb in URL
❌ /user — singular
❌ /users/{uid}/orders/{oid}/items/{iid}/reviews — 3+ levels deep
```
**Max nesting: 2 levels.** Beyond that, promote to top-level resource with filters.
### Domain Alignment
Resources map to **domain concepts**, not database tables:
```
✅ /checkout-sessions (domain aggregate)
✅ /shipping-labels (domain concept)
❌ /tbl_order_header (database table leak)
❌ /join_user_role (internal schema leak)
```
---
## 2. URL & Naming (CRITICAL)
| Context | Convention | Example |
|---------|-----------|---------|
| URL path | kebab-case | `/order-items` |
| JSON body fields | camelCase | `{ "firstName": "Jane" }` |
| Query params | camelCase or snake_case (be consistent) | `?sortBy=createdAt` |
| Headers | Train-Case | `X-Request-Id` |
**Python exception:** If your entire stack is Python/snake_case, you MAY use `snake_case` in JSON — but be **consistent across all endpoints**.
```
✅ GET /users ❌ GET /users/
✅ GET /reports/annual ❌ GET /reports/annual.json
✅ POST /users ❌ POST /users/create
```
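The naming rules in sections 1 and 2 are mechanical enough to check in CI. The sketch below is one possible linter, not part of any cited guideline: `lint_path`, the verb list, and the problem messages are all hypothetical, and it does not try to detect singular nouns.

```python
import re

# Hypothetical linter for the conventions above; names and rules are illustrative.
KEBAB_SEGMENT = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
PATH_PARAM = re.compile(r"^\{\w+\}$")
VERBS = {"get", "create", "update", "delete"}  # assumed, non-exhaustive
MAX_RESOURCE_LEVELS = 2


def lint_path(path: str) -> list[str]:
    """Return the list of convention violations; an empty list means the path passes."""
    problems = []
    if len(path) > 1 and path.endswith("/"):
        problems.append("trailing slash")
    segments = [s for s in path.split("/") if s]
    # Path parameters such as {id} do not count as resource levels
    resources = [s for s in segments if not PATH_PARAM.match(s)]
    for seg in resources:
        if "." in seg:
            problems.append(f"file extension in '{seg}'")
        elif not KEBAB_SEGMENT.match(seg):
            problems.append(f"'{seg}' is not kebab-case")
        elif seg.split("-")[0] in VERBS:
            problems.append(f"verb '{seg}' in path")
    if len(resources) > MAX_RESOURCE_LEVELS:
        problems.append(f"nesting deeper than {MAX_RESOURCE_LEVELS} levels")
    return problems
```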
---
## 3. HTTP Methods & Status Codes (CRITICAL)
### Method Semantics
| Method | Semantics | Idempotent | Safe | Request Body |
|--------|-----------|-----------|------|-------------|
| GET | Read | ✅ | ✅ | ❌ Never |
| POST | Create / Action | ❌ | ❌ | ✅ Always |
| PUT | Full replace | ✅ | ❌ | ✅ Always |
| PATCH | Partial update | ❌* | ❌ | ✅ Always |
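| DELETE | Remove | ✅ | ❌ | ❌ Rarely |

\* PATCH is not guaranteed to be idempotent (RFC 5789), but a specific patch can be designed to be, for example one that sets fields to fixed values rather than incrementing them.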
### Status Code Quick Reference
**Success:**
| Code | When | Response Body |
|------|------|--------------|
| 200 OK | GET, PUT, PATCH success | Resource / result |
| 201 Created | POST created resource | Created resource + `Location` header |
| 202 Accepted | Async operation started | Job ID / status URL |
| 204 No Content | DELETE success, PUT with no body | None |
**Client Errors:**
| Code | When | Key Distinction |
|------|------|-----------------|
| 400 Bad Request | Malformed syntax | Can't even parse |
| 401 Unauthorized | Missing / invalid auth | "Who are you?" |
| 403 Forbidden | Authenticated, no permission | "I know you, but no" |
| 404 Not Found | Resource doesn't exist | Also use to hide 403 |
| 409 Conflict | Duplicate, version mismatch | State conflict |
| 422 Unprocessable | Valid syntax, failed validation | Semantic errors |
| 429 Too Many Requests | Rate limit hit | Include `Retry-After` |
**Server Errors:** 500 (unexpected), 502 (upstream fail), 503 (overloaded), 504 (upstream timeout)
---
## 4. Error Handling (HIGH)
### Standard Error Envelope (RFC 9457)
Every error response uses this format:
```json
{
"type": "https://api.example.com/errors/insufficient-funds",
"title": "Insufficient Funds",
"status": 422,
"detail": "Account balance $10.00 is less than withdrawal $50.00.",
"instance": "/transactions/txn_abc123",
"request_id": "req_7f3a8b2c",
"errors": [
{ "field": "amount", "message": "Exceeds balance", "code": "INSUFFICIENT_BALANCE" }
]
}
```
### Multi-Language Implementation
**TypeScript (Express):**
```typescript
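import type { NextFunction, Request, Response } from 'express';

class AppError extends Error {
  constructor(
    public readonly title: string,
    public readonly status: number,
    public readonly detail: string,
    public readonly code: string,
  ) {
    super(detail);
  }
}

// Error middleware: register after all routes on your Express `app`.
// `req.id` is assumed to be set by an upstream request-ID middleware.
app.use((err: unknown, req: Request & { id?: string }, res: Response, _next: NextFunction) => {
  if (err instanceof AppError) {
    // RFC 9457 responses use the application/problem+json media type
    return res.status(err.status).type('application/problem+json').json({
      type: `https://api.example.com/errors/${err.code}`,
      title: err.title, status: err.status,
      detail: err.detail, request_id: req.id,
    });
  }
  // Unknown errors: log internally, never leak details to the client
  res.status(500).type('application/problem+json').json({
    title: 'Internal Error', status: 500, request_id: req.id,
  });
});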
```
**Python (FastAPI):**
```python
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

class AppError(Exception):
    def __init__(self, title: str, status: int, detail: str, code: str):
        super().__init__(detail)
        self.title, self.status, self.detail, self.code = title, status, detail, code

@app.exception_handler(AppError)
async def app_error_handler(request: Request, exc: AppError):
    # request.state.request_id is assumed to be set by an upstream middleware
    return JSONResponse(
        status_code=exc.status,
        media_type="application/problem+json",  # RFC 9457 media type
        content={
            "type": f"https://api.example.com/errors/{exc.code}",
            "title": exc.title, "status": exc.status,
            "detail": exc.detail, "request_id": request.state.request_id,
        },
    )
```
### Iron Rules
```
✅ Return RFC 9457 error envelope for ALL errors
✅ Include request_id in every error response
✅ Return per-field validation errors in `errors` array
❌ Never expose stack traces in production
❌ Never return 200 for errors
❌ Never swallow errors silently
```
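The handlers above cover single errors; the per-field `errors` array from the envelope needs a little more structure. A minimal sketch of a builder for 422 responses follows. The function name `validation_problem`, the `validation-failed` type slug, and the title text are assumptions for illustration, not values defined by RFC 9457.

```python
# Hypothetical builder for a 422 problem document with per-field errors.
ERROR_BASE_URL = "https://api.example.com/errors"  # example domain used in this guide


def validation_problem(field_errors: list[dict], request_id: str) -> dict:
    """Build an RFC 9457-style body whose `errors` array lists each failing field."""
    return {
        "type": f"{ERROR_BASE_URL}/validation-failed",  # assumed slug
        "title": "Validation Failed",
        "status": 422,
        "detail": f"{len(field_errors)} field(s) failed validation.",
        "request_id": request_id,
        # Keep only the documented keys so internal details never leak
        "errors": [
            {"field": e["field"], "message": e["message"], "code": e["code"]}
            for e in field_errors
        ],
    }
```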
---
## 5. Authentication & Authorization (HIGH)
```
✅ Authorization: Bearer eyJhbGci... (header)
❌ GET /users?token=eyJhbGci... (URL — appears in logs)
✅ 401 → "Who are you?" (missing/invalid credentials)
✅ 403 → "You can't do this" (authenticated, no permission)
✅ 404 → Hide resource existence (use instead of 403 when needed)
```
**Rate Limit Headers (include the `X-RateLimit-*` headers on every response; add `Retry-After` on 429 responses):**
```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1625097600
Retry-After: 30
```
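One way to produce these headers is a fixed-window counter. The sketch below is an in-memory illustration only: `FixedWindowLimiter` is a hypothetical name, old windows are never evicted, and a real deployment would typically keep the counters in a shared store. `Retry-After` is emitted only when the request is rejected.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FixedWindowLimiter:
    """In-memory fixed-window rate limiter (illustrative; counters are never evicted)."""
    limit: int
    window_seconds: int
    _counts: dict = field(default_factory=dict)  # (client_id, window index) -> requests used

    def check(self, client_id: str, now: float) -> tuple[bool, dict[str, str]]:
        window = int(now // self.window_seconds)
        key = (client_id, window)
        used = self._counts.get(key, 0)
        reset_at = (window + 1) * self.window_seconds  # epoch seconds when the window ends
        allowed = used < self.limit
        if allowed:
            used += 1
            self._counts[key] = used
        headers = {
            "X-RateLimit-Limit": str(self.limit),
            "X-RateLimit-Remaining": str(max(self.limit - used, 0)),
            "X-RateLimit-Reset": str(reset_at),
        }
        if not allowed:
            # Only rejected (429) responses tell the client how long to wait
            headers["Retry-After"] = str(max(math.ceil(reset_at - now), 1))
        return allowed, headers
```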
---
## 6. Pagination & Filtering (HIGH)
### Cursor vs Offset
| Strategy | When | Pros | Cons |
|----------|------|------|------|
| **Cursor** (preferred) | Large/dynamic datasets | Consistent, no skips | Can't jump to page N |
| **Offset** | Small/stable datasets, admin UIs | Simple, page jumps | Drift on insert/delete |
**Cursor pagination response:**
```json
{
"data": [...],
"pagination": { "next_cursor": "eyJpZCI6MTIwfQ", "has_more": true }
}
```
**Offset pagination response:**
```json
{
"data": [...],
"pagination": { "page": 3, "per_page": 20, "total": 256, "total_pages": 13 }
}
```
**Always enforce:** Default 20 items, max 100 items.
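
The cursor in the response above is an opaque token to clients, but the server has to build and parse it. A minimal sketch, assuming the cursor is unpadded base64url over a small JSON object holding the last seen `id` and that rows are already sorted by ascending `id`; the function names are hypothetical.

```python
import base64
import json
from typing import Optional

DEFAULT_LIMIT, MAX_LIMIT = 20, 100


def encode_cursor(last_id: int) -> str:
    raw = json.dumps({"id": last_id}, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def decode_cursor(cursor: str) -> int:
    padded = cursor + "=" * (-len(cursor) % 4)  # restore stripped padding
    return int(json.loads(base64.urlsafe_b64decode(padded))["id"])


def paginate(rows: list[dict], cursor: Optional[str] = None, limit: Optional[int] = None) -> dict:
    """Return one page of `rows` (sorted by ascending id) in the cursor envelope."""
    limit = DEFAULT_LIMIT if limit is None else max(1, min(limit, MAX_LIMIT))
    after_id = decode_cursor(cursor) if cursor else None
    candidates = [r for r in rows if after_id is None or r["id"] > after_id]
    page, has_more = candidates[:limit], len(candidates) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return {"data": page, "pagination": {"next_cursor": next_cursor, "has_more": has_more}}
```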
### Standard Filter Patterns
```
GET /orders?status=shipped&created_after=2025-01-01&sort=-created_at&fields=id,status
```
| Pattern | Convention |
|---------|-----------|
| Exact match | `?status=shipped` |
| Range | `?price_gte=10&price_lte=100` |
| Date range | `?created_after=2025-01-01&created_before=2025-12-31` |
| Sort | `?sort=field` (asc), `?sort=-field` (desc) |
| Sparse fields | `?fields=id,name,email` |
| Search | `?q=search+term` |
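
The sort convention in the table can be parsed with a few lines. The sketch below assumes two things the table does not state: comma-separated multi-field sorting, and an allow-list of sortable fields so clients cannot sort on arbitrary columns. `parse_sort` is a hypothetical helper name.

```python
def parse_sort(sort_param: str, allowed: set[str]) -> list[tuple[str, str]]:
    """Turn 'status,-created_at' into [('status', 'asc'), ('created_at', 'desc')]."""
    result = []
    for part in sort_param.split(","):
        part = part.strip()
        if not part:
            continue
        descending = part.startswith("-")
        name = part[1:] if descending else part
        if name not in allowed:
            # Reject unknown fields instead of passing them through to the query
            raise ValueError(f"cannot sort by '{name}'")
        result.append((name, "desc" if descending else "asc"))
    return result
```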
---
## 7. Versioning (MEDIUM-HIGH)
| Strategy | Format | Best For |
|----------|--------|----------|
| **URL path** (recommended) | `/v1/users` | Public APIs |
| **Header** | `Api-Version: 2` | Internal APIs |
| **Query param** | `?version=2` | Legacy (avoid) |
**Non-breaking changes (no version bump):** New optional response fields, new endpoints, new optional params.
**Breaking changes (new version required):** Removing/renaming fields, changing types, stricter validation, removing endpoints.
**Deprecation headers:**
```
Sunset: Sun, 01 Mar 2026 00:00:00 GMT
Deprecation: @1767225600
Link: <https://api.example.com/v2/users>; rel="successor-version"
```
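`Sunset` carries an HTTP-date, so the weekday has to agree with the calendar date; generating the value avoids hand-typed mismatches. A minimal sketch using only the standard library; `deprecation_headers` is a hypothetical helper and it expects a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(sunset: datetime, successor_url: str) -> dict[str, str]:
    """Build Sunset and successor Link headers; `sunset` must be timezone-aware."""
    return {
        # format_datetime with usegmt=True emits the HTTP-date form ending in "GMT"
        "Sunset": format_datetime(sunset.astimezone(timezone.utc), usegmt=True),
        "Link": f'<{successor_url}>; rel="successor-version"',
    }
```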
---
## 8. Request / Response Design (MEDIUM)
### Consistent Envelope
```json
{
"data": { "id": "ord_123", "status": "pending", "total": 99.50 },
"meta": { "request_id": "req_abc123", "timestamp": "2025-06-15T10:30:00Z" }
}
```
### Key Rules
| Rule | Correct | Wrong |
|------|---------|-------|
| Timestamps | `"2025-06-15T10:30:00Z"` (ISO 8601) | `"06/15/2025"` or `1718447400` |
| Public IDs | UUID `"550e8400-..."` | Auto-increment `42` |
| Null vs absent (PATCH) | `{ "nickname": null }` = clear field | Absent field = don't change |
| HATEOAS (public APIs) | `"links": { "cancel": "/orders/123/cancel" }` | No discoverability |
---
## 9. Documentation — OpenAPI (MEDIUM)
**Design-first workflow:**
```
1. Write OpenAPI 3.1 spec
2. Review spec with stakeholders
3. Generate server stubs + client SDKs
4. Implement handlers
5. Validate responses against spec in CI
```
Every endpoint documents: summary, all parameters, request body + examples, all response codes + schemas, auth requirements.
---
## 10. API Style Decision Tree
```
What kind of API?
├─ Browser + mobile clients, flexible queries
│ └─ GraphQL
│ Rules: DataLoader (no N+1), depth limit ≤7, Relay pagination
├─ Standard CRUD, public consumers, caching important
│ └─ REST (this guide)
│ Rules: Resources, HTTP methods, status codes, OpenAPI
├─ Service-to-service, high throughput, strong typing
│ └─ gRPC
│ Rules: Protobuf schemas, streaming for large data, deadlines
├─ Full-stack TypeScript, same team owns client + server
│ └─ tRPC
│ Rules: Shared types, no code generation needed
└─ Real-time bidirectional
└─ WebSocket / SSE
Rules: Heartbeat, reconnection, message ordering
```
---
## Anti-Patterns Checklist
| # | ❌ Don't | ✅ Do Instead |
|---|---------|--------------|
| 1 | Verbs in URLs (`/getUser`) | HTTP methods + noun resources |
| 2 | Return 200 for errors | Correct 4xx/5xx status codes |
| 3 | Mix naming styles | One convention per context |
| 4 | Expose database IDs | UUIDs for public identifiers |
| 5 | No pagination on lists | Always paginate (default 20) |
| 6 | Swallow errors silently | Structured RFC 9457 errors |
| 7 | Token in URL query | Authorization header |
| 8 | Deep nesting (3+ levels) | Flatten with query params |
| 9 | Break changes without version | Maintain compatibility or version |
| 10 | No rate limiting | Implement + communicate via headers |
| 11 | No request ID | `X-Request-Id` on every response |
| 12 | Stack traces in production | Safe error message + internal log |
---
## Common Issues
### Issue 1: "Should this be a new resource or a sub-resource?"
**Symptom:** URL path keeps growing (`/users/{id}/orders/{id}/items/{id}/reviews`)
**Rule:** If the child entity makes sense on its own, promote it. If it only exists within the parent context, keep it nested (max 2 levels).
```
/reviews?orderId=123 ✅ (reviews exist independently)
/orders/{id}/items ✅ (items belong to orders, 1 level)
```
### Issue 2: "PUT or PATCH?"
**Symptom:** Team can't agree on update semantics.
**Rule:**
- PUT = client sends **complete** resource (missing fields → set to default/null)
- PATCH = client sends **only changed fields** (missing fields → unchanged)
- When unsure → **PATCH** (safer, less surprising)
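
The difference is easiest to see side by side. In this sketch the resource shape in `DEFAULTS` and both function names are hypothetical; PATCH follows the rule from section 8 that an explicit `null` clears a field while an absent field is left alone.

```python
# Hypothetical resource shape: every field and its default value
DEFAULTS = {"name": None, "nickname": None, "email": None}


def apply_put(body: dict) -> dict:
    """Full replace: any field missing from the body falls back to its default."""
    return {**DEFAULTS, **{k: v for k, v in body.items() if k in DEFAULTS}}


def apply_patch(current: dict, body: dict) -> dict:
    """Partial update: absent fields are untouched; an explicit None clears the field."""
    updated = dict(current)
    for key, value in body.items():
        if key in DEFAULTS:
            updated[key] = value
    return updated
```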
### Issue 3: "400 or 422?"
**Symptom:** Inconsistent validation error codes.
**Rule:**
- 400 = can't parse request at all (malformed JSON, invalid syntax); an unsupported `Content-Type` is more precisely 415 Unsupported Media Type
- 422 = parsed OK, but values fail validation (invalid email, negative quantity)
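
A handler can apply this rule in two stages: parse first, validate second. The sketch below uses hypothetical `email` and `quantity` fields and deliberately simple checks; treating a non-object JSON body as 422 is a judgment call made here, not something the rule above dictates.

```python
import json
from typing import Optional


def check_order_request(raw_body: str) -> tuple[Optional[int], list[dict]]:
    """Return (error_status, field_errors); (None, []) means the request is acceptable."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400, []  # stage 1 failed: the body cannot be parsed at all
    if not isinstance(payload, dict):
        # Parsed fine but not the expected shape (assumption: treated as 422)
        return 422, [{"field": "", "message": "Expected a JSON object", "code": "INVALID_BODY"}]
    errors = []
    email = payload.get("email")
    if not isinstance(email, str) or "@" not in email:
        errors.append({"field": "email", "message": "Must be a valid email", "code": "INVALID_EMAIL"})
    quantity = payload.get("quantity")
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity <= 0:
        errors.append({"field": "quantity", "message": "Must be a positive integer", "code": "INVALID_QUANTITY"})
    return (422, errors) if errors else (None, [])
```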

@@ -0,0 +1,165 @@
# Authentication Flow Patterns
Complete auth flow across frontend and backend. Covers JWT bearer flow, automatic token refresh, Next.js server-side auth, RBAC, and backend middleware order.
---
## JWT Bearer Flow (Most Common)
```
1. Login
Client → POST /api/auth/login { email, password }
Server → { accessToken (15min), refreshToken (7d, httpOnly cookie) }
2. Authenticated Requests
Client → GET /api/orders Authorization: Bearer <accessToken>
Server → validates JWT → returns data
3. Token Refresh (transparent)
Client → 401 received → POST /api/auth/refresh (cookie auto-sent)
Server → new accessToken
Client → retry original request with new token
4. Logout
Client → POST /api/auth/logout
Server → invalidate refresh token → clear cookie
```
---
## Frontend: Automatic Token Refresh
```typescript
// lib/api-client.ts: add to the existing fetch wrapper.
// `api`, `ApiError`, and `setAuthToken` come from that wrapper.
async function apiWithRefresh<T>(path: string, options: RequestInit = {}): Promise<T> {
  try {
    return await api<T>(path, options);
  } catch (err) {
    if (err instanceof ApiError && err.status === 401) {
      // Access token expired: get a new one using the httpOnly refresh cookie.
      // If this call throws, the error propagates and the caller should redirect to login.
      const refreshed = await api<{ accessToken: string }>('/api/auth/refresh', {
        method: 'POST',
        credentials: 'include', // send httpOnly cookie
      });
      setAuthToken(refreshed.accessToken);
      // Retry the original request once with the new token
      return api<T>(path, options);
    }
    throw err;
  }
}
```
---
## Next.js: Server-Side Auth (App Router)
```typescript
// middleware.ts: protect routes server-side
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

export function middleware(request: NextRequest) {
  const token = request.cookies.get('session')?.value;
  if (!token && request.nextUrl.pathname.startsWith('/dashboard')) {
    return NextResponse.redirect(new URL('/login', request.url));
  }
  return NextResponse.next();
}

// app/dashboard/page.tsx: server component with auth
import { cookies } from 'next/headers';
import { redirect } from 'next/navigation';

export default async function Dashboard() {
  const token = (await cookies()).get('session')?.value;
  const res = await fetch(`${process.env.API_URL}/api/me`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  // The API is the source of truth: an expired or invalid session goes back to login
  if (!res.ok) redirect('/login');
  const user = await res.json();
  return <DashboardContent user={user} />;
}
```
---
## Backend: Standard Middleware Order
```
Request → 1.RequestID → 2.Logging → 3.CORS → 4.RateLimit → 5.BodyParse
→ 6.Auth → 7.Authz → 8.Validation → 9.Handler → 10.ErrorHandler → Response
```
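The order matters because earlier middleware wraps everything after it. The sketch below models only two of the ten stages with plain functions and dictionaries (all names are hypothetical) to show one consequence: because RequestID runs before Auth, even a rejected request gets an `X-Request-Id` header.

```python
import uuid
from typing import Callable

Handler = Callable[[dict], dict]
Middleware = Callable[[dict, Handler], dict]


def compose(middlewares: list[Middleware], handler: Handler) -> Handler:
    """Wrap `handler` so that middlewares[0] runs first (outermost)."""
    for mw in reversed(middlewares):
        handler = (lambda m, nxt: lambda req: m(req, nxt))(mw, handler)
    return handler


def request_id_mw(req: dict, nxt: Handler) -> dict:
    req["request_id"] = "req_" + uuid.uuid4().hex[:8]
    res = nxt(req)
    # Runs on the way out, so it also tags responses produced by later stages
    res.setdefault("headers", {})["X-Request-Id"] = req["request_id"]
    return res


def auth_mw(req: dict, nxt: Handler) -> dict:
    if "Authorization" not in req.get("headers", {}):
        return {"status": 401, "body": {"title": "Unauthorized"}}  # short-circuit
    return nxt(req)


def orders_handler(req: dict) -> dict:
    return {"status": 200, "body": {"data": []}}


pipeline = compose([request_id_mw, auth_mw], orders_handler)
```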
---
## Backend: JWT Rules
```
✅ Short expiry access token (15min) + refresh token (server-stored)
✅ Minimal claims: userId, roles (not entire user object)
✅ Rotate signing keys periodically
❌ Never store tokens in localStorage (XSS risk)
❌ Never pass tokens in URL query params
```
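To make the first two rules concrete, here is what a short-lived token with minimal claims looks like at the byte level. This is a standard-library illustration of HS256 signing, not a recommendation to hand-roll JWT handling: production code should use a maintained JWT library, and the function names and the `sub`/`roles` claim layout here are assumptions.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_access_token(user_id: str, roles: list[str], secret: bytes,
                       now: int, ttl_seconds: int = 15 * 60) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    # Minimal claims: subject, roles, issued-at, expiry (no full user object)
    claims = {"sub": user_id, "roles": roles, "iat": now, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_access_token(token: str, secret: bytes, now: int) -> dict:
    """Return the claims, or raise ValueError if the token is invalid or expired."""
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = hmac.new(secret, f"{header_b64}.{claims_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(claims_b64))
    if now >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```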
---
## Backend: RBAC Pattern
```typescript
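// Assumes `authenticate` has already populated req.user, and that Role,
// UnauthorizedError, and ForbiddenError are defined elsewhere in the app.
function authorize(...roles: Role[]) {
  return (req, res, next) => {
    // Not authenticated at all: 401
    if (!req.user) throw new UnauthorizedError();
    // Authenticated but holding none of the required roles: 403
    if (!roles.some(r => req.user.roles.includes(r))) throw new ForbiddenError();
    next();
  };
}

router.delete('/users/:id', authenticate, authorize('admin'), deleteUser);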
```
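The same pattern translates to a decorator in Python backends. A minimal sketch, assuming the request is a dictionary whose `user` entry was filled in by an earlier authentication step; the exception classes and `delete_user` handler are hypothetical stand-ins.

```python
from functools import wraps


class UnauthorizedError(Exception):
    status = 401


class ForbiddenError(Exception):
    status = 403


def authorize(*roles: str):
    """Allow the call only if the authenticated user holds at least one of `roles`."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(request: dict, *args, **kwargs):
            user = request.get("user")
            if user is None:
                raise UnauthorizedError()  # not authenticated
            if not any(role in user["roles"] for role in roles):
                raise ForbiddenError()  # authenticated, but not permitted
            return handler(request, *args, **kwargs)
        return wrapper
    return decorator


@authorize("admin")
def delete_user(request: dict, user_id: str) -> dict:
    return {"status": 204}
```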
---
## Auth Decision Table
| Method | When | Frontend |
|--------|------|----------|
| Session | Same-domain, SSR, Django templates | Django templates / htmx |
| JWT | Different domain, SPA, mobile | React, Vue, mobile apps |
| OAuth2 | Third-party login, API consumers | Any |
---
## Iron Rules
```
✅ Access token: short-lived (15min), in memory
✅ Refresh token: httpOnly cookie (XSS-safe)
✅ Automatic transparent refresh on 401
✅ Redirect to login when refresh fails
❌ Never store tokens in localStorage (XSS risk)
❌ Never send tokens in URL query params (logged)
❌ Never trust client-side auth checks alone (server must validate)
```
---
## Common Issues
### Issue 1: "Auth works on page load but breaks on navigation"
**Cause:** Token stored in component state (lost on unmount).
**Fix:** Keep the access token somewhere that outlives a single component:
- React Context (survives navigation, lost on refresh)
- Cookie (survives refresh)
- React Query cache with `staleTime: Infinity` for session
### Issue 2: "CORS error with auth requests"
**Cause:** Missing `credentials: 'include'` on frontend or `credentials: true` on backend CORS config.
**Fix:**
1. Frontend: `fetch(url, { credentials: 'include' })`
2. Backend: `cors({ origin: 'https://your-frontend.com', credentials: true })`
3. Backend: explicit origin (not `*`) when using credentials

@@ -0,0 +1,706 @@
---
name: fullstack-dev-db-schema
description: "Database schema design and migrations. Use when creating tables, defining ORM models, adding indexes, or designing relationships. Covers zero-downtime migrations and multi-tenancy."
license: MIT
metadata:
version: "1.0.0"
sources:
- PostgreSQL official documentation
- Use The Index, Luke (use-the-index-luke.com)
- Designing Data-Intensive Applications (Martin Kleppmann)
- Database Reliability Engineering (Laine Campbell & Charity Majors)
---
# Database Schema Design
ORM-agnostic guide for relational database schema design. Covers data modeling, normalization, indexing, migrations, multi-tenancy, and common application patterns. Primarily PostgreSQL-focused but principles apply to MySQL/MariaDB.
## Scope
**USE this skill when:**
- Designing a schema for a new project or feature
- Deciding between normalization and denormalization
- Choosing which indexes to create
- Planning a zero-downtime migration on a live database
- Implementing multi-tenant data isolation
- Adding audit trails, soft delete, or versioning
- Diagnosing slow queries caused by schema problems
**NOT for:**
- Choosing which database technology to use (→ `technology-selection`)
- PostgreSQL-specific query tuning (use PostgreSQL performance docs)
- ORM-specific configuration (→ `django-best-practices` or your ORM's docs)
- Application-layer caching (→ `fullstack-dev-practices`)
## Context Required
| Required | Optional |
|----------|----------|
| Database engine (PostgreSQL / MySQL) | Expected data volume (rows, growth rate) |
| Domain entities and relationships | Read/write ratio |
| Key access patterns (queries) | Multi-tenant requirements |
---
## Quick Start Checklist
Designing a new schema:
- [ ] **Domain entities identified** — map 1 entity = 1 table (not 1 class = 1 table)
- [ ] **Primary keys**: UUID for public IDs, serial/bigserial for internal-only
- [ ] **Foreign keys** with explicit `ON DELETE` behavior
- [ ] **NOT NULL** by default — nullable only when business logic requires it
- [ ] **Timestamps**: `created_at` + `updated_at` on every table
- [ ] **Indexes** created for columns used in frequent WHERE, JOIN, and ORDER BY clauses (skip low-cardinality columns and tiny tables)
- [ ] **No premature denormalization** — start normalized, denormalize when measured
- [ ] **Naming convention** consistent: `snake_case`, plural table names
---
## Quick Navigation
| Need to… | Jump to |
|----------|---------|
| Model entities and relationships | [1. Data Modeling](#1-data-modeling-critical) |
| Decide normalize vs denormalize | [2. Normalization](#2-normalization-vs-denormalization-critical) |
| Choose the right index | [3. Indexing](#3-indexing-strategy-critical) |
| Run migrations safely on live DB | [4. Migrations](#4-zero-downtime-migrations-high) |
| Design multi-tenant schema | [5. Multi-Tenancy](#5-multi-tenant-design-high) |
| Add soft delete / audit trails | [6. Common Patterns](#6-common-schema-patterns-medium) |
| Partition large tables | [7. Partitioning](#7-table-partitioning-medium) |
| See anti-patterns | [Anti-Patterns](#anti-patterns) |
---
## Core Principles (7 Rules)
```
1. ✅ Start normalized (3NF) — denormalize only when you have measured evidence
2. ✅ Every table has a primary key, created_at, updated_at
3. ✅ UUID for public-facing IDs, serial for internal join keys
4. ✅ NOT NULL by default — null is a business decision, not a lazy default
5. ✅ Index columns used in frequent WHERE, JOIN, ORDER BY (with the exceptions listed in section 3)
6. ✅ Foreign keys enforced in database (not just application code)
7. ✅ Migrations are additive — never drop/rename in production without a multi-step plan
```
---
## 1. Data Modeling (CRITICAL)
### Table Naming
```sql
-- ✅ Plural, snake_case
CREATE TABLE orders (...);
CREATE TABLE order_items (...);
CREATE TABLE user_profiles (...);
-- ❌ Singular, mixed case
CREATE TABLE Order (...);
CREATE TABLE OrderItem (...);
CREATE TABLE tbl_usr_prof (...); -- cryptic abbreviation
```
### Primary Keys
| Strategy | When | Pros | Cons |
|----------|------|------|------|
| `bigserial` (auto-increment) | Internal tables, FK joins | Compact, fast joins | Enumerable, not safe for public IDs |
| `uuid` (v4 random) | Public-facing resources | Non-guessable, globally unique | Larger (16 bytes), random I/O on B-Tree |
| `uuid` v7 (time-sorted) | Public + needs ordering | Non-guessable + insert-friendly | Newer, less ecosystem support |
| `text` slug | URL-friendly resources | Human-readable | Must enforce uniqueness, updates expensive |
**Recommended default:**
```sql
CREATE TABLE orders (
id bigserial PRIMARY KEY, -- internal FK target
public_id uuid NOT NULL DEFAULT gen_random_uuid() UNIQUE, -- API-facing
-- ...
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now()
);
```
### Relationships
```sql
-- One-to-Many: user → orders
CREATE TABLE orders (
id bigserial PRIMARY KEY,
user_id bigint NOT NULL REFERENCES users(id) ON DELETE CASCADE,
-- ...
);
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Many-to-Many: orders ↔ products (via junction table)
CREATE TABLE order_items (
id bigserial PRIMARY KEY,
order_id bigint NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
product_id bigint NOT NULL REFERENCES products(id) ON DELETE RESTRICT,
quantity int NOT NULL CHECK (quantity > 0),
unit_price numeric(10,2) NOT NULL,
UNIQUE (order_id, product_id) -- prevent duplicate line items
);
-- One-to-One: user → profile
CREATE TABLE user_profiles (
user_id bigint PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
bio text,
avatar_url text,
-- ...
);
```
### ON DELETE Behavior
| Behavior | When | Example |
|----------|------|---------|
| `CASCADE` | Child meaningless without parent | order_items when order deleted |
| `RESTRICT` | Prevent accidental deletion | products referenced by order_items |
| `SET NULL` | Preserve child, clear reference | orders.assigned_to when employee leaves |
| `SET DEFAULT` | Fallback to default value | Rare, for status columns |
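
The first two behaviors can be tried without a PostgreSQL server. The sketch below uses SQLite from the Python standard library as a stand-in, so the column types differ from the examples above, and foreign-key enforcement has to be switched on explicitly because SQLite leaves it off by default. `build_demo_db` is a hypothetical helper.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY);
        CREATE TABLE products (id INTEGER PRIMARY KEY);
        CREATE TABLE order_items (
            id INTEGER PRIMARY KEY,
            order_id INTEGER NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
            product_id INTEGER NOT NULL REFERENCES products(id) ON DELETE RESTRICT
        );
        INSERT INTO orders (id) VALUES (1);
        INSERT INTO products (id) VALUES (10);
        INSERT INTO order_items (order_id, product_id) VALUES (1, 10);
    """)
    return conn


def try_delete(conn: sqlite3.Connection, table: str, row_id: int) -> bool:
    """Return True if the delete succeeded, False if a constraint blocked it."""
    if table not in {"orders", "products"}:
        raise ValueError("unexpected table")
    try:
        conn.execute(f"DELETE FROM {table} WHERE id = ?", (row_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```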
---
## 2. Normalization vs Denormalization (CRITICAL)
### Start Normalized (3NF)
**Normal forms in practice:**
| Form | Rule | Example Violation |
|------|------|-------------------|
| 1NF | No repeating groups, atomic values | `tags = "go,python,rust"` in one column |
| 2NF | No partial dependencies (composite keys) | `order_items.product_name` depends on `product_id` alone |
| 3NF | No transitive dependencies | `orders.customer_city` depends on `customer_id`, not `order_id` |
**1NF violation fix:**
```sql
-- ❌ Tags as comma-separated string
CREATE TABLE posts (id serial, tags text); -- tags = "go,python"
-- ✅ Separate table (or array/JSONB if simple)
CREATE TABLE post_tags (
post_id bigint REFERENCES posts(id) ON DELETE CASCADE,
tag_id bigint REFERENCES tags(id) ON DELETE CASCADE,
PRIMARY KEY (post_id, tag_id)
);
-- ✅ Alternative: PostgreSQL array (if tags are just strings, no metadata)
CREATE TABLE posts (id serial, tags text[] NOT NULL DEFAULT '{}');
CREATE INDEX idx_posts_tags ON posts USING GIN(tags);
```
### When to Denormalize
**Denormalize ONLY when:**
1. You have **measured** a performance problem (EXPLAIN ANALYZE, not "I think it's slow")
2. The denormalized data is **read-heavy** (read:write ratio > 100:1)
3. You accept the **consistency maintenance cost** (triggers, application logic, or materialized views)
**Safe denormalization patterns:**
```sql
-- Pattern 1: Materialized view (computed, refreshable)
CREATE MATERIALIZED VIEW order_summary AS
SELECT o.id, o.user_id, o.total,
COUNT(oi.id) AS item_count,
u.email AS user_email
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN users u ON u.id = o.user_id
GROUP BY o.id, u.email;
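-- CONCURRENTLY requires a unique index on the materialized view
CREATE UNIQUE INDEX idx_order_summary_id ON order_summary(id);
REFRESH MATERIALIZED VIEW CONCURRENTLY order_summary; -- does not block reads of the view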
-- Pattern 2: Cached aggregate column (application-maintained)
ALTER TABLE orders ADD COLUMN item_count int NOT NULL DEFAULT 0;
-- Update via trigger or application code on order_item insert/delete
-- Pattern 3: JSONB snapshot (freeze-at-write-time)
-- Store a copy of the product details at the time of purchase
CREATE TABLE order_items (
id bigserial PRIMARY KEY,
order_id bigint NOT NULL REFERENCES orders(id),
product_id bigint REFERENCES products(id),
quantity int NOT NULL,
unit_price numeric(10,2) NOT NULL, -- frozen price
product_snapshot jsonb NOT NULL -- frozen name, description, image
);
```
---
## 3. Indexing Strategy (CRITICAL)
### Index Types (PostgreSQL)
| Type | When | Example |
|------|------|---------|
| **B-Tree** (default) | Equality, range, ORDER BY | `WHERE status = 'active'`, `WHERE created_at > '2025-01-01'` |
| **Hash** | Equality only (rare, B-Tree usually better) | `WHERE id = 123` (large tables, Postgres 10+) |
| **GIN** | Arrays, JSONB, full-text search | `WHERE tags @> '{go}'`, `WHERE data @> '{"key": "val"}'` |
| **GiST** | Geometry, ranges, nearest-neighbor | PostGIS, tsrange, ltree |
| **BRIN** | Very large tables with natural ordering | Time-series data sorted by timestamp |
### Index Decision Rules
```
Rule 1: Index every column in WHERE clauses
Rule 2: Index every column used in JOIN ON conditions
Rule 3: Index every column in ORDER BY (if queried with LIMIT)
Rule 4: Composite index for multi-column WHERE (leftmost prefix rule)
Rule 5: Partial index when filtering a subset (e.g., only active records)
Rule 6: Covering index (INCLUDE) to avoid table lookup
Rule 7: DON'T index low-cardinality columns alone (e.g., boolean)
```
### Composite Index: Column Order Matters
```sql
-- Query: WHERE user_id = ? AND status = ? ORDER BY created_at DESC
-- ✅ Optimal: matches query pattern left-to-right
CREATE INDEX idx_orders_user_status_created
ON orders(user_id, status, created_at DESC);
-- ❌ Wrong order: can't use for this query efficiently
CREATE INDEX idx_orders_created_user_status
ON orders(created_at DESC, user_id, status);
```
**Leftmost prefix rule:** Index on `(A, B, C)` supports queries on `(A)`, `(A, B)`, `(A, B, C)` but NOT `(B)`, `(C)`, or `(B, C)`.
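
As a mental model, the rule can be written as a function that walks the index columns left to right and stops at the first one the query does not constrain. This is a deliberate simplification with a hypothetical name: it considers equality predicates only and ignores range conditions, sort order, and any planner-specific optimizations.

```python
def usable_prefix(index_columns: list[str], equality_columns: set[str]) -> list[str]:
    """Return the leading index columns a query with these equality filters can use."""
    prefix = []
    for column in index_columns:
        if column not in equality_columns:
            break  # a gap ends the usable prefix
        prefix.append(column)
    return prefix
```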
### Partial Index (Index Only What Matters)
```sql
-- Only 5% of orders are 'pending', but queried frequently
CREATE INDEX idx_orders_pending
ON orders(created_at DESC)
WHERE status = 'pending';
-- Only active users matter for login
CREATE INDEX idx_users_active_email
ON users(email)
WHERE is_active = true;
```
### Covering Index (Avoid Table Lookup)
```sql
-- Query only needs status and total, both stored in the index: no need to read the table row
CREATE INDEX idx_orders_user_covering
ON orders(user_id) INCLUDE (status, total);
-- Now this query is index-only:
SELECT status, total FROM orders WHERE user_id = 123;
```
### When NOT to Index
```
❌ Columns rarely used in WHERE/JOIN/ORDER BY
❌ Tables with < 1,000 rows (sequential scan is faster)
❌ Columns with very low cardinality alone (e.g., boolean is_active)
❌ Write-heavy tables where index maintenance cost > read benefit
❌ Duplicate indexes (check pg_stat_user_indexes for unused indexes)
```
---
## 4. Zero-Downtime Migrations (HIGH)
### The Golden Rule
```
NEVER make destructive changes in one step.
Always: ADD → MIGRATE DATA → REMOVE OLD (in separate deploys).
```
### Safe Migration Patterns
**Rename a column (3 deploys):**
```
Deploy 1: Add new column
ALTER TABLE users ADD COLUMN full_name text;
UPDATE users SET full_name = name; -- backfill
-- App writes to BOTH name and full_name
Deploy 2: Switch reads to new column
-- App reads from full_name, still writes to both
Deploy 3: Drop old column
ALTER TABLE users DROP COLUMN name;
-- App only uses full_name
```
**Add a NOT NULL column (2 deploys):**
```sql
-- Deploy 1: Add nullable column, set the default for new rows, backfill
ALTER TABLE orders ADD COLUMN currency text; -- nullable first
ALTER TABLE orders ALTER COLUMN currency SET DEFAULT 'USD'; -- new rows get a value from now on
UPDATE orders SET currency = 'USD' WHERE currency IS NULL; -- backfill (in batches on large tables)
-- Deploy 2: Add constraint (after all rows backfilled)
ALTER TABLE orders ALTER COLUMN currency SET NOT NULL;
```
**Add an index without locking:**
```sql
-- ✅ CONCURRENTLY: does not block writes, safe on a live DB (cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_orders_status ON orders(status);
-- ❌ Without CONCURRENTLY: locks table for writes during build
CREATE INDEX idx_orders_status ON orders(status);
```
### Migration Safety Checklist
```
✅ Migration runs in < 30 seconds on production data size
✅ No exclusive table locks (use CONCURRENTLY for indexes)
✅ Rollback plan documented and tested
✅ Backfill runs in batches (not one giant UPDATE)
✅ New column added as nullable first, constraint added later
✅ Old column kept until all code references removed
❌ Never rename/drop columns in one deploy
❌ Never ALTER TYPE on large tables without testing timing
❌ Never run a large data backfill in a single transaction (long-held locks, table bloat)
```
### Batch Backfill Template
```sql
-- Backfill in batches of 10,000, committing after each batch
-- (COMMIT inside DO needs PostgreSQL 11+ and must not run inside an outer transaction)
DO $$
DECLARE
  batch_size int := 10000;
  affected int;
BEGIN
  LOOP
    UPDATE orders
    SET currency = 'USD'
    WHERE id IN (
      SELECT id FROM orders WHERE currency IS NULL LIMIT batch_size
    );
    GET DIAGNOSTICS affected = ROW_COUNT;
    RAISE NOTICE 'Updated % rows', affected;
    EXIT WHEN affected = 0;
    COMMIT; -- end this batch's transaction so its locks are released
    PERFORM pg_sleep(0.1); -- brief pause to reduce load
  END LOOP;
END $$;
```
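Driving the backfill from application code is another option, since each batch then commits as its own transaction by construction. The sketch below shows the loop with SQLite from the Python standard library as a stand-in for PostgreSQL; `backfill_currency` is a hypothetical name and the pause between batches is omitted.

```python
import sqlite3


def backfill_currency(conn: sqlite3.Connection, batch_size: int = 10_000) -> int:
    """Fill NULL currencies in batches, committing after each one. Returns rows updated."""
    total = 0
    while True:
        cursor = conn.execute(
            "UPDATE orders SET currency = 'USD' WHERE id IN ("
            " SELECT id FROM orders WHERE currency IS NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # one short transaction per batch
        if cursor.rowcount == 0:
            break
        total += cursor.rowcount
    return total
```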
---
## 5. Multi-Tenant Design (HIGH)
### Three Approaches
| Approach | Isolation | Complexity | When |
|----------|-----------|------------|------|
| **Row-level** (shared tables + `tenant_id`) | Low | Low | SaaS MVP, < 1,000 tenants |
| **Schema-per-tenant** | Medium | Medium | Regulated industries, moderate scale |
| **Database-per-tenant** | High | High | Enterprise, strict data isolation |
### Row-Level Tenancy (Most Common)
```sql
-- Every table has tenant_id
CREATE TABLE orders (
id bigserial PRIMARY KEY,
tenant_id bigint NOT NULL REFERENCES tenants(id),
user_id bigint NOT NULL REFERENCES users(id),
total numeric(10,2) NOT NULL,
-- ...
);
-- Composite index: tenant first (most queries filter by tenant)
CREATE INDEX idx_orders_tenant_user ON orders(tenant_id, user_id);
CREATE INDEX idx_orders_tenant_status ON orders(tenant_id, status);
-- Row-Level Security (PostgreSQL)
ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON orders
USING (tenant_id = current_setting('app.tenant_id')::bigint);
```
**Application-level enforcement:**
```typescript
// Middleware: set tenant context on every request.
// A client-supplied header alone is not trustworthy: verify it against the
// authenticated user's tenant (or derive the tenant from the token) before use.
app.use((req, res, next) => {
  const tenantId = req.headers['x-tenant-id'];
  if (!tenantId) return res.status(400).json({ error: 'Missing tenant' });
  req.tenantId = tenantId;
  next();
});

// Repository: ALWAYS filter by tenant
async findOrders(tenantId: string, userId: string) {
  return db.order.findMany({
    where: { tenantId, userId }, // tenant_id in EVERY query
  });
}
```
### Rules
```
✅ tenant_id in EVERY table that holds tenant data
✅ tenant_id as FIRST column in every composite index
✅ Application middleware enforces tenant context
✅ Use RLS (PostgreSQL) as defense-in-depth, not sole protection
✅ Test with 2+ tenants to verify isolation
❌ Never allow cross-tenant queries in application code
❌ Never skip tenant_id in WHERE clauses (even in admin tools)
```
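One way to make "tenant_id in every query" hard to forget is to bind the tenant when the repository is constructed, so callers cannot omit it. The sketch below assumes a hypothetical `TenantScopedOrders` class and uses `sqlite3` purely so the example is self-contained; it also shows the "test with 2+ tenants" rule in miniature.

```python
import sqlite3


class TenantScopedOrders:
    """Hypothetical repository: the tenant is fixed once, at construction time."""

    def __init__(self, conn, tenant_id):
        self._conn = conn
        self._tenant_id = tenant_id

    def find_by_user(self, user_id):
        # tenant_id is always part of the WHERE clause; callers cannot skip it
        rows = self._conn.execute(
            "SELECT id, total FROM orders "
            "WHERE tenant_id = ? AND user_id = ? ORDER BY id",
            (self._tenant_id, user_id),
        )
        return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, tenant_id INTEGER NOT NULL, "
    "user_id INTEGER NOT NULL, total REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (tenant_id, user_id, total) VALUES (?, ?, ?)",
    [(1, 7, 10.0), (1, 7, 20.0), (2, 7, 99.0)],
)

# Same user_id in two tenants: each repository sees only its own rows
tenant_1_orders = TenantScopedOrders(conn, tenant_id=1).find_by_user(7)
tenant_2_orders = TenantScopedOrders(conn, tenant_id=2).find_by_user(7)
```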
---
## 6. Common Schema Patterns (MEDIUM)
### Soft Delete
```sql
ALTER TABLE orders ADD COLUMN deleted_at timestamptz;
-- All queries filter deleted records
CREATE VIEW active_orders AS
SELECT * FROM orders WHERE deleted_at IS NULL;
-- Partial index: only index non-deleted rows
CREATE INDEX idx_orders_active_status
ON orders(status, created_at DESC)
WHERE deleted_at IS NULL;
```
**ORM integration:**
```typescript
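// Prisma middleware: auto-filter soft-deleted records.
// Note: $use is deprecated in recent Prisma releases in favor of client
// extensions ($extends); check the version you run.
prisma.$use(async (params, next) => {
  if (params.action === 'findMany' || params.action === 'findFirst') {
    params.args = params.args ?? {}; // findMany() may be called without arguments
    params.args.where = { ...params.args.where, deletedAt: null };
  }
  return next(params);
});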
```
### Audit Trail
```sql
-- Option A: Audit columns on every table
ALTER TABLE orders ADD COLUMN created_by bigint REFERENCES users(id);
ALTER TABLE orders ADD COLUMN updated_by bigint REFERENCES users(id);
-- Option B: Separate audit log table (more detail)
CREATE TABLE audit_log (
id bigserial PRIMARY KEY,
table_name text NOT NULL,
record_id bigint NOT NULL,
action text NOT NULL CHECK (action IN ('INSERT', 'UPDATE', 'DELETE')),
old_data jsonb,
new_data jsonb,
changed_by bigint REFERENCES users(id),
changed_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX idx_audit_table_record ON audit_log(table_name, record_id);
CREATE INDEX idx_audit_changed_at ON audit_log(changed_at DESC);
```
### Enum Columns
```sql
-- Option A: PostgreSQL enum type (strict, but ALTER TYPE is painful)
CREATE TYPE order_status AS ENUM ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled');
ALTER TABLE orders ADD COLUMN status order_status NOT NULL DEFAULT 'pending';
-- Option B: Text + CHECK constraint (easier to migrate)
ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending', 'confirmed', 'shipped', 'delivered', 'cancelled'));
-- Option C: Lookup table (most flexible, best for UI-driven lists)
CREATE TABLE order_statuses (
id serial PRIMARY KEY,
name text UNIQUE NOT NULL,
label text NOT NULL -- display name
);
```
**Recommendation:** Option B (text + CHECK) for most cases. Option C if statuses are managed by non-developers.
### Polymorphic Associations
```sql
-- ❌ Anti-pattern: polymorphic FK (no referential integrity)
CREATE TABLE comments (
id bigserial PRIMARY KEY,
commentable_type text, -- 'Post' or 'Photo'
commentable_id bigint, -- no FK constraint possible!
body text
);
-- ✅ Pattern A: Separate FK columns (nullable)
CREATE TABLE comments (
id bigserial PRIMARY KEY,
post_id bigint REFERENCES posts(id) ON DELETE CASCADE,
photo_id bigint REFERENCES photos(id) ON DELETE CASCADE,
body text NOT NULL,
CHECK (
(post_id IS NOT NULL AND photo_id IS NULL) OR
(post_id IS NULL AND photo_id IS NOT NULL)
)
);
-- ✅ Pattern B: Separate tables (cleanest, best for different schemas)
CREATE TABLE post_comments (..., post_id bigint REFERENCES posts(id));
CREATE TABLE photo_comments (..., photo_id bigint REFERENCES photos(id));
```
### JSONB Columns (Semi-Structured Data)
```sql
-- Good uses: metadata, settings, flexible attributes
CREATE TABLE products (
id bigserial PRIMARY KEY,
name text NOT NULL,
price numeric(10,2) NOT NULL,
attributes jsonb NOT NULL DEFAULT '{}' -- color, size, weight...
);
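-- GIN index for JSONB containment / key-existence queries (@>, ?, ?|, ?&)
CREATE INDEX idx_products_attrs ON products USING GIN(attributes);
-- Query
SELECT * FROM products WHERE attributes @> '{"size": "XL"}';  -- can use the GIN index
SELECT * FROM products WHERE attributes->>'color' = 'red';    -- does NOT use the GIN index;
                                                              -- rewrite as @> '{"color": "red"}'
                                                              -- or add an expression index on (attributes->>'color')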
```
```
✅ Use JSONB for truly flexible/optional data (metadata, settings, preferences)
✅ Index JSONB columns with GIN when queried
❌ Never use JSONB for data that should be columns (email, status, price)
❌ Never use JSONB to avoid schema design (it's not MongoDB-in-Postgres)
```
---
## 7. Table Partitioning (MEDIUM)
### When to Partition
```
✅ Table > 100M rows AND growing
✅ Most queries filter on the partition key (date range, tenant)
✅ Old data can be dropped/archived by partition (efficient DELETE)
❌ Table < 10M rows (overhead not worth it)
❌ Queries don't filter on partition key (scans all partitions)
```
### Range Partitioning (Time-Series)
```sql
CREATE TABLE events (
id bigserial,
tenant_id bigint NOT NULL,
event_type text NOT NULL,
payload jsonb,
created_at timestamptz NOT NULL DEFAULT now()
) PARTITION BY RANGE (created_at);
-- Monthly partitions
CREATE TABLE events_2025_01 PARTITION OF events
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
CREATE TABLE events_2025_02 PARTITION OF events
FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');
-- Automate partition creation with pg_partman or cron
```
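If a cron job is used instead of pg_partman, the DDL for upcoming months can be generated from a date. This is a small sketch under that assumption; the helper name `monthly_partition_ddl` is hypothetical, and it only builds the statement text, leaving execution to whatever database client the job uses.

```python
from datetime import date


def monthly_partition_ddl(parent: str, year: int, month: int) -> str:
    """Build the CREATE TABLE statement for one monthly range partition."""
    start = date(year, month, 1)
    # The upper bound is exclusive: the first day of the following month
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    name = f"{parent}_{start:%Y_%m}"
    return (
        f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF {parent}\n"
        f"  FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
    )


ddl_march = monthly_partition_ddl("events", 2025, 3)
ddl_december = monthly_partition_ddl("events", 2025, 12)
```

Creating partitions a few months ahead of time avoids insert failures when a new month starts before the job has run.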
### List Partitioning (Multi-Tenant)
```sql
CREATE TABLE orders (
id bigserial,
tenant_id bigint NOT NULL,
total numeric(10,2)
) PARTITION BY LIST (tenant_id);
CREATE TABLE orders_tenant_1 PARTITION OF orders FOR VALUES IN (1);
CREATE TABLE orders_tenant_2 PARTITION OF orders FOR VALUES IN (2);
```
---
## Anti-Patterns
| # | ❌ Don't | ✅ Do Instead |
|---|---------|--------------|
| 1 | Premature denormalization | Start 3NF, denormalize when measured |
| 2 | Auto-increment IDs as public API identifiers | UUID for public, serial for internal |
| 3 | No foreign key constraints | FK enforced in database, always |
| 4 | Nullable by default | NOT NULL by default, nullable when required |
| 5 | No indexes on FK columns | Index every FK column |
| 6 | Single-step destructive migration | ADD → MIGRATE → REMOVE in separate deploys |
| 7 | `CREATE INDEX` without `CONCURRENTLY` | Always `CONCURRENTLY` on live tables |
| 8 | Polymorphic FK (`commentable_type + commentable_id`) | Separate FK columns or separate tables |
| 9 | JSONB for everything | JSONB for flexible data only, columns for structured |
| 10 | No `created_at` / `updated_at` | Timestamp pair on every table |
| 11 | Comma-separated values in one column | Separate table or PostgreSQL array |
| 12 | `text` without length validation | CHECK constraint or application validation |
---
## Common Issues
### Issue 1: "Query is slow but I already have an index"
**Symptom:** `EXPLAIN ANALYZE` shows Sequential Scan despite existing index.
**Causes:**
1. **Wrong index column order** — composite index `(A, B)` won't help `WHERE B = ?`
2. **Low selectivity** — index on boolean column (50% of rows match), planner prefers seq scan
3. **Stale statistics** — run `ANALYZE table_name;`
4. **Type mismatch** — comparing `varchar` column with `integer` parameter → no index use
**Fix:** Check `EXPLAIN (ANALYZE, BUFFERS)`, verify index matches query pattern, run `ANALYZE`.
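Cause 1 (column order) is easy to reproduce. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's built-in `sqlite3` module as a stand-in, because it runs without a server; PostgreSQL's planner differs in detail, but the leftmost-prefix behaviour of a composite index is the same idea. Table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, c TEXT)")
conn.execute("CREATE INDEX idx_ab ON t(a, b)")


def plan(sql: str) -> str:
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column holds the detail text


plan_leading = plan("SELECT * FROM t WHERE a = 1")   # filters on the leading column
plan_trailing = plan("SELECT * FROM t WHERE b = 1")  # filters on the second column only

print(plan_leading)
print(plan_trailing)
```

The first plan searches via `idx_ab`; the second falls back to scanning the table, which is the situation described in cause 1.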
### Issue 2: "Migration locks the table for minutes"
**Symptom:** `ALTER TABLE` blocks all writes during execution.
**Cause:** Adding NOT NULL constraint, changing column type, or creating index without `CONCURRENTLY`.
**Fix:**
```sql
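-- Add index without blocking writes
CREATE INDEX CONCURRENTLY idx_name ON t(col);
-- Add NOT NULL without a long exclusive lock (Postgres 12+)
ALTER TABLE t ADD CONSTRAINT t_col_nn CHECK (col IS NOT NULL) NOT VALID;
ALTER TABLE t VALIDATE CONSTRAINT t_col_nn;   -- scans the table without blocking writes
ALTER TABLE t ALTER COLUMN col SET NOT NULL;  -- Postgres 12+ reuses the validated CHECK, no full scan
ALTER TABLE t DROP CONSTRAINT t_col_nn;       -- optional: the CHECK is now redundant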
```
### Issue 3: "How many indexes is too many?"
**Rule of thumb:**
- Read-heavy table (reports, product catalog): 5-10 indexes is fine
- Write-heavy table (events, logs): 2-3 indexes max
- Monitor with `pg_stat_user_indexes` — drop indexes with `idx_scan = 0`
```sql
-- Find unused indexes
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0 AND indexrelname NOT LIKE '%pkey%'
ORDER BY pg_relation_size(indexrelid) DESC;
```

View File

@@ -0,0 +1,466 @@
# Django Best Practices
Production-grade guide for Django 5.x and Django REST Framework. 40+ rules across 8 categories.
## Core Principles (7 Rules)
```
1. ✅ Custom User model BEFORE first migration (very hard to change later)
2. ✅ One Django app per domain concept (users, orders, payments)
3. ✅ Fat models, thin views — business logic in models/managers, not views
4. ✅ Always use select_related/prefetch_related (prevent N+1)
5. ✅ Settings split by environment (base + dev + prod)
6. ✅ Test with pytest-django + factory_boy (not fixtures)
7. ✅ Never use runserver in production (Gunicorn + Nginx)
```
---
## 1. Project Structure (CRITICAL)
### App-Per-Domain
```
myproject/
├── config/                  # Project config
│   ├── __init__.py
│   ├── settings/
│   │   ├── base.py          # Shared settings
│   │   ├── dev.py           # DEBUG=True, SQLite ok
│   │   └── prod.py          # DEBUG=False, Postgres, HTTPS
│   ├── urls.py
│   ├── wsgi.py
│   └── asgi.py
├── apps/
│   ├── users/               # Custom User model
│   │   ├── models.py
│   │   ├── serializers.py
│   │   ├── views.py
│   │   ├── urls.py
│   │   ├── admin.py
│   │   ├── services.py      # Business logic
│   │   ├── selectors.py     # Complex queries
│   │   └── tests/
│   │       ├── test_models.py
│   │       ├── test_views.py
│   │       └── factories.py
│   ├── orders/
│   └── payments/
├── manage.py
├── requirements/
│   ├── base.txt
│   ├── dev.txt
│   └── prod.txt
└── docker-compose.yml
```
### Rules
```
✅ One app = one bounded context (users, orders, payments)
✅ Business logic in services.py / selectors.py, not views
✅ Each app has its own urls.py, admin.py, tests/
❌ Never put everything in one app
❌ Never import across app boundaries at the model level (use IDs)
❌ Never put business logic in views or serializers
```
---
## 2. Models & Migrations (CRITICAL)
### Custom User Model (Day 1!)
```python
# apps/users/models.py
import uuid

from django.contrib.auth.models import AbstractUser
from django.db import models


class User(AbstractUser):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    email = models.EmailField(unique=True)

    USERNAME_FIELD = 'email'
    REQUIRED_FIELDS = ['username']

    class Meta:
        db_table = 'users'


# config/settings/base.py
AUTH_USER_MODEL = 'users.User'
```
**This MUST be done before the first `migrate`. Changing it afterwards is difficult and error-prone.**
### Model Best Practices
```python
import uuid

from django.conf import settings
from django.db import models


class TimeStampedModel(models.Model):
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)

    class Meta:
        abstract = True


# Assumed definition: the original snippet references OrderStatus without showing it
class OrderStatus(models.TextChoices):
    PENDING = 'pending'
    CONFIRMED = 'confirmed'
    SHIPPED = 'shipped'
    DELIVERED = 'delivered'
    CANCELLED = 'cancelled'


class Order(TimeStampedModel):
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE, related_name='orders')
    status = models.CharField(max_length=20, choices=OrderStatus.choices, default=OrderStatus.PENDING, db_index=True)
    total = models.DecimalField(max_digits=10, decimal_places=2)

    class Meta:
        db_table = 'orders'
        ordering = ['-created_at']
        indexes = [
            models.Index(fields=['user', 'status']),
        ]

    def can_cancel(self) -> bool:
        return self.status in [OrderStatus.PENDING, OrderStatus.CONFIRMED]

    def cancel(self):
        if not self.can_cancel():
            raise ValueError(f"Cannot cancel order in {self.status} status")
        self.status = OrderStatus.CANCELLED
        # updated_at must be listed explicitly, or auto_now is not persisted
        self.save(update_fields=['status', 'updated_at'])
```
### Migration Rules
```
✅ Review migration SQL: python manage.py sqlmigrate app_name 0001
✅ Name migrations descriptively: --name add_status_index_to_orders
✅ Separate data migrations from schema migrations
✅ Non-destructive first: add column → backfill → remove old column
❌ Never edit or delete applied migrations
❌ Never use RunPython without reverse function
```
---
## 3. Views & Serializers — DRF (HIGH)
### Service Layer Pattern
```python
# apps/orders/services.py
from django.db import transaction

from .models import Order, OrderItem


class OrderService:
    @staticmethod
    @transaction.atomic
    def create_order(user, items_data: list[dict]) -> Order:
        total = sum(item['price'] * item['quantity'] for item in items_data)
        order = Order.objects.create(user=user, total=total)
        OrderItem.objects.bulk_create([
            OrderItem(order=order, **item) for item in items_data
        ])
        return order

    @staticmethod
    @transaction.atomic  # select_for_update() raises outside a transaction
    def cancel_order(order_id: str, user) -> Order:
        order = Order.objects.select_for_update().get(id=order_id, user=user)
        order.cancel()
        return order
```
### Serializers
```python
from rest_framework import serializers

from .models import Order


class OrderSerializer(serializers.ModelSerializer):
    # OrderItemSerializer is assumed to be defined alongside this class
    items = OrderItemSerializer(many=True, read_only=True)

    class Meta:
        model = Order
        fields = ['id', 'status', 'total', 'items', 'created_at']
        read_only_fields = ['id', 'total', 'created_at']


class CreateOrderSerializer(serializers.Serializer):
    """Input-only serializer, kept separate from output."""
    items = serializers.ListField(
        child=serializers.DictField(), min_length=1, max_length=50,
    )

    def validate_items(self, items):
        for item in items:
            if item.get('quantity', 0) < 1:
                raise serializers.ValidationError("Quantity must be at least 1")
        return items
```
### Views (Thin!)
```python
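from rest_framework import status
from rest_framework.decorators import api_view, permission_classes
from rest_framework.permissions import IsAuthenticated
from rest_framework.response import Response


@api_view(['POST'])
@permission_classes([IsAuthenticated])
def create_order(request):
    serializer = CreateOrderSerializer(data=request.data)
    serializer.is_valid(raise_exception=True)
    order = OrderService.create_order(request.user, serializer.validated_data['items'])
    return Response({'data': OrderSerializer(order).data}, status=status.HTTP_201_CREATED)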
```
### Rules
```
✅ Separate input serializers from output serializers
✅ Views only: validate → call service → serialize → respond
✅ Use @transaction.atomic for multi-model writes
❌ Never put business logic in views or serializers
❌ Never use ModelSerializer for write operations (too implicit)
```
---
## 4. Authentication (HIGH)
| Method | When | Frontend |
|--------|------|----------|
| Session | Same-domain, SSR, Django templates | Django templates / htmx |
| JWT | Different domain, SPA, mobile | React, Vue, mobile apps |
| OAuth2 | Third-party login, API consumers | Any |
### JWT Config (djangorestframework-simplejwt)
```python
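from datetime import timedelta

SIMPLE_JWT = {
    'ACCESS_TOKEN_LIFETIME': timedelta(minutes=15),
    'REFRESH_TOKEN_LIFETIME': timedelta(days=7),
    'ROTATE_REFRESH_TOKENS': True,
    # Requires 'rest_framework_simplejwt.token_blacklist' in INSTALLED_APPS
    'BLACKLIST_AFTER_ROTATION': True,
}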
```
---
## 5. Performance Optimization (HIGH)
### N+1 Query Prevention
```python
# ❌ N+1: 1 query for orders + N queries for users
orders = Order.objects.all()
for o in orders:
    print(o.user.email)  # hits DB each iteration
# ✅ select_related (FK/OneToOne — JOIN)
orders = Order.objects.select_related('user').all()
# ✅ prefetch_related (ManyToMany/reverse FK — 2 queries)
orders = Order.objects.prefetch_related('items').all()
# ✅ Combined
orders = Order.objects.select_related('user').prefetch_related('items').all()
```
### Query Optimization Toolkit
```python
# Only fetch needed columns
User.objects.values('id', 'email')
User.objects.values_list('email', flat=True)
# Annotate instead of Python loops
from django.db.models import Count, Sum
Order.objects.annotate(item_count=Count('items'), revenue=Sum('items__price'))
# Bulk operations
OrderItem.objects.bulk_create([...])
Order.objects.filter(status='pending').update(status='cancelled')
# Database indexes (inside a model's Meta)
from django.db.models import Q

class Meta:
    indexes = [
        models.Index(fields=['user', 'status']),
        models.Index(fields=['-created_at']),
        # A partial index must be named explicitly; this name is illustrative
        models.Index(fields=['email'], condition=Q(is_active=True), name='active_email_idx'),
    ]

# Pagination
from rest_framework.pagination import CursorPagination

class OrderPagination(CursorPagination):
    page_size = 20
    ordering = '-created_at'
```
### Caching
```python
from django.core.cache import cache


def get_product(product_id: str):
    cache_key = f'product:{product_id}'
    product = cache.get(cache_key)
    if product is None:
        product = Product.objects.get(id=product_id)
        cache.set(cache_key, product, timeout=300)
    return product
```
---
## 6. Testing (MEDIUM-HIGH)
### pytest-django + factory_boy
```python
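# conftest.py
import pytest
from pytest_factoryboy import register
from rest_framework.test import APIClient

from apps.users.tests.factories import UserFactory  # assumed module path

# Assumes the pytest-factoryboy plugin: register() provides the `user_factory` fixture
register(UserFactory)


@pytest.fixture
def api_client():
    return APIClient()


@pytest.fixture
def authenticated_client(api_client, user_factory):
    user = user_factory()
    api_client.force_authenticate(user=user)
    return api_client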
```
```python
# factories.py
import factory
from django.contrib.auth import get_user_model

User = get_user_model()


class UserFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = User

    email = factory.Sequence(lambda n: f'user{n}@example.com')
    username = factory.Sequence(lambda n: f'user{n}')


class OrderFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = 'orders.Order'

    user = factory.SubFactory(UserFactory)
    total = factory.Faker('pydecimal', left_digits=3, right_digits=2, positive=True)
```
```python
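# test_views.py
import pytest

from .factories import OrderFactory  # assumed location of the factories module


@pytest.mark.django_db
class TestListOrders:
    def test_returns_user_orders(self, authenticated_client):
        # handler._force_user is the user set by force_authenticate() (private DRF attribute)
        OrderFactory.create_batch(3, user=authenticated_client.handler._force_user)
        response = authenticated_client.get('/api/orders/')
        assert response.status_code == 200
        assert len(response.data['data']) == 3

    def test_requires_authentication(self, api_client):
        response = api_client.get('/api/orders/')
        # 401 assumes an auth class that sends WWW-Authenticate (e.g. JWT);
        # with SessionAuthentication alone, DRF responds with 403
        assert response.status_code == 401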
```
---
## 7. Admin Customization (MEDIUM)
```python
from django.contrib import admin

from .models import Order, OrderItem


class OrderItemInline(admin.TabularInline):
    model = OrderItem
    extra = 0
    readonly_fields = ['price']


@admin.register(Order)
class OrderAdmin(admin.ModelAdmin):
    list_display = ['id', 'user', 'status', 'total', 'created_at']
    list_filter = ['status', 'created_at']
    search_fields = ['user__email', 'id']
    readonly_fields = ['id', 'created_at', 'updated_at']
    inlines = [OrderItemInline]
    date_hierarchy = 'created_at'

    def get_queryset(self, request):
        # Avoid one extra query per row for the `user` column
        return super().get_queryset(request).select_related('user')
```
---
## 8. Production Deployment (MEDIUM)
### Security Settings
```python
# settings/prod.py
DEBUG = False
ALLOWED_HOSTS = ['example.com', 'www.example.com']
CSRF_TRUSTED_ORIGINS = ['https://example.com']
SECURE_SSL_REDIRECT = True
SESSION_COOKIE_SECURE = True
CSRF_COOKIE_SECURE = True
SECURE_HSTS_SECONDS = 31536000
```
### Deployment Stack
```
Nginx → Gunicorn → Django
PostgreSQL + Redis (cache)
Celery (background tasks)
```
```bash
gunicorn config.wsgi:application \
--bind 0.0.0.0:8000 \
--workers 4 \
--timeout 120 \
--access-logfile -
```
### WhiteNoise for Static Files
```python
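MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'whitenoise.middleware.WhiteNoiseMiddleware',  # right after Security
    ...
]
# Django 4.2+ uses STORAGES (STATICFILES_STORAGE was removed in Django 5.1)
STORAGES = {
    'default': {'BACKEND': 'django.core.files.storage.FileSystemStorage'},
    'staticfiles': {'BACKEND': 'whitenoise.storage.CompressedManifestStaticFilesStorage'},
}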
```
### Rules
```
✅ Gunicorn + Nginx (or Cloud Run / Railway)
✅ PostgreSQL (not SQLite)
✅ python manage.py check --deploy
✅ Sentry for error tracking
❌ Never use runserver in production
❌ Never use DEBUG=True in production
❌ Never use SQLite in production
```
---
## Anti-Patterns
| # | ❌ Don't | ✅ Do Instead |
|---|---------|--------------|
| 1 | Business logic in views | Service layer (`services.py`) |
| 2 | One giant app | App-per-domain |
| 3 | Default User model | Custom User before first migrate |
| 4 | No `select_related` | Always eager-load related objects |
| 5 | Django fixtures for tests | `factory_boy` factories |
| 6 | `settings.py` single file | Split: base + dev + prod |
| 7 | `runserver` in production | Gunicorn + Nginx |
| 8 | SQLite in production | PostgreSQL |
| 9 | `ModelSerializer` for writes | Explicit input serializer |
| 10 | Raw SQL in views | ORM querysets + `selectors.py` |
---
## Common Issues
### Issue 1: "Can't change User model after first migration"
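**Fix:** If starting fresh: delete all migrations and the database, define the custom User model, then re-run migrations. If data exists: plan a multi-step migration (for example, point the custom model at the existing `auth_user` table via `db_table`, switch `AUTH_USER_MODEL`, then change fields incrementally) and rehearse it on a copy of production data.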
### Issue 2: "Serializer is too slow on large querysets"
**Fix:** Missing `select_related` / `prefetch_related` → N+1 queries.
```python
queryset = Order.objects.select_related('user').prefetch_related('items')
```
### Issue 3: "Circular import between apps"
**Fix:** Use string references: `models.ForeignKey('orders.Order', ...)` instead of importing the model class. For services, import inside the function.

View File

@@ -0,0 +1,78 @@
# Environment & CORS Management
Patterns for managing environment variables, API URLs, and CORS configuration across frontend and backend stacks.
---
## Standard Environment Pattern
```
# .env.local (gitignored, for local dev)
NEXT_PUBLIC_API_URL=http://localhost:3001
NEXT_PUBLIC_WS_URL=ws://localhost:3001
# Staging (set in Vercel/CI)
NEXT_PUBLIC_API_URL=https://api-staging.example.com
# Production (set in Vercel/CI)
NEXT_PUBLIC_API_URL=https://api.example.com
```
---
## Environment Variable Rules
```
✅ API base URL from environment variable — NEVER hardcoded
✅ Prefix client-side vars with NEXT_PUBLIC_ (Next.js) or VITE_ (Vite)
✅ Backend URL = server-only env var (for SSR calls, not exposed to browser)
✅ CORS on backend: explicit list of allowed origins per environment
❌ Never use localhost URLs in production builds
❌ Never expose backend-only secrets with NEXT_PUBLIC_ prefix
❌ Never commit .env.local (commit .env.example with placeholders)
```
---
## CORS Configuration
```typescript
// Backend: environment-aware CORS
const ALLOWED_ORIGINS: Record<string, string[]> = {
  development: ['http://localhost:3000', 'http://localhost:5173'],
  staging: ['https://staging.example.com'],
  production: ['https://example.com', 'https://www.example.com'],
};
// Many hosts set NODE_ENV=production on staging too; a dedicated variable may be safer.
app.use(cors({
  // Unknown environments get an empty allow-list rather than the permissive default
  origin: ALLOWED_ORIGINS[process.env.NODE_ENV || 'development'] ?? [],
  credentials: true, // needed for cookies (auth)
  methods: ['GET', 'POST', 'PUT', 'PATCH', 'DELETE'],
}));
```
---
## Common Issues
### Issue 1: "CORS error in browser but works in Postman"
**Cause:** CORS is a browser security feature. Postman/curl skip it.
**Fix:**
1. Backend must return `Access-Control-Allow-Origin: https://your-frontend.com`
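2. For cookies/auth: `credentials: true` on the backend and `credentials: 'include'` (fetch) or `withCredentials: true` (XHR/axios) on the frontend; the allowed origin must then be explicit, not `*`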
3. Check that preflight `OPTIONS` request returns correct headers
### Issue 2: "Environment variable undefined in browser"
**Cause:** Missing `NEXT_PUBLIC_` or `VITE_` prefix for client-side access.
**Fix:** Client-side vars MUST have the framework prefix. Rebuild after adding new env vars (they are embedded at build time).
### Issue 3: "Works locally, fails in staging"
**Cause:** Different origins, missing CORS config for staging domain.
**Fix:** Add staging origin to `ALLOWED_ORIGINS`, verify env vars are set in deployment platform.

View File

@@ -0,0 +1,278 @@
# Release & Acceptance Checklist
6-gate release checklist for backend and full-stack applications. Prevents "it works on my machine" and "we forgot to check X" failures.
**Iron Law: NO RELEASE WITHOUT ALL GATES PASSING.**
---
## Release Gates Overview
```
Feature Complete
Gate 1: Functional Acceptance → Does it do what it should?
Gate 2: Non-Functional Acceptance → Is it fast, reliable, observable?
Gate 3: Security Review → Is it safe?
Gate 4: Deployment Readiness → Can we deploy and rollback safely?
Gate 5: Release Execution → Deploy with canary + monitoring
Gate 6: Post-Release Validation → Did it actually work in production?
```
---
## Gate 1: Functional Acceptance
**Question: Does it do what the requirements say?**
- [ ] All acceptance criteria from ticket/PRD have passing tests
- [ ] Happy path works end-to-end
- [ ] Edge cases tested (empty inputs, max lengths, Unicode)
- [ ] Error cases tested (invalid input, not found, timeout)
- [ ] Data integrity verified (CRUD cycle produces correct state)
- [ ] Backward compatibility confirmed (existing clients not broken)
- [ ] API contract matches OpenAPI spec
- [ ] Idempotency verified (retries don't create duplicates)
### Evidence Template
| Requirement | Test | Status | Notes |
|-------------|------|--------|-------|
| User can create order | `orders.api.test:creates order` | ✅ PASS | |
| Empty cart → error | `orders.api.test:rejects empty` | ✅ PASS | |
| Payment failure handled | `payments.test:handles decline` | ✅ PASS | |
---
## Gate 2: Non-Functional Acceptance
**Question: Is it fast, reliable, and observable?**
### Performance
- [ ] Response time within budget (p95 < ___ms) — measured, not assumed
- [ ] No N+1 queries (checked with query logging)
- [ ] New queries use indexes (`EXPLAIN ANALYZE`)
- [ ] Pagination works on large datasets
- [ ] Caching effective (hit rate > 80%)
- [ ] Connection pool healthy under load
### Reliability
- [ ] Graceful degradation when dependencies fail (circuit breaker)
- [ ] Retry logic works for transient failures
- [ ] All external calls have timeouts
- [ ] Rate limiting returns 429 correctly
- [ ] Health check endpoints verified (`/health`, `/ready`)
### Observability
- [ ] Structured logging with request ID (not `console.log`)
- [ ] Metrics exposed (request count, latency, error rate)
- [ ] Alerts configured (error spike, latency spike)
- [ ] Request tracing works end-to-end
- [ ] Dashboard updated for new feature
### Evidence
| Metric | Target | Actual | Status |
|--------|--------|--------|--------|
| p95 response | < 500ms | ___ms | ✅/❌ |
| p99 response | < 1000ms | ___ms | ✅/❌ |
| Error rate (load) | < 0.1% | ___% | ✅/❌ |
| Throughput | > ___ RPS | ___ RPS | ✅/❌ |
---
## Gate 3: Security Review
**Question: Does this introduce vulnerabilities?**
### Input & Output
- [ ] All input validated server-side (never trust client)
- [ ] SQL injection prevented (parameterized queries only)
- [ ] XSS prevented (output encoding)
- [ ] File upload validated (type, size, name sanitized)
- [ ] Rate limiting on sensitive endpoints (login, reset, APIs)
### Auth & Data
- [ ] Protected endpoints require valid credentials
- [ ] Users can only access their own resources
- [ ] Admin routes require admin role
- [ ] Tokens expire (short-lived access + refresh)
- [ ] Passwords hashed (bcrypt/argon2, not MD5/SHA)
- [ ] Sensitive data not logged (passwords, tokens, PII)
- [ ] Secrets in env vars (not hardcoded)
- [ ] Error messages don't leak internals
### Dependencies
- [ ] No known vulnerabilities (`npm audit` / `pip-audit` / `govulncheck`)
- [ ] Dependencies pinned in lockfile
- [ ] Unused dependencies removed
---
## Gate 4: Deployment Readiness
**Question: Can we deploy safely and roll back if needed?**
### Code
- [ ] All tests pass in CI (not "it passed locally")
- [ ] Linter clean, build succeeds
- [ ] Code reviewed and approved
- [ ] No unresolved TODO/FIXME/HACK
### Database
- [ ] Migration tested on staging with production-like data
- [ ] Down migration works (tested!)
- [ ] Migration is non-destructive (additive only)
- [ ] Migration timing estimated on production data size
- [ ] Backfill plan documented (if needed)
### Configuration
- [ ] New env vars documented in `.env.example`
- [ ] Env vars set in staging and verified
- [ ] Env vars set in production
- [ ] Feature flags configured (if applicable)
### Rollback Plan Template
```markdown
## Rollback Plan: [Feature]
### When to rollback
- Error rate > 1% sustained 5 minutes
- p99 latency > 3000ms sustained 10 minutes
- Critical business function broken
### Steps
1. Revert deploy: [command]
2. Rollback migration (if applied): [command]
3. Invalidate cache: [command]
4. Notify team: #incidents channel
5. Verify rollback: [verification steps]
### Estimated time: [X minutes]
### Data recovery: [procedure if data was modified]
```
---
## Gate 5: Release Execution
### Deployment Sequence
```
1. 📢 ANNOUNCE in release channel
2. 🗄️ DATABASE — Apply migration
- Run migration
- Verify completion
- Check data integrity
3. 🚀 DEPLOY — Roll out code
- Canary first (10% traffic)
- Monitor 5 minutes
- If OK → 50% → monitor → 100%
- If NOT OK → STOP immediately
4. 🔍 SMOKE TEST
- Health check → 200
- Login works
- Core operation works
- No error spikes
5. ✅ ANNOUNCE "Release complete. Monitoring 30 min."
```
### Canary Decision Table
| Metric | Baseline | Canary OK | STOP | ROLLBACK |
|--------|----------|-----------|------|----------|
| Error rate | 0.05% | < 0.1% | 0.5% | > 1% |
| p95 latency | 300ms | < 500ms | 700ms | > 1000ms |
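The table can be turned into an automated check. The sketch below copies the thresholds from the table and makes two assumptions that the table itself does not state: values between the "Canary OK" and "STOP" columns are reported as `HOLD` (keep the current traffic share and keep watching), and the overall decision is the most severe result across metrics. Function and key names are hypothetical.

```python
# Thresholds copied from the table; units are percent and milliseconds.
THRESHOLDS = {
    "error_rate_pct": {"ok_below": 0.1, "stop_at": 0.5, "rollback_above": 1.0},
    "p95_latency_ms": {"ok_below": 500, "stop_at": 700, "rollback_above": 1000},
}
SEVERITY = ["OK", "HOLD", "STOP", "ROLLBACK"]  # least to most severe


def classify(metric: str, value: float) -> str:
    t = THRESHOLDS[metric]
    if value > t["rollback_above"]:
        return "ROLLBACK"
    if value >= t["stop_at"]:
        return "STOP"
    if value < t["ok_below"]:
        return "OK"
    return "HOLD"  # assumption: the table leaves this band undefined


def canary_decision(metrics: dict) -> str:
    """Return the most severe classification across all observed metrics."""
    return max((classify(m, v) for m, v in metrics.items()), key=SEVERITY.index)


healthy = canary_decision({"error_rate_pct": 0.05, "p95_latency_ms": 300})
slow = canary_decision({"error_rate_pct": 0.05, "p95_latency_ms": 750})
failing = canary_decision({"error_rate_pct": 1.2, "p95_latency_ms": 300})
borderline = canary_decision({"error_rate_pct": 0.2, "p95_latency_ms": 300})
```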
---
## Gate 6: Post-Release Validation
### Immediate (0-30 min)
- [ ] Health checks green on all instances
- [ ] Error rate within normal range
- [ ] Latency normal (p95, p99)
- [ ] Core user journey manually tested
- [ ] Logs clean — no unexpected errors
- [ ] Alerts silent
### Short-term (1-24 hours)
- [ ] No customer complaints
- [ ] Business metrics stable (conversion, revenue, signups)
- [ ] Memory/CPU stable (no creeping usage)
- [ ] Queue backlogs clear
- [ ] Database performance stable
### Post-Release Report Template
```markdown
## Release Report: [Feature]
- Deployed: [timestamp] by @[engineer]
- Duration: [minutes]
| Check | Status | Notes |
|-------|--------|-------|
| Health checks | ✅ | All healthy |
| Error rate | ✅ | 0.03% (baseline: 0.05%) |
| p95 latency | ✅ | 310ms (baseline: 300ms) |
| Core flow | ✅ | Order creation verified |
Issues found: None / [details]
Rollback used: No / Yes: [reason]
```
---
## Release Readiness Score
Score each gate **0-2**: (0 = not checked, 1 = partially, 2 = fully verified with evidence)
| Gate | Score |
|------|-------|
| 1. Functional Acceptance | /2 |
| 2. Non-Functional Acceptance | /2 |
| 3. Security Review | /2 |
| 4. Deployment Readiness | /2 |
| 5. Release Execution Plan | /2 |
| 6. Post-Release Validation Plan | /2 |
| **Total** | **/12** |
**Decision:**
- **12/12** → Ship it ✅
- **10-11** → Ship with documented exceptions + owner assigned
- **< 10** → Do NOT release. Fix gaps first.
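The scoring rule is simple enough to encode, for example in a release script. This is a minimal sketch with hypothetical names (`GATES`, `release_decision`); it rejects incomplete or out-of-range score sheets instead of guessing.

```python
GATES = (
    "functional", "non_functional", "security",
    "deployment_readiness", "release_execution", "post_release",
)


def release_decision(scores: dict) -> str:
    """Map six gate scores (each 0-2) to the decision described above."""
    if set(scores) != set(GATES):
        raise ValueError("every gate needs exactly one score")
    if any(s not in (0, 1, 2) for s in scores.values()):
        raise ValueError("scores must be 0, 1, or 2")
    total = sum(scores.values())
    if total == 12:
        return "ship"
    if total >= 10:
        return "ship with documented exceptions"
    return "do not release"


all_green = release_decision({g: 2 for g in GATES})
one_gap = release_decision({**{g: 2 for g in GATES}, "security": 1})
several_gaps = release_decision(
    {**{g: 2 for g in GATES}, "security": 0, "post_release": 1}
)
```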
---
## Common Rationalizations
| ❌ Excuse | ✅ Reality |
|----------|-----------|
| "It's a small change" | Small changes cause outages every day |
| "We tested locally" | Local ≠ production |
| "We'll fix it if it breaks" | You'll fix it at 3 AM. Prevent now. |
| "Deadline is today" | Broken code costs more than late code |
| "CI passed" | CI doesn't check everything. Run the checklist. |
| "We can always rollback" | Only if you planned and tested rollback |
| "We did this last time fine" | Survivorship bias. Checklist every time. |

View File

@@ -0,0 +1,254 @@
# Technology Selection Framework
Structured decision framework for backend and full-stack technology choices. Prevents analysis paralysis while ensuring rigorous evaluation.
**Iron Law: NO TECHNOLOGY CHOICE WITHOUT EXPLICIT TRADE-OFF ANALYSIS.**
"I like it" and "it's trending" are not engineering arguments.
---
## Phase 1: Requirements Before Technology
### Non-Functional Requirements (Quantify!)
| Dimension | Question | Bad Answer | Good Answer |
|-----------|----------|-----------|-------------|
| Scale | How many concurrent users? | "Lots" | "1K concurrent, 500 RPS peak" |
| Latency | Acceptable p99 response time? | "Fast" | "< 200ms API, < 2s reports" |
| Availability | Required uptime? | "Always up" | "99.9% (≈ 8.8h downtime/year)" |
| Data volume | Expected storage growth? | "A lot" | "100GB/year, 10M rows" |
| Consistency | Strong vs eventual? | "Consistent" | "Strong for payments, eventual for feeds" |
| Compliance | Regulatory? | "Some" | "GDPR data residency EU, SOC 2 Type II" |
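An availability target is easier to discuss once it is converted into a downtime budget. The helper below is a small sketch (the function name is hypothetical) that assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_pct: float) -> float:
    """Convert an availability percentage into allowed downtime per year."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


three_nines = downtime_hours_per_year(99.9)   # about 8.76 hours
four_nines = downtime_hours_per_year(99.99)   # about 0.876 hours (roughly 53 minutes)
```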
### Team Constraints
- Team size and seniority level
- What the team already knows well
- Can you hire for this stack? (check job market)
- Timeline pressure (days vs months to production)
- Budget for licenses, infrastructure, training
---
## Phase 2: Evaluation Matrix
Score each option 1-5 on weighted criteria:
| Criterion | Weight | Option A | Option B | Option C |
|-----------|--------|----------|----------|----------|
| Meets functional requirements | 5× | _ | _ | _ |
| Meets non-functional requirements | 5× | _ | _ | _ |
| Team expertise / learning curve | 4× | _ | _ | _ |
| Ecosystem maturity (libs, tools) | 3× | _ | _ | _ |
| Community & long-term viability | 3× | _ | _ | _ |
| Operational complexity | 3× | _ | _ | _ |
| Hiring pool availability | 2× | _ | _ | _ |
| Cost (license + infra + training) | 2× | _ | _ | _ |
| **Weighted Total** | | _ | _ | _ |
**Rules:**
- Any option scoring **1 on a 5× criterion** → automatically disqualified
- Options within **10%** of each other → choose what the team knows best
- Options **10-15%** apart → run a **time-boxed PoC** (2-5 days max)
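The matrix arithmetic can be sketched in a few lines. The weights below are taken from the table; the criterion keys, function names, and the choice to measure the gap relative to the higher total are assumptions made for illustration.

```python
# Weights from the evaluation matrix (criterion keys are shorthand)
WEIGHTS = {
    "functional": 5, "non_functional": 5, "team_expertise": 4,
    "ecosystem": 3, "community": 3, "operations": 3,
    "hiring": 2, "cost": 2,
}


def weighted_total(scores: dict) -> int:
    """Sum of score (1-5) times weight over all criteria."""
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)


def is_disqualified(scores: dict) -> bool:
    """An option scoring 1 on any 5x criterion is out."""
    return any(scores[c] == 1 for c, w in WEIGHTS.items() if w == 5)


def relative_gap(total_a: int, total_b: int) -> float:
    """Gap between two totals as a fraction of the higher one (assumed definition)."""
    return abs(total_a - total_b) / max(total_a, total_b)


option_a = {c: 4 for c in WEIGHTS}
option_b = {**option_a, "team_expertise": 2}
option_c = {**option_a, "functional": 1}

total_a = weighted_total(option_a)   # 4 * 27
total_b = weighted_total(option_b)   # total_a minus 2 points * weight 4
gap_ab = relative_gap(total_a, total_b)
```

Here the gap between A and B is under 10%, so the first rule applies and team familiarity decides; option C is removed before totals are compared.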
---
## Phase 3: Decision Trees
### Backend Language / Framework
```
What type of project?
├─ REST/GraphQL API, rapid development
│ ├─ Team knows TypeScript → Node.js
│ │ ├─ Full-featured, enterprise patterns → NestJS
│ │ ├─ Lightweight, flexible → Fastify / Hono / Express
│ │ └─ Full-stack with React → Next.js API routes
│ ├─ Team knows Python
│ │ ├─ High-perf async API → FastAPI
│ │ ├─ Full-stack, admin-heavy → Django
│ │ └─ Lightweight → Flask / Litestar
│ └─ Team knows Java/Kotlin
│ ├─ Enterprise, large team → Spring Boot
│ └─ Lightweight, fast startup → Quarkus / Ktor
├─ High concurrency, systems-level
│ ├─ Microservices, network → Go
│ ├─ Extreme perf, safety → Rust (Axum / Actix)
│ └─ Fault tolerance → Elixir (Phoenix)
├─ Real-time (WebSocket, streaming)
│ ├─ Node.js ecosystem → Socket.io / ws
│ ├─ Scalable pub/sub → Elixir Phoenix
│ └─ Low-latency → Go / Rust
└─ ML / data-intensive
└─ Python (FastAPI + ML libs)
```
### Database
```
What data model?
├─ Structured, relational, ACID
│ ├─ General purpose → PostgreSQL ← DEFAULT CHOICE
│ ├─ Read-heavy, MySQL ecosystem → MySQL / MariaDB
│ └─ Embedded / serverless edge → SQLite / Turso / D1
├─ Semi-structured, flexible schema
│ ├─ Document-oriented → MongoDB
│ ├─ Serverless document → DynamoDB / Firestore
│ └─ Search-heavy → Elasticsearch / OpenSearch
├─ Key-value / cache
│ ├─ In-memory + data structures → Redis / Valkey
│ └─ Planet-scale KV → DynamoDB / Cassandra
├─ Time-series → TimescaleDB / ClickHouse / InfluxDB
├─ Graph → Neo4j / Apache AGE (Postgres extension)
└─ Vector (AI embeddings) → pgvector / Pinecone / Qdrant
```
**Default:** Start with PostgreSQL. It handles 80% of use cases.
### Caching Strategy
| Pattern | Technology | When |
|---------|-----------|------|
| Application cache | Redis / Valkey | Sessions, frequent reads, rate limiting |
| HTTP cache | CDN (Cloudflare/Vercel) | Static assets, public API responses |
| Query cache | Materialized views | Complex aggregations, dashboards |
| In-process cache | LRU (in-memory) | Config, small lookup tables |
| Edge cache | Cloudflare KV / Vercel KV | Global low-latency reads |
### Message Queue / Event Streaming
| Pattern | Technology | When |
|---------|-----------|------|
| Task queue (background jobs) | BullMQ / Celery / SQS | Email, exports, payments |
| Event streaming (replay, audit) | Kafka / Redpanda | Event sourcing, real-time pipelines |
| Lightweight pub/sub | Redis Streams / NATS | Simple notifications, broadcasting |
| Request-reply (sync over async) | NATS / RabbitMQ RPC | Internal service calls |
### Hosting / Deployment
| Model | Technology | When |
|-------|-----------|------|
| Serverless (auto-scale) | Vercel / Cloudflare Workers / Lambda | Variable traffic, pay-per-use |
| Container (predictable) | Cloud Run / Render / Railway / Fly.io | Steady traffic, simple ops |
| Kubernetes (large scale) | EKS / GKE / AKS | 10+ services, team has K8s expertise |
| VPS (full control) | DigitalOcean / Hetzner / EC2 | Predictable workload, cost-sensitive |
---
## Phase 4: Decision Documentation
### ADR (Architecture Decision Record) Template
```markdown
# ADR-{NNN}: {Title}
## Status: Proposed | Accepted | Deprecated | Superseded by ADR-{NNN}
## Context
What problem are we solving? What forces are at play?
## Decision
What did we choose and why?
## Evaluation
| Criterion | Weight | Chosen | Runner-up |
|-----------|--------|--------|-----------|
## Consequences
- Positive: ...
- Negative: ...
- Risks: ...
## Alternatives Rejected
- Option B: rejected because...
- Option C: rejected because...
```
---
## Common Stack Templates
### A: Startup / MVP (Speed)
| Layer | Choice | Why |
|-------|--------|-----|
| Language | TypeScript | One language front + back |
| Framework | Next.js (full-stack) or NestJS (API) | Fast iteration |
| Database | PostgreSQL (Supabase / Neon) | Managed, generous free tier |
| Auth | Better Auth / Clerk | No auth code to maintain |
| Cache | Redis (Upstash) | Serverless-friendly |
| Hosting | Vercel / Railway | Zero-config deploys |
### B: SaaS / Business App (Balance)
| Layer | Choice | Why |
|-------|--------|-----|
| Language | TypeScript or Python | Team preference |
| Framework | NestJS or FastAPI | Structured, testable |
| Database | PostgreSQL | Reliable, feature-rich |
| Queue | BullMQ (Redis) | Simple background jobs |
| Auth | OAuth 2.0 + JWT | Standard, flexible |
| Hosting | AWS ECS / Cloud Run | Scalable containers |
| Monitoring | Datadog / Grafana + Prometheus | Full observability |
### C: High-Performance (Scale)
| Layer | Choice | Why |
|-------|--------|-----|
| Language | Go or Rust | Max throughput, low latency |
| Database | PostgreSQL + Redis + ClickHouse | OLTP + cache + analytics |
| Queue | Kafka / Redpanda | High-throughput streaming |
| Hosting | Kubernetes (EKS/GKE) | Fine-grained scaling |
| Monitoring | Prometheus + Grafana + Jaeger | Metrics + tracing |
### D: AI / ML Application
| Layer | Choice | Why |
|-------|--------|-----|
| Language | Python (API) + TypeScript (frontend) | ML libs + modern UI |
| Framework | FastAPI + Next.js | Async + SSR |
| Database | PostgreSQL + pgvector | Relational + embeddings |
| Queue | Celery + Redis | ML job processing |
| Hosting | Modal / AWS GPU / Replicate | GPU access |
---
## Anti-Patterns
| # | ❌ Don't | ✅ Do Instead |
|---|---------|--------------|
| 1 | "X is trending on HN" | Evaluate against YOUR requirements |
| 2 | Resume-Driven Development | Choose what team can maintain |
| 3 | "Must scale to 1M users" (day 1) | Build for 10× current need, not 1000× |
| 4 | Evaluate for weeks | Time-box to 3-5 days, then decide |
| 5 | No decision documentation | Write ADR for every major choice |
| 6 | Ignore operational cost | Include deploy, monitor, debug cost |
| 7 | "We'll rewrite later" | Assume you won't. Choose carefully. |
| 8 | Microservices by default | Start monolith, extract when needed |
| 9 | Different DB per service (day 1) | One database, split when justified |
| 10 | "It worked at Google" | You're not Google. Scale to YOUR context. |
---
## Common Issues
### Issue 1: "Team can't agree on a framework"
**Fix:** Time-box to 3 days. Fill the evaluation matrix. If scores within 10%, pick what the majority knows. Document in ADR. Move on.
### Issue 2: "We picked X but it doesn't fit"
**Fix:** Sunk cost fallacy check. If < 2 weeks invested, switch now. If > 2 weeks, document pain points and plan phased migration.
### Issue 3: "Do we need microservices?"
**Fix:** Almost certainly no. Start with a well-structured monolith. Extract to services only when: (a) different scaling needs, (b) different team ownership, (c) different deployment cadence.

View File

@@ -0,0 +1,404 @@
# Backend Testing Strategy
Comprehensive testing guide for backend and full-stack applications. Covers the full testing pyramid with deep focus on API integration tests, database testing, contract testing, and performance testing.
## Quick Start Checklist
- [ ] **Test runner configured** (Jest/Vitest, Pytest, Go test)
- [ ] **Test database** ready (Docker container or in-memory)
- [ ] **Database isolation** per test (transaction rollback or truncation)
- [ ] **Test factories** for common entities (user, order, product)
- [ ] **Auth helper** to generate tokens for tests
- [ ] **CI pipeline** runs tests with real database service
- [ ] **Coverage threshold** enforced (≥ 80%)
---
## The Testing Pyramid
```
      ╱╲             E2E (few, slow): full flows across services
    ╱────╲           Integration (moderate): API + DB + external
  ╱────────╲         Unit (many, fast): pure business logic
╱____________╲
```
| Level | What | Speed | Count |
|-------|------|-------|-------|
| Unit | Pure functions, business logic, no I/O | < 10ms | 70%+ of tests |
| Integration | API routes + real database + mocked externals | 50-500ms | ~20% |
| E2E | Full user flow across deployed services | 1-30s | ~10% |
| Contract | API compatibility between services | < 100ms | Per API boundary |
| Performance | Load, stress, soak | Minutes | Per critical path |
---
## 1. API Integration Testing (CRITICAL)
### What to Test for Every Endpoint
| Aspect | Tests to Write |
|--------|---------------|
| Happy path | Correct input → expected response + correct DB state |
| Auth | No token → 401, bad token → 401, expired → 401 |
| Authorization | Wrong role → 403, not owner → 403 |
| Validation | Missing fields → 422, bad types → 422, boundary values |
| Not found | Invalid ID → 404, deleted resource → 404 |
| Conflict | Duplicate create → 409, stale update → 409 |
| Idempotency | Same request twice → same result |
| Side effects | DB state changed, events emitted, cache invalidated |
| Error format | All errors match RFC 9457 envelope |
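
The error-format row is easier to enforce with one shared assertion helper. A minimal sketch, assuming responses are available as a status code, a `Content-Type` value, and a parsed JSON body (the helper name `assert_problem_details` is illustrative):

```python
def assert_problem_details(status_code: int, content_type: str, body: dict) -> None:
    """Raise AssertionError unless the response looks like an RFC 9457 problem."""
    media_type = content_type.split(";")[0].strip().lower()
    assert media_type == "application/problem+json", (
        f"unexpected media type: {media_type}"
    )
    # The standard members are optional, but when present they have fixed types.
    for member in ("type", "title", "detail", "instance"):
        if member in body:
            assert isinstance(body[member], str), f"{member} must be a string"
    if "status" in body:
        assert body["status"] == status_code, (
            "status member must mirror the HTTP status code"
        )
```

Calling it from every negative test (401, 403, 404, 409, 422) catches handlers that fall back to a framework's default error shape. A project may also want to require specific members such as `title`; that would be a local convention layered on top of the RFC.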
### TypeScript (Jest + Supertest)
```typescript
import request from 'supertest';
// Project-specific modules: the paths and helper names below are placeholders.
import { app } from '../../src/app';
import { db } from '../../src/db';
import type { Product } from '../../src/types';
import { resetDatabase, createTestUser, createTestProduct, getAuthToken } from '../helpers';

describe('POST /api/orders', () => {
  let token: string;
  let product: Product;

  beforeAll(async () => {
    await resetDatabase();
    const user = await createTestUser({ role: 'customer' });
    token = await getAuthToken(user);
    product = await createTestProduct({ price: 29.99, stock: 10 });
  });

  it('creates order → 201 + correct DB state', async () => {
    const res = await request(app)
      .post('/api/orders')
      .set('Authorization', `Bearer ${token}`)
      .send({ items: [{ productId: product.id, quantity: 2 }] });

    expect(res.status).toBe(201);
    // Compare money values with a tolerance to avoid floating-point surprises.
    expect(res.body.data.total).toBeCloseTo(59.98, 2);

    // Verify the side effect in the database, not only the response.
    const updated = await db.product.findUnique({ where: { id: product.id } });
    expect(updated!.stock).toBe(8);
  });

  it('rejects without auth → 401', async () => {
    const res = await request(app).post('/api/orders').send({ items: [] });
    expect(res.status).toBe(401);
  });

  it('rejects empty items → 422', async () => {
    const res = await request(app)
      .post('/api/orders')
      .set('Authorization', `Bearer ${token}`)
      .send({ items: [] });

    expect(res.status).toBe(422);
    expect(res.body.errors[0].field).toBe('items');
  });
});
```
### Python (Pytest + FastAPI TestClient)
```python
import pytest
from fastapi.testclient import TestClient

# Project-specific imports: the module paths are placeholders.
from app.main import app
from app.db import get_db


@pytest.fixture
def client(db_session):
    def override_get_db():
        yield db_session

    # Route every request through the test session, then restore the app.
    app.dependency_overrides[get_db] = override_get_db
    yield TestClient(app)
    app.dependency_overrides.clear()


def test_create_order_success(client, auth_headers, test_product):
    response = client.post("/api/orders", json={
        "items": [{"product_id": test_product.id, "quantity": 2}]
    }, headers=auth_headers)
    assert response.status_code == 201
    # test_product is assumed to cost 29.99; compare floats with a tolerance.
    assert response.json()["data"]["total"] == pytest.approx(59.98)


def test_create_order_no_auth(client):
    response = client.post("/api/orders", json={"items": []})
    assert response.status_code == 401


def test_create_order_empty_items(client, auth_headers):
    response = client.post("/api/orders", json={"items": []}, headers=auth_headers)
    assert response.status_code == 422
```
---
## 2. Database Testing (HIGH)
### Test Isolation Strategies
| Strategy | Speed | Realism | When |
|----------|-------|---------|------|
| **Transaction rollback** | ⚡ Fastest | Medium | Default for unit + integration |
| **Truncation** | Fast | High | When rollback isn't possible |
| **Test containers** | Slow startup | Highest | CI pipeline, full integration |
**Transaction rollback (recommended default):**
```typescript
let tx: Transaction;
beforeEach(async () => { tx = await db.beginTransaction(); });
afterEach(async () => { await tx.rollback(); });
```
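
The rollback pattern can be shown end to end with Python's standard `sqlite3` module. This is a sketch under simple assumptions: an in-memory database, a made-up `orders` table, and code under test that shares the connection and never commits on its own.

```python
import sqlite3
from contextlib import contextmanager


def make_test_db() -> sqlite3.Connection:
    """Create an in-memory database with a minimal, hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL NOT NULL)")
    conn.commit()
    return conn


@contextmanager
def rolled_back(conn: sqlite3.Connection):
    """Yield the connection, then undo every write made inside the block."""
    try:
        yield conn
    finally:
        conn.rollback()


def count_orders(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


if __name__ == "__main__":
    conn = make_test_db()
    with rolled_back(conn) as tx:
        tx.execute("INSERT INTO orders (total) VALUES (?)", (59.98,))
        print("inside test:", count_orders(tx))
    print("after test:", count_orders(conn))
```

The write is visible inside the block and gone afterwards, so the next test starts from the same state without a truncate. The approach breaks down when the code under test opens its own connection or commits explicitly. In that case truncation is the fallback.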
**Docker test containers (CI):**
```yaml
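# docker-compose.test.yml
services:
  test-db:
    image: postgres:16-alpine
    tmpfs: /var/lib/postgresql/data  # RAM disk for speed
    environment:
      POSTGRES_DB: myapp_test
      # The official image refuses to start without a password
      # (unless POSTGRES_HOST_AUTH_METHOD=trust is set).
      POSTGRES_PASSWORD: test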
```
### Test Factories (Not Raw SQL)
```typescript
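// factories/user.factory.ts
import { faker } from '@faker-js/faker';

// Build plain data without touching the database (useful for unit tests).
export function buildUser(overrides: Partial<CreateUserDTO> = {}): CreateUserDTO {
  return {
    email: faker.internet.email(),
    firstName: faker.person.firstName(),
    role: 'customer',
    ...overrides,
  };
}

// Persist a user for integration tests; `db` is the project's database client.
export async function createUser(overrides: Partial<CreateUserDTO> = {}) {
  return db.user.create({ data: buildUser(overrides) });
}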
```
```python
# factories/user_factory.py
import factory

from app.models import User  # project model; the import path is a placeholder


class UserFactory(factory.Factory):
    class Meta:
        model = User

    # factory.Faker generates a fresh value for each built instance.
    email = factory.Faker("email")
    first_name = factory.Faker("first_name")
    role = "customer"
```
---
## 3. External Service Testing (HIGH)
### HTTP-Level Mocking (Not Function Mocking)
**TypeScript (nock):**
```typescript
import nock from 'nock';

// Remove interceptors after each test so mocks cannot leak between tests.
afterEach(() => nock.cleanAll());

it('processes payment successfully', async () => {
  nock('https://api.stripe.com')
    .post('/v1/charges')
    .reply(200, { id: 'ch_123', status: 'succeeded', amount: 5000 });

  const result = await paymentService.charge({ amount: 50.00, currency: 'usd' });
  expect(result.status).toBe('succeeded');
});

it('handles payment timeout', async () => {
  // Delay the reply past the client's own timeout, which is assumed to be
  // shorter than both the 10 s delay and Jest's per-test timeout.
  nock('https://api.stripe.com').post('/v1/charges').delay(10000).reply(200);

  await expect(paymentService.charge({ amount: 50, currency: 'usd' }))
    .rejects.toThrow('timeout');
});
```
**Python (responses):**
```python
import responses


@responses.activate
def test_payment_success():
    responses.post(
        "https://api.stripe.com/v1/charges",
        json={"id": "ch_123", "status": "succeeded"},
        status=200,
    )
    result = payment_service.charge(amount=50.00, currency="usd")
    assert result.status == "succeeded"
```
### Test Containers for Infrastructure
```typescript
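import { PostgreSqlContainer, StartedPostgreSqlContainer } from '@testcontainers/postgresql';
import { RedisContainer } from '@testcontainers/redis'; // same start/stop pattern for Redis

let pg: StartedPostgreSqlContainer;

beforeAll(async () => {
  pg = await new PostgreSqlContainer('postgres:16').start();
  process.env.DATABASE_URL = pg.getConnectionUri();
  await runMigrations(); // project-specific migration helper
}, 60000); // container startup can be slow on a cold CI runner

afterAll(async () => {
  await pg.stop(); // release the container once the suite finishes
});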
```
---
## 4. Contract Testing (MEDIUM-HIGH)
### Consumer-Driven Contracts (Pact)
**Consumer (OrderService calls UserService):**
```typescript
// `pact` is assumed to be a PactV4 instance configured elsewhere in the suite.
it('can fetch user by ID', async () => {
  await pact.addInteraction()
    .given('user usr_123 exists')
    .uponReceiving('GET /users/usr_123')
    .withRequest('GET', '/api/users/usr_123')
    .willRespondWith(200, (b) => {
      b.jsonBody({
        data: {
          id: MatchersV3.string('usr_123'),
          // Match any email-shaped string; the second argument is the example value.
          email: MatchersV3.regex(/.+@.+\..+/, 'user@example.com'),
        },
      });
    })
    .executeTest(async (mockserver) => {
      const user = await new UserClient(mockserver.url).getUser('usr_123');
      expect(user.id).toBeDefined();
    });
});
```
**Provider verifies in CI:**
```typescript
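import { Verifier } from '@pact-foundation/pact';

// Replays every consumer contract from the broker against the running provider.
await new Verifier({
  providerBaseUrl: 'http://localhost:3001',
  pactBrokerUrl: process.env.PACT_BROKER_URL,
  provider: 'UserService',
}).verifyProvider();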
```
---
## 5. Performance Testing (MEDIUM)
### k6 Load Test
```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 20 },  // warm up: ramp from 0 to 20 VUs
    { duration: '1m', target: 100 },  // ramp from 20 to 100 VUs
    { duration: '30s', target: 0 },   // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'],
    http_req_failed: ['rate<0.001'],  // 0.1% error budget
  },
};

export default function () {
  const res = http.get(`${__ENV.BASE_URL}/api/orders`);
  check(res, { 'status 200': (r) => r.status === 200 });
  sleep(1);
}
```
### Performance Budgets
| Metric | Target | Action if Exceeded |
|--------|--------|--------------------|
| p95 response time | < 500ms | Optimize queries/caching |
| p99 response time | < 1000ms | Check outlier queries |
| Error rate | < 0.1% | Investigate spikes |
| DB query time | < 100ms each | Add indexes |
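
To check raw latency samples against these budgets outside a load-testing tool, a nearest-rank percentile is enough. A minimal sketch, assuming samples are in milliseconds (the names `BUDGETS_MS`, `percentile`, and `budget_violations` are illustrative):

```python
import math

# Latency budgets from the table above, in milliseconds.
BUDGETS_MS = {95: 500, 99: 1000}


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample covering pct% of the data."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def budget_violations(samples: list[float]) -> dict[int, float]:
    """Return {percentile: observed_ms} for every budget that is not met."""
    observed = {p: percentile(samples, p) for p in BUDGETS_MS}
    return {p: value for p, value in observed.items() if value >= BUDGETS_MS[p]}
```

An empty result means both budgets hold. Load-testing tools may interpolate between samples, so their percentiles can differ slightly from nearest-rank values on small sample sets.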
### When to Run
| Trigger | Test Type |
|---------|-----------|
| Before major release | Full load test |
| New DB query/index | Query benchmark |
| Infrastructure change | Baseline comparison |
| Weekly (CI) | Smoke load test |
---
## Test File Organization
```
tests/
unit/ # Pure logic, mocked dependencies
order.service.test.ts
integration/ # API + real DB
orders.api.test.ts
auth.api.test.ts
contracts/ # Consumer-driven contracts
user-service.consumer.pact.ts
performance/ # Load tests
load-test.js
fixtures/
factories/ # Test data factories
user.factory.ts
seeds/
test-data.ts
helpers/
setup.ts # Global test config
auth.helper.ts # Token generation
db.helper.ts # DB cleanup
```
---
## Anti-Patterns
| # | ❌ Don't | ✅ Do Instead |
|---|---------|--------------|
| 1 | Test only happy paths | Test errors, auth, validation, edge cases |
| 2 | Mock everything (no real DB) | Use test containers or test DB |
| 3 | Tests depend on execution order | Each test sets up / tears down own state |
| 4 | Hardcode test data | Use factories (faker + overrides) |
| 5 | Test implementation details | Test behavior: input → output |
| 6 | Share mutable state | Isolate per test (transaction rollback) |
| 7 | Skip migration testing in CI | Run migrations from scratch in CI |
| 8 | No performance test before release | Load test every major release |
| 9 | Test against production data | Generated test data only |
| 10 | Test suite > 10 minutes | Parallelize, RAM disk, optimize setup |
---
## Common Issues
### Issue 1: "Tests pass alone but fail together"
**Cause:** Shared database state between tests. Missing cleanup.
**Fix:**
```typescript
beforeEach(async () => { await db.raw('TRUNCATE orders, users CASCADE'); });
// OR use transaction rollback per test
```
### Issue 2: "Jest did not exit one second after the test run has completed"
**Cause:** Unclosed database connections or HTTP servers.
**Fix:**
```typescript
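afterAll(async () => {
  await db.destroy();
  // Assuming `server` is a Node http.Server: close() takes a callback,
  // so wrap it in a promise to actually wait for shutdown.
  await new Promise((resolve) => server.close(resolve));
});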
```
### Issue 3: "Async callback was not invoked within timeout"
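**Cause:** A `done` callback that is never called, or a promise that never settles. A related mistake is a promise that is not awaited at all, so the test ends before the request completes.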
**Fix:**
```typescript
// ❌ Promise not awaited
it('should work', () => { request(app).get('/users'); });
// ✅ Properly awaited
it('should work', async () => { await request(app).get('/users'); });
```
### Issue 4: "Integration tests too slow in CI"
**Fix:**
1. Use `tmpfs` for PostgreSQL data dir (RAM disk)
2. Run migrations once in `beforeAll`, truncate in `beforeEach`
3. Parallelize test suites with `--maxWorkers`
4. Skip performance tests on feature branches (only main)