Initial commit: add all skills files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
278
fullstack-dev/references/release-checklist.md
Normal file
# Release & Acceptance Checklist

A six-gate release checklist for backend and full-stack applications. It guards against "it works on my machine" and "we forgot to check X" failures.

**Iron Law: NO RELEASE WITHOUT ALL GATES PASSING.**

---

## Release Gates Overview

```
Feature Complete
       ↓
Gate 1: Functional Acceptance     → Does it do what it should?
       ↓
Gate 2: Non-Functional Acceptance → Is it fast, reliable, observable?
       ↓
Gate 3: Security Review           → Is it safe?
       ↓
Gate 4: Deployment Readiness      → Can we deploy and roll back safely?
       ↓
Gate 5: Release Execution         → Deploy with canary + monitoring
       ↓
Gate 6: Post-Release Validation   → Did it actually work in production?
```

---

## Gate 1: Functional Acceptance

**Question: Does it do what the requirements say?**

- [ ] All acceptance criteria from ticket/PRD have passing tests
- [ ] Happy path works end-to-end
- [ ] Edge cases tested (empty inputs, max lengths, Unicode)
- [ ] Error cases tested (invalid input, not found, timeout)
- [ ] Data integrity verified (CRUD cycle produces correct state)
- [ ] Backward compatibility confirmed (existing clients not broken)
- [ ] API contract matches OpenAPI spec
- [ ] Idempotency verified (retries don't create duplicates)

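
The idempotency item is easy to claim and easy to get wrong, so it helps to have a test that replays the same request. Below is a minimal sketch, assuming a hypothetical in-memory `OrderService` that deduplicates on a client-supplied idempotency key; the class, its methods, and the key format are illustrative and not taken from any real API.

```python
import uuid


class OrderService:
    """Hypothetical in-memory service, used only to illustrate the check."""

    def __init__(self) -> None:
        self._orders: dict[str, dict] = {}  # order id -> order
        self._by_key: dict[str, str] = {}   # idempotency key -> order id

    def create_order(self, idempotency_key: str, items: list[str]) -> dict:
        if not items:
            raise ValueError("cart is empty")
        # A retried request with the same key returns the original order.
        if idempotency_key in self._by_key:
            return self._orders[self._by_key[idempotency_key]]
        order_id = str(uuid.uuid4())
        order = {"id": order_id, "items": list(items)}
        self._orders[order_id] = order
        self._by_key[idempotency_key] = order_id
        return order

    def count(self) -> int:
        return len(self._orders)


def test_retry_does_not_duplicate() -> None:
    service = OrderService()
    first = service.create_order("key-1", ["widget"])
    retry = service.create_order("key-1", ["widget"])
    assert first["id"] == retry["id"]
    assert service.count() == 1
```

In a real service the key-to-order mapping would likely live in the database behind a unique constraint, so that concurrent retries are also covered.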

### Evidence Template

| Requirement | Test | Status | Notes |
|-------------|------|--------|-------|
| User can create order | `orders.api.test:creates order` | ✅ PASS | |
| Empty cart → error | `orders.api.test:rejects empty` | ✅ PASS | |
| Payment failure handled | `payments.test:handles decline` | ✅ PASS | |

---

## Gate 2: Non-Functional Acceptance

**Question: Is it fast, reliable, and observable?**

### Performance

- [ ] Response time within budget (p95 < ___ms), measured rather than assumed
- [ ] No N+1 queries (checked with query logging)
- [ ] New queries use indexes (`EXPLAIN ANALYZE`)
- [ ] Pagination works on large datasets
- [ ] Caching effective (hit rate > 80%)
- [ ] Connection pool healthy under load

### Reliability

- [ ] Graceful degradation when dependencies fail (circuit breaker)
- [ ] Retry logic works for transient failures
- [ ] All external calls have timeouts
- [ ] Rate limiting returns 429 correctly
- [ ] Health check endpoints verified (`/health`, `/ready`)

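
The retry item above is easiest to verify when the retry policy is a small, testable function. The following is a minimal sketch of retry with exponential backoff, assuming a hypothetical `TransientError` that marks failures safe to retry and an `operation` callable that enforces its own timeout; the names and default values are illustrative.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def call_with_retry(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the caller
            # Delay doubles on each retry: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter lets a test record the delays instead of waiting for them. Non-transient errors are deliberately not caught, so they fail fast.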

### Observability

- [ ] Structured logging with request ID (not `console.log`)
- [ ] Metrics exposed (request count, latency, error rate)
- [ ] Alerts configured (error spike, latency spike)
- [ ] Request tracing works end-to-end
- [ ] Dashboard updated for new feature

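
As an illustration of the structured-logging item, here is a minimal sketch using only Python's standard `logging` and `json` modules. The field names (`level`, `message`, `request_id`) and the logger name `app` are assumptions for the example; a real service would normally take the request ID from its framework's request context.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # request_id is attached through the `extra` argument at the call site.
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)


def build_logger(name: str = "app") -> logging.Logger:
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


logger = build_logger()
logger.info("order created", extra={"request_id": "req-123"})
```

Because every line is a JSON object carrying the same `request_id`, all log lines for one request can be found with a single query.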

### Evidence

| Metric | Target | Actual | Status |
|--------|--------|--------|--------|
| p95 response | < 500ms | ___ms | ✅/❌ |
| p99 response | < 1000ms | ___ms | ✅/❌ |
| Error rate (load) | < 0.1% | ___% | ✅/❌ |
| Throughput | > ___ RPS | ___ RPS | ✅/❌ |

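
The "Actual" column for p95 and p99 can be filled from raw latency samples collected during a load test. The sketch below assumes the nearest-rank percentile definition, which is one of several common definitions, and reuses the 500 ms and 1000 ms targets from the table; the function names are illustrative. Monitoring tools may interpolate differently, so small differences from dashboard values are expected.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_latency(
    samples_ms: list[float],
    p95_budget_ms: float = 500.0,
    p99_budget_ms: float = 1000.0,
) -> dict:
    """Compare measured p95/p99 against the budgets from the evidence table."""
    p95 = percentile(samples_ms, 95)
    p99 = percentile(samples_ms, 99)
    return {
        "p95_ms": p95,
        "p99_ms": p99,
        "pass": p95 < p95_budget_ms and p99 < p99_budget_ms,
    }
```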

---

## Gate 3: Security Review

**Question: Does this introduce vulnerabilities?**

### Input & Output

- [ ] All input validated server-side (never trust client)
- [ ] SQL injection prevented (parameterized queries only)
- [ ] XSS prevented (output encoding)
- [ ] File upload validated (type, size, name sanitized)
- [ ] Rate limiting on sensitive endpoints (login, reset, APIs)

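
To make "parameterized queries only" concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `users` table and the `find_user_by_email` helper are assumptions made up for the example; other database drivers follow the same idea but may use a different placeholder syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("alice@example.com",))


def find_user_by_email(connection: sqlite3.Connection, email: str):
    # The ? placeholder passes the value separately from the SQL text,
    # so attacker-controlled input is never parsed as SQL.
    cursor = connection.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    )
    return cursor.fetchone()
```

A classic injection string such as `' OR '1'='1` is treated as a literal email address here and simply matches no row.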

### Auth & Data

- [ ] Protected endpoints require valid credentials
- [ ] Users can only access their own resources
- [ ] Admin routes require admin role
- [ ] Tokens expire (short-lived access + refresh)
- [ ] Passwords hashed with a password-hashing function (bcrypt/argon2, not plain MD5/SHA)
- [ ] Sensitive data not logged (passwords, tokens, PII)
- [ ] Secrets in env vars (not hardcoded)
- [ ] Error messages don't leak internals

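
One way to support the "sensitive data not logged" item is to redact known-sensitive keys before a payload reaches the logger. This is a minimal sketch; the key list in `SENSITIVE_KEYS` is an assumption and would need to be extended for a real application's PII fields.

```python
# Assumed list of sensitive keys; extend it to match the application's data.
SENSITIVE_KEYS = {"password", "token", "authorization", "ssn"}


def redact(value):
    """Return a copy of value with sensitive dict entries masked, recursing into dicts and lists."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```

Key-based redaction only catches fields with predictable names. Secrets embedded in free-text messages still need to be kept out of logs at the call site.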

### Dependencies

- [ ] No known vulnerabilities (`npm audit` / `pip-audit` / `govulncheck`)
- [ ] Dependencies pinned in lockfile
- [ ] Unused dependencies removed

---

## Gate 4: Deployment Readiness

**Question: Can we deploy safely and roll back if needed?**

### Code

- [ ] All tests pass in CI (not "it passed locally")
- [ ] Linter clean, build succeeds
- [ ] Code reviewed and approved
- [ ] No unresolved TODO/FIXME/HACK

### Database

- [ ] Migration tested on staging with production-like data
- [ ] Down migration works (tested!)
- [ ] Migration is non-destructive (additive only)
- [ ] Migration timing estimated on production data size
- [ ] Backfill plan documented (if needed)

### Configuration

- [ ] New env vars documented in `.env.example`
- [ ] Env vars set in staging and verified
- [ ] Env vars set in production
- [ ] Feature flags configured (if applicable)

### Rollback Plan Template

```markdown
## Rollback Plan: [Feature]

### When to roll back
- Error rate > 1% sustained 5 minutes
- p99 latency > 3000ms sustained 10 minutes
- Critical business function broken

### Steps
1. Revert deploy: [command]
2. Roll back migration (if applied): [command]
3. Invalidate cache: [command]
4. Notify team: #incidents channel
5. Verify rollback: [verification steps]

### Estimated time: [X minutes]
### Data recovery: [procedure if data was modified]
```
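
The "sustained" triggers in the template can be checked mechanically rather than by eye. The sketch below assumes metrics arrive as time-ordered `(timestamp_seconds, value)` pairs and treats a breach as sustained when an unbroken run of samples above the threshold spans at least the window; the function name and input shape are illustrative. For the first trigger in the template, the call would be `sustained_breach(error_rate_samples, threshold=1.0, window_seconds=300)`.

```python
def sustained_breach(
    samples: list[tuple[float, float]],
    threshold: float,
    window_seconds: float,
) -> bool:
    """Return True if any unbroken run of samples above threshold spans window_seconds or more.

    samples must be (timestamp_seconds, value) pairs sorted by timestamp.
    """
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp  # first sample of a new breach run
            if timestamp - breach_start >= window_seconds:
                return True
        else:
            breach_start = None  # a healthy sample resets the run
    return False
```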

---

## Gate 5: Release Execution

### Deployment Sequence

```
1. 📢 ANNOUNCE in release channel

2. 🗄️ DATABASE: Apply migration
   - Run migration
   - Verify completion
   - Check data integrity

3. 🚀 DEPLOY: Roll out code
   - Canary first (10% traffic)
   - Monitor 5 minutes
   - If OK → 50% → monitor → 100%
   - If NOT OK → STOP immediately

4. 🔍 SMOKE TEST
   - Health check → 200
   - Login works
   - Core operation works
   - No error spikes

5. ✅ ANNOUNCE "Release complete. Monitoring 30 min."
```

### Canary Decision Table

| Metric | Baseline | Canary OK | STOP | ROLLBACK |
|--------|----------|-----------|------|----------|
| Error rate | 0.05% | < 0.1% | 0.5% | > 1% |
| p95 latency | 300ms | < 500ms | 700ms | > 1000ms |

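
The decision table can be encoded so that the canary verdict is computed the same way on every release. This sketch makes two assumptions that the table does not state: the STOP column is read as "at or above" the listed value, and readings between the "Canary OK" and STOP values map to a `hold` outcome (pause the rollout and investigate). The metric keys and function names are illustrative, and the overall verdict is the most severe verdict across the metrics supplied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    ok_below: float        # "Canary OK" column
    stop_at: float         # "STOP" column, read as "at or above" (assumption)
    rollback_above: float  # "ROLLBACK" column


THRESHOLDS = {
    "error_rate_pct": Thresholds(0.1, 0.5, 1.0),
    "p95_latency_ms": Thresholds(500, 700, 1000),
}

# Ordered from least to most severe; "hold" is an assumed intermediate state.
SEVERITY = ["continue", "hold", "stop", "rollback"]


def classify(metric: str, value: float) -> str:
    limits = THRESHOLDS[metric]
    if value > limits.rollback_above:
        return "rollback"
    if value >= limits.stop_at:
        return "stop"
    if value < limits.ok_below:
        return "continue"
    return "hold"


def canary_decision(metrics: dict[str, float]) -> str:
    """Return the most severe verdict across all supplied metrics."""
    verdicts = [classify(name, value) for name, value in metrics.items()]
    return max(verdicts, key=SEVERITY.index)
```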

---

## Gate 6: Post-Release Validation

### Immediate (0-30 min)

- [ ] Health checks green on all instances
- [ ] Error rate within normal range
- [ ] Latency normal (p95, p99)
- [ ] Core user journey manually tested
- [ ] Logs clean (no unexpected errors)
- [ ] Alerts silent

### Short-term (1-24 hours)

- [ ] No customer complaints
- [ ] Business metrics stable (conversion, revenue, signups)
- [ ] Memory/CPU stable (no creeping usage)
- [ ] Queue backlogs clear
- [ ] Database performance stable

### Post-Release Report Template

```markdown
## Release Report: [Feature]
- Deployed: [timestamp] by @[engineer]
- Duration: [minutes]

| Check | Status | Notes |
|-------|--------|-------|
| Health checks | ✅ | All healthy |
| Error rate | ✅ | 0.03% (baseline: 0.05%) |
| p95 latency | ✅ | 310ms (baseline: 300ms) |
| Core flow | ✅ | Order creation verified |

Issues found: None / [details]
Rollback used: No / Yes: [reason]
```

---

## Release Readiness Score

Score each gate **0-2** (0 = not checked, 1 = partially verified, 2 = fully verified with evidence):

| Gate | Score |
|------|-------|
| 1. Functional Acceptance | /2 |
| 2. Non-Functional Acceptance | /2 |
| 3. Security Review | /2 |
| 4. Deployment Readiness | /2 |
| 5. Release Execution Plan | /2 |
| 6. Post-Release Validation Plan | /2 |
| **Total** | **/12** |

**Decision:**

- **12/12** → Ship it ✅
- **10-11** → Ship with documented exceptions + owner assigned
- **< 10** → Do NOT release. Fix gaps first.

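
The scoring rule above is simple arithmetic, which makes it easy to automate in a release script. A minimal sketch follows; the gate keys and the returned strings are illustrative names, while the 0-2 scale and the 12 / 10-11 / below-10 cut-offs come from the table and decision list.

```python
GATES = [
    "functional_acceptance",
    "non_functional_acceptance",
    "security_review",
    "deployment_readiness",
    "release_execution_plan",
    "post_release_validation_plan",
]


def release_decision(scores: dict[str, int]) -> tuple[int, str]:
    """Sum the six gate scores (each 0, 1, or 2) and map the total to a decision."""
    missing = [gate for gate in GATES if gate not in scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    for gate in GATES:
        if scores[gate] not in (0, 1, 2):
            raise ValueError(f"{gate} must be scored 0, 1, or 2")
    total = sum(scores[gate] for gate in GATES)
    if total == 12:
        return total, "ship"
    if total >= 10:
        return total, "ship with documented exceptions and an assigned owner"
    return total, "do not release"
```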

---

## Common Rationalizations

| ❌ Excuse | ✅ Reality |
|----------|-----------|
| "It's a small change" | Small changes cause outages every day |
| "We tested locally" | Local ≠ production |
| "We'll fix it if it breaks" | You'll fix it at 3 AM. Prevent now. |
| "Deadline is today" | Broken code costs more than late code |
| "CI passed" | CI doesn't check everything. Run the checklist. |
| "We can always roll back" | Only if you planned and tested rollback |
| "We did this last time and it was fine" | Survivorship bias. Checklist every time. |