Technology Selection Framework
Structured decision framework for backend and full-stack technology choices. Prevents analysis paralysis while ensuring rigorous evaluation.
Iron Law: NO TECHNOLOGY CHOICE WITHOUT EXPLICIT TRADE-OFF ANALYSIS.
"I like it" and "it's trending" are not engineering arguments.
Phase 1: Requirements Before Technology
Non-Functional Requirements (Quantify!)
| Dimension | Question | Bad Answer | Good Answer |
|---|---|---|---|
| Scale | How many concurrent users? | "Lots" | "1K concurrent, 500 RPS peak" |
| Latency | Acceptable p99 response time? | "Fast" | "< 200ms API, < 2s reports" |
| Availability | Required uptime? | "Always up" | "99.9% (8.7h downtime/year)" |
| Data volume | Expected storage growth? | "A lot" | "100GB/year, 10M rows" |
| Consistency | Strong vs eventual? | "Consistent" | "Strong for payments, eventual for feeds" |
| Compliance | Regulatory? | "Some" | "GDPR data residency EU, SOC 2 Type II" |
Team Constraints
- Team size and seniority level
- What the team already knows well
- Can you hire for this stack? (check job market)
- Timeline pressure (days vs months to production)
- Budget for licenses, infrastructure, training
Phase 2: Evaluation Matrix
Score each option 1-5 on weighted criteria:
| Criterion | Weight | Option A | Option B | Option C |
|---|---|---|---|---|
| Meets functional requirements | 5× | _ | _ | _ |
| Meets non-functional requirements | 5× | _ | _ | _ |
| Team expertise / learning curve | 4× | _ | _ | _ |
| Ecosystem maturity (libs, tools) | 3× | _ | _ | _ |
| Community & long-term viability | 3× | _ | _ | _ |
| Operational complexity | 3× | _ | _ | _ |
| Hiring pool availability | 2× | _ | _ | _ |
| Cost (license + infra + training) | 2× | _ | _ | _ |
| Weighted Total | | _ | _ | _ |
Rules:
- Any option scoring 1 on a 5× criterion → automatically disqualified
- Options within 10% of each other → choose what the team knows best
- Options 10-15% apart → run a time-boxed PoC (2-5 days max)
Phase 3: Decision Trees
Backend Language / Framework
Database
Default: Start with PostgreSQL. It handles 80% of use cases.
Caching Strategy
| Pattern | Technology | When |
|---|---|---|
| Application cache | Redis / Valkey | Sessions, frequent reads, rate limiting |
| HTTP cache | CDN (Cloudflare/Vercel) | Static assets, public API responses |
| Query cache | Materialized views | Complex aggregations, dashboards |
| In-process cache | LRU (in-memory) | Config, small lookup tables |
| Edge cache | Cloudflare KV / Vercel KV | Global low-latency reads |
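Most of the rows above follow the same cache-aside pattern: check the cache, fall back to the source on a miss, and populate the cache with the result. A minimal in-process sketch with a TTL (the `fetch_config` source function is a hypothetical stand-in for a slow lookup):

```python
import time

class TTLCache:
    """Tiny in-process cache with a per-entry time-to-live."""
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store: dict = {}

    def get_or_load(self, key, loader):
        entry = self._store.get(key)
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]                     # cache hit
        value = loader(key)                     # cache miss: hit the source
        self._store[key] = (time.monotonic(), value)
        return value

calls = 0
def fetch_config(key):                          # hypothetical slow source
    global calls
    calls += 1
    return {"feature_flag": True}

cache = TTLCache(ttl_seconds=60)
cache.get_or_load("config", fetch_config)
cache.get_or_load("config", fetch_config)       # served from cache
print(calls)  # 1
```

Redis, a CDN, and edge KV stores apply the same read-through logic at different distances from the user; the trade-off is latency versus staleness (the TTL).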
Message Queue / Event Streaming
| Pattern | Technology | When |
|---|---|---|
| Task queue (background jobs) | BullMQ / Celery / SQS | Email, exports, payments |
| Event streaming (replay, audit) | Kafka / Redpanda | Event sourcing, real-time pipelines |
| Lightweight pub/sub | Redis Streams / NATS | Simple notifications, broadcasting |
| Request-reply (sync over async) | NATS / RabbitMQ RPC | Internal service calls |
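The task-queue row is the one most teams need first: a producer enqueues jobs, a worker drains them off the request path. A minimal in-process sketch of the pattern (real systems like BullMQ, Celery, or SQS add what this lacks: persistence, retries, and workers on separate machines):

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()
results = []

def worker():
    """Drain jobs until a None sentinel arrives."""
    while (job := jobs.get()) is not None:
        results.append(f"sent email to {job['to']}")  # stand-in for real work
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()
jobs.put({"to": "user@example.com"})   # producer returns immediately
jobs.put({"to": "admin@example.com"})
jobs.put(None)                          # sentinel: stop the worker
t.join()
print(results)  # both jobs processed in FIFO order
```

The decision between this row and event streaming is whether consumers need replay: a task queue deletes a job once processed; Kafka retains the log so new consumers can reprocess history.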
Hosting / Deployment
| Model | Technology | When |
|---|---|---|
| Serverless (auto-scale) | Vercel / Cloudflare Workers / Lambda | Variable traffic, pay-per-use |
| Container (predictable) | Cloud Run / Render / Railway / Fly.io | Steady traffic, simple ops |
| Kubernetes (large scale) | EKS / GKE / AKS | 10+ services, team has K8s expertise |
| VPS (full control) | DigitalOcean / Hetzner / EC2 | Predictable workload, cost-sensitive |
Phase 4: Decision Documentation
ADR (Architecture Decision Record) Template
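A common lightweight ADR layout is sketched below; the exact section names are an assumption, not prescribed by this framework, so adapt them to your team:

```markdown
# ADR-001: <Short decision title>

## Status
Proposed | Accepted | Deprecated | Superseded by ADR-NNN

## Context
What problem are we solving? Which functional/non-functional
requirements and team constraints apply?

## Options Considered
Summarize the evaluation matrix: options, weighted scores,
any automatic disqualifications.

## Decision
What we chose and the deciding trade-offs.

## Consequences
What becomes easier, what becomes harder, and what would
trigger revisiting this decision.
```

Keep ADRs short (one page), numbered, and in the repository next to the code they govern.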
Common Stack Templates
A: Startup / MVP (Speed)
| Layer | Choice | Why |
|---|---|---|
| Language | TypeScript | One language front + back |
| Framework | Next.js (full-stack) or NestJS (API) | Fast iteration |
| Database | PostgreSQL (Supabase / Neon) | Managed, generous free tier |
| Auth | Better Auth / Clerk | No auth code to maintain |
| Cache | Redis (Upstash) | Serverless-friendly |
| Hosting | Vercel / Railway | Zero-config deploys |
B: SaaS / Business App (Balance)
| Layer |
Choice |
Why |
| Language |
TypeScript or Python |
Team preference |
| Framework |
NestJS or FastAPI |
Structured, testable |
| Database |
PostgreSQL |
Reliable, feature-rich |
| Queue |
BullMQ (Redis) |
Simple background jobs |
| Auth |
OAuth 2.0 + JWT |
Standard, flexible |
| Hosting |
AWS ECS / Cloud Run |
Scalable containers |
| Monitoring |
Datadog / Grafana + Prometheus |
Full observability |
C: High-Performance (Scale)
| Layer | Choice | Why |
|---|---|---|
| Language | Go or Rust | Max throughput, low latency |
| Database | PostgreSQL + Redis + ClickHouse | OLTP + cache + analytics |
| Queue | Kafka / Redpanda | High-throughput streaming |
| Hosting | Kubernetes (EKS/GKE) | Fine-grained scaling |
| Monitoring | Prometheus + Grafana + Jaeger | Metrics + tracing |
D: AI / ML Application
| Layer | Choice | Why |
|---|---|---|
| Language | Python (API) + TypeScript (frontend) | ML libs + modern UI |
| Framework | FastAPI + Next.js | Async + SSR |
| Database | PostgreSQL + pgvector | Relational + embeddings |
| Queue | Celery + Redis | ML job processing |
| Hosting | Modal / AWS GPU / Replicate | GPU access |
Anti-Patterns
| # | ❌ Don't | ✅ Do Instead |
|---|---|---|
| 1 | "X is trending on HN" | Evaluate against YOUR requirements |
| 2 | Resume-Driven Development | Choose what team can maintain |
| 3 | "Must scale to 1M users" (day 1) | Build for 10× current need, not 1000× |
| 4 | Evaluate for weeks | Time-box to 3-5 days, then decide |
| 5 | No decision documentation | Write ADR for every major choice |
| 6 | Ignore operational cost | Include deploy, monitor, debug cost |
| 7 | "We'll rewrite later" | Assume you won't. Choose carefully. |
| 8 | Microservices by default | Start monolith, extract when needed |
| 9 | Different DB per service (day 1) | One database, split when justified |
| 10 | "It worked at Google" | You're not Google. Scale to YOUR context. |
Common Issues
Issue 1: "Team can't agree on a framework"
Fix: Time-box to 3 days. Fill in the evaluation matrix. If scores are within 10%, pick what the majority knows. Document the choice in an ADR. Move on.
Issue 2: "We picked X but it doesn't fit"
Fix: Check for the sunk cost fallacy. If < 2 weeks invested, switch now. If > 2 weeks, document pain points and plan a phased migration.
Issue 3: "Do we need microservices?"
Fix: Almost certainly no. Start with a well-structured monolith. Extract to services only when: (a) different scaling needs, (b) different team ownership, (c) different deployment cadence.