skills/fullstack-dev/references/technology-selection.md
shihao 6487becf60 Initial commit: add all skills files
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 16:52:49 +08:00


# Technology Selection Framework

Structured decision framework for backend and full-stack technology choices. Prevents analysis paralysis while ensuring rigorous evaluation.

Iron Law: NO TECHNOLOGY CHOICE WITHOUT EXPLICIT TRADE-OFF ANALYSIS.

"I like it" and "it's trending" are not engineering arguments.


## Phase 1: Requirements Before Technology

### Non-Functional Requirements (Quantify!)

| Dimension | Question | Bad Answer | Good Answer |
|-----------|----------|------------|-------------|
| Scale | How many concurrent users? | "Lots" | "1K concurrent, 500 RPS peak" |
| Latency | Acceptable p99 response time? | "Fast" | "< 200ms API, < 2s reports" |
| Availability | Required uptime? | "Always up" | "99.9% (8.7h downtime/year)" |
| Data volume | Expected storage growth? | "A lot" | "100GB/year, 10M rows" |
| Consistency | Strong vs eventual? | "Consistent" | "Strong for payments, eventual for feeds" |
| Compliance | Regulatory requirements? | "Some" | "GDPR data residency in the EU, SOC 2 Type II" |
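The availability row translates "nines" into a concrete downtime budget. A minimal sketch of that arithmetic (the percentages are the usual illustrative tiers, not requirements from this document):

```python
# Availability "nines" converted into a yearly downtime budget,
# e.g. 99.9% uptime ~ 8.76 hours of downtime per year.

HOURS_PER_YEAR = 365 * 24  # 8760

def downtime_hours_per_year(availability_pct: float) -> float:
    """Yearly downtime budget implied by an availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99):
    print(f"{pct}% -> {downtime_hours_per_year(pct):.2f}h downtime/year")
```

Running this makes the gap between tiers tangible: each extra nine cuts the yearly budget by a factor of ten, which is why "always up" is not an answer.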

### Team Constraints

- Team size and seniority level
- What the team already knows well
- Can you hire for this stack? (check the job market)
- Timeline pressure (days vs months to production)
- Budget for licenses, infrastructure, and training

## Phase 2: Evaluation Matrix

Score each option 1-5 on weighted criteria:

| Criterion | Weight | Option A | Option B | Option C |
|-----------|--------|----------|----------|----------|
| Meets functional requirements | 5× | _ | _ | _ |
| Meets non-functional requirements | 5× | _ | _ | _ |
| Team expertise / learning curve | 4× | _ | _ | _ |
| Ecosystem maturity (libs, tools) | 3× | _ | _ | _ |
| Community & long-term viability | 3× | _ | _ | _ |
| Operational complexity | 3× | _ | _ | _ |
| Hiring pool availability | 2× | _ | _ | _ |
| Cost (license + infra + training) | 2× | _ | _ | _ |
| Weighted Total | | _ | _ | _ |

Rules:

- Any option scoring 1 on a 5× criterion → automatically disqualified
- Options within 10% of each other → choose what the team knows best
- Options 10-15% apart → run a time-boxed PoC (2-5 days max)

## Phase 3: Decision Trees

### Backend Language / Framework

```
What type of project?
│
├─ REST/GraphQL API, rapid development
│   ├─ Team knows TypeScript → Node.js
│   │   ├─ Full-featured, enterprise patterns → NestJS
│   │   ├─ Lightweight, flexible → Fastify / Hono / Express
│   │   └─ Full-stack with React → Next.js API routes
│   ├─ Team knows Python
│   │   ├─ High-perf async API → FastAPI
│   │   ├─ Full-stack, admin-heavy → Django
│   │   └─ Lightweight → Flask / Litestar
│   └─ Team knows Java/Kotlin
│       ├─ Enterprise, large team → Spring Boot
│       └─ Lightweight, fast startup → Quarkus / Ktor
│
├─ High concurrency, systems-level
│   ├─ Microservices, network → Go
│   ├─ Extreme perf, safety → Rust (Axum / Actix)
│   └─ Fault tolerance → Elixir (Phoenix)
│
├─ Real-time (WebSocket, streaming)
│   ├─ Node.js ecosystem → Socket.io / ws
│   ├─ Scalable pub/sub → Elixir Phoenix
│   └─ Low-latency → Go / Rust
│
└─ ML / data-intensive
    └─ Python (FastAPI + ML libs)
```

### Database

```
What data model?
│
├─ Structured, relational, ACID
│   ├─ General purpose → PostgreSQL ← DEFAULT CHOICE
│   ├─ Read-heavy, MySQL ecosystem → MySQL / MariaDB
│   └─ Embedded / serverless edge → SQLite / Turso / D1
│
├─ Semi-structured, flexible schema
│   ├─ Document-oriented → MongoDB
│   ├─ Serverless document → DynamoDB / Firestore
│   └─ Search-heavy → Elasticsearch / OpenSearch
│
├─ Key-value / cache
│   ├─ In-memory + data structures → Redis / Valkey
│   └─ Planet-scale KV → DynamoDB / Cassandra
│
├─ Time-series → TimescaleDB / ClickHouse / InfluxDB
├─ Graph → Neo4j / Apache AGE (Postgres extension)
└─ Vector (AI embeddings) → pgvector / Pinecone / Qdrant
```

Default: Start with PostgreSQL. It handles 80% of use cases.

### Caching Strategy

| Pattern | Technology | When |
|---------|------------|------|
| Application cache | Redis / Valkey | Sessions, frequent reads, rate limiting |
| HTTP cache | CDN (Cloudflare / Vercel) | Static assets, public API responses |
| Query cache | Materialized views | Complex aggregations, dashboards |
| In-process cache | LRU (in-memory) | Config, small lookup tables |
| Edge cache | Cloudflare KV / Vercel KV | Global low-latency reads |
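The in-process row is the only pattern with no external dependency, which is why it suits config and small lookup tables. A minimal LRU sketch (the capacity and keys are illustrative):

```python
from collections import OrderedDict

# Minimal in-process LRU cache: bounded size, evicts the least
# recently used entry once capacity is exceeded.

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("feature_flags", {"dark_mode": True})
cache.put("plans", ["free", "pro"])
cache.get("feature_flags")            # touch -> now most recent
cache.put("regions", ["eu", "us"])    # over capacity -> evicts "plans"
```

In practice a library (or `functools.lru_cache` for pure functions) does this for you; the point is that an in-process cache is one data structure, not a deployment.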

### Message Queue / Event Streaming

| Pattern | Technology | When |
|---------|------------|------|
| Task queue (background jobs) | BullMQ / Celery / SQS | Email, exports, payments |
| Event streaming (replay, audit) | Kafka / Redpanda | Event sourcing, real-time pipelines |
| Lightweight pub/sub | Redis Streams / NATS | Simple notifications, broadcasting |
| Request-reply (sync over async) | NATS / RabbitMQ RPC | Internal service calls |
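The task-queue row shares one shape regardless of broker: producers enqueue jobs, a worker drains them. A stand-in sketch using an in-memory queue (in production this would be BullMQ, Celery, or SQS; the job payloads here are illustrative):

```python
import queue
import threading

# Task-queue pattern in miniature: producers enqueue background jobs,
# a single worker thread drains them. queue.Queue stands in for the
# real broker to show the shape of the pattern.

jobs: queue.Queue = queue.Queue()
results: list[str] = []

def worker() -> None:
    while True:
        job = jobs.get()
        if job is None:        # sentinel: shut the worker down
            jobs.task_done()
            break
        results.append(f"sent {job['kind']} to {job['to']}")
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

jobs.put({"kind": "welcome-email", "to": "user@example.com"})
jobs.put({"kind": "export", "to": "ops@example.com"})
jobs.put(None)
jobs.join()                    # block until the worker drains the queue
```

What a real broker adds over this sketch is exactly the hard part: persistence across restarts, retries, and visibility timeouts, which is why "payments" belongs in a durable queue, not in memory.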

### Hosting / Deployment

| Model | Technology | When |
|-------|------------|------|
| Serverless (auto-scale) | Vercel / Cloudflare Workers / Lambda | Variable traffic, pay-per-use |
| Container (predictable) | Cloud Run / Render / Railway / Fly.io | Steady traffic, simple ops |
| Kubernetes (large scale) | EKS / GKE / AKS | 10+ services, team has K8s expertise |
| VPS (full control) | DigitalOcean / Hetzner / EC2 | Predictable workload, cost-sensitive |

## Phase 4: Decision Documentation

### ADR (Architecture Decision Record) Template

```markdown
# ADR-{NNN}: {Title}

## Status: Proposed | Accepted | Deprecated | Superseded by ADR-{NNN}

## Context
What problem are we solving? What forces are at play?

## Decision
What did we choose and why?

## Evaluation
| Criterion | Weight | Chosen | Runner-up |
|-----------|--------|--------|-----------|

## Consequences
- Positive: ...
- Negative: ...
- Risks: ...

## Alternatives Rejected
- Option B: rejected because...
- Option C: rejected because...
```
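Filled in, a hypothetical ADR for the database decision above might read as follows (the number, scores, and project details are invented for illustration):

```markdown
# ADR-007: Use PostgreSQL as the primary datastore

## Status: Accepted

## Context
We need a relational, ACID-compliant store for orders and billing.
Expected growth is ~100GB/year; payments require strong consistency.

## Decision
Managed PostgreSQL, chosen over MongoDB and DynamoDB.

## Evaluation
| Criterion | Weight | Chosen | Runner-up |
|-----------|--------|--------|-----------|
| Meets non-functional requirements | 5× | 5 | 3 |
| Team expertise / learning curve | 4× | 4 | 2 |

## Consequences
- Positive: mature ecosystem, one database to operate
- Negative: schema migrations need discipline
- Risks: analytics queries may later need a dedicated OLAP store

## Alternatives Rejected
- MongoDB: rejected because payments need multi-row ACID guarantees
- DynamoDB: rejected because access patterns are not yet stable
```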

## Common Stack Templates

### A: Startup / MVP (Speed)

| Layer | Choice | Why |
|-------|--------|-----|
| Language | TypeScript | One language front + back |
| Framework | Next.js (full-stack) or NestJS (API) | Fast iteration |
| Database | PostgreSQL (Supabase / Neon) | Managed, generous free tier |
| Auth | Better Auth / Clerk | No auth code to maintain |
| Cache | Redis (Upstash) | Serverless-friendly |
| Hosting | Vercel / Railway | Zero-config deploys |

### B: SaaS / Business App (Balance)

| Layer | Choice | Why |
|-------|--------|-----|
| Language | TypeScript or Python | Team preference |
| Framework | NestJS or FastAPI | Structured, testable |
| Database | PostgreSQL | Reliable, feature-rich |
| Queue | BullMQ (Redis) | Simple background jobs |
| Auth | OAuth 2.0 + JWT | Standard, flexible |
| Hosting | AWS ECS / Cloud Run | Scalable containers |
| Monitoring | Datadog / Grafana + Prometheus | Full observability |

### C: High-Performance (Scale)

| Layer | Choice | Why |
|-------|--------|-----|
| Language | Go or Rust | Max throughput, low latency |
| Database | PostgreSQL + Redis + ClickHouse | OLTP + cache + analytics |
| Queue | Kafka / Redpanda | High-throughput streaming |
| Hosting | Kubernetes (EKS/GKE) | Fine-grained scaling |
| Monitoring | Prometheus + Grafana + Jaeger | Metrics + tracing |

### D: AI / ML Application

| Layer | Choice | Why |
|-------|--------|-----|
| Language | Python (API) + TypeScript (frontend) | ML libs + modern UI |
| Framework | FastAPI + Next.js | Async + SSR |
| Database | PostgreSQL + pgvector | Relational + embeddings |
| Queue | Celery + Redis | ML job processing |
| Hosting | Modal / AWS GPU / Replicate | GPU access |

## Anti-Patterns

| # | Don't | Do Instead |
|---|-------|------------|
| 1 | "X is trending on HN" | Evaluate against YOUR requirements |
| 2 | Resume-Driven Development | Choose what the team can maintain |
| 3 | "Must scale to 1M users" (day 1) | Build for 10× current need, not 1000× |
| 4 | Evaluate for weeks | Time-box to 3-5 days, then decide |
| 5 | No decision documentation | Write an ADR for every major choice |
| 6 | Ignore operational cost | Include deploy, monitor, debug cost |
| 7 | "We'll rewrite later" | Assume you won't. Choose carefully. |
| 8 | Microservices by default | Start with a monolith, extract when needed |
| 9 | Different DB per service (day 1) | One database, split when justified |
| 10 | "It worked at Google" | You're not Google. Scale to YOUR context. |

## Common Issues

### Issue 1: "Team can't agree on a framework"

Fix: Time-box the decision to 3 days. Fill in the evaluation matrix. If scores are within 10%, pick what the majority knows. Document it in an ADR. Move on.

### Issue 2: "We picked X but it doesn't fit"

Fix: Run a sunk-cost check. If less than 2 weeks are invested, switch now. If more, document the pain points and plan a phased migration.

### Issue 3: "Do we need microservices?"

Fix: Almost certainly not. Start with a well-structured monolith. Extract to services only when there are: (a) different scaling needs, (b) different team ownership, or (c) different deployment cadences.