# Technology Selection Framework

Structured decision framework for backend and full-stack technology choices. Prevents analysis paralysis while ensuring rigorous evaluation.

**Iron Law: NO TECHNOLOGY CHOICE WITHOUT EXPLICIT TRADE-OFF ANALYSIS.**

"I like it" and "it's trending" are not engineering arguments.

---

## Phase 1: Requirements Before Technology

### Non-Functional Requirements (Quantify!)

| Dimension | Question | Bad Answer | Good Answer |
|-----------|----------|------------|-------------|
| Scale | How many concurrent users? | "Lots" | "1K concurrent, 500 RPS peak" |
| Latency | Acceptable p99 response time? | "Fast" | "< 200ms API, < 2s reports" |
| Availability | Required uptime? | "Always up" | "99.9% (8.7h downtime/year)" |
| Data volume | Expected storage growth? | "A lot" | "100GB/year, 10M rows" |
| Consistency | Strong vs eventual? | "Consistent" | "Strong for payments, eventual for feeds" |
| Compliance | Regulatory? | "Some" | "GDPR data residency EU, SOC 2 Type II" |

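The availability row's downtime budget is simple arithmetic worth doing explicitly; a minimal sketch (the uptime targets here are illustrative):

```python
# Downtime budget implied by an uptime target, in hours per year.
HOURS_PER_YEAR = 24 * 365  # 8760

def downtime_hours_per_year(uptime: float) -> float:
    """Return the yearly downtime budget for a given uptime fraction."""
    return (1 - uptime) * HOURS_PER_YEAR

print(f"99.9%  -> {downtime_hours_per_year(0.999):.2f} h/year")   # 8.76 h/year
print(f"99.99% -> {downtime_hours_per_year(0.9999):.2f} h/year")  # 0.88 h/year
```

Each extra "nine" cuts the budget by 10×, which is why it should be a negotiated requirement, not a reflex.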
### Team Constraints

- Team size and seniority level
- What the team already knows well
- Can you hire for this stack? (check the job market)
- Timeline pressure (days vs months to production)
- Budget for licenses, infrastructure, training

---

## Phase 2: Evaluation Matrix

Score each option 1-5 on weighted criteria:

| Criterion | Weight | Option A | Option B | Option C |
|-----------|--------|----------|----------|----------|
| Meets functional requirements | 5× | _ | _ | _ |
| Meets non-functional requirements | 5× | _ | _ | _ |
| Team expertise / learning curve | 4× | _ | _ | _ |
| Ecosystem maturity (libs, tools) | 3× | _ | _ | _ |
| Community & long-term viability | 3× | _ | _ | _ |
| Operational complexity | 3× | _ | _ | _ |
| Hiring pool availability | 2× | _ | _ | _ |
| Cost (license + infra + training) | 2× | _ | _ | _ |
| **Weighted Total** | | _ | _ | _ |

**Rules:**

- Any option scoring **1 on a 5× criterion** → automatically disqualified
- Options within **10%** of each other → choose what the team knows best
- Options **10-15%** apart → run a **time-boxed PoC** (2-5 days max)

---

## Phase 3: Decision Trees

### Backend Language / Framework

```
What type of project?
│
├─ REST/GraphQL API, rapid development
│   ├─ Team knows TypeScript → Node.js
│   │   ├─ Full-featured, enterprise patterns → NestJS
│   │   ├─ Lightweight, flexible → Fastify / Hono / Express
│   │   └─ Full-stack with React → Next.js API routes
│   ├─ Team knows Python
│   │   ├─ High-perf async API → FastAPI
│   │   ├─ Full-stack, admin-heavy → Django
│   │   └─ Lightweight → Flask / Litestar
│   └─ Team knows Java/Kotlin
│       ├─ Enterprise, large team → Spring Boot
│       └─ Lightweight, fast startup → Quarkus / Ktor
│
├─ High concurrency, systems-level
│   ├─ Microservices, network → Go
│   ├─ Extreme perf, safety → Rust (Axum / Actix)
│   └─ Fault tolerance → Elixir (Phoenix)
│
├─ Real-time (WebSocket, streaming)
│   ├─ Node.js ecosystem → Socket.io / ws
│   ├─ Scalable pub/sub → Elixir Phoenix
│   └─ Low-latency → Go / Rust
│
└─ ML / data-intensive
    └─ Python (FastAPI + ML libs)
```

### Database

```
What data model?
│
├─ Structured, relational, ACID
│   ├─ General purpose → PostgreSQL ← DEFAULT CHOICE
│   ├─ Read-heavy, MySQL ecosystem → MySQL / MariaDB
│   └─ Embedded / serverless edge → SQLite / Turso / D1
│
├─ Semi-structured, flexible schema
│   ├─ Document-oriented → MongoDB
│   ├─ Serverless document → DynamoDB / Firestore
│   └─ Search-heavy → Elasticsearch / OpenSearch
│
├─ Key-value / cache
│   ├─ In-memory + data structures → Redis / Valkey
│   └─ Planet-scale KV → DynamoDB / Cassandra
│
├─ Time-series → TimescaleDB / ClickHouse / InfluxDB
├─ Graph → Neo4j / Apache AGE (Postgres extension)
└─ Vector (AI embeddings) → pgvector / Pinecone / Qdrant
```

**Default:** Start with PostgreSQL. It handles 80% of use cases.

### Caching Strategy

| Pattern | Technology | When |
|---------|------------|------|
| Application cache | Redis / Valkey | Sessions, frequent reads, rate limiting |
| HTTP cache | CDN (Cloudflare/Vercel) | Static assets, public API responses |
| Query cache | Materialized views | Complex aggregations, dashboards |
| In-process cache | LRU (in-memory) | Config, small lookup tables |
| Edge cache | Cloudflare KV / Vercel KV | Global low-latency reads |

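For the in-process row, Python's standard library already ships an LRU; a minimal sketch of caching a small lookup (`load_config` and its workload are hypothetical stand-ins for an expensive read):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def load_config(key: str) -> str:
    # Hypothetical expensive lookup (file read, DB query, remote call);
    # here it just transforms the key deterministically.
    return key.upper()

load_config("db_host")           # miss: computed
load_config("db_host")           # hit: served from the in-process cache
print(load_config.cache_info())  # hits=1, misses=1
```

In-process caches are per-instance: with multiple replicas, each holds its own copy, which is exactly why they suit config and small lookup tables rather than shared session state.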
### Message Queue / Event Streaming

| Pattern | Technology | When |
|---------|------------|------|
| Task queue (background jobs) | BullMQ / Celery / SQS | Email, exports, payments |
| Event streaming (replay, audit) | Kafka / Redpanda | Event sourcing, real-time pipelines |
| Lightweight pub/sub | Redis Streams / NATS | Simple notifications, broadcasting |
| Request-reply (sync over async) | NATS / RabbitMQ RPC | Internal service calls |

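The task-queue row boils down to "producer enqueues, worker dequeues"; a minimal in-process sketch of that pattern using only the standard library (real systems would use BullMQ, Celery, or SQS as listed above, which add persistence and retries):

```python
import queue
import threading

jobs: queue.Queue = queue.Queue()
results: list = []

def worker() -> None:
    # Drain the queue until the producer signals shutdown with None.
    while (job := jobs.get()) is not None:
        results.append(f"sent email to {job}")  # stand-in for real work
        jobs.task_done()
    jobs.task_done()

t = threading.Thread(target=worker)
t.start()
for user in ("ada@example.com", "alan@example.com"):
    jobs.put(user)   # producer: enqueue background jobs
jobs.put(None)       # sentinel: no more work
t.join()
print(results)
```

The queue decouples request latency from job duration; a broker-backed queue additionally survives process restarts, which this sketch does not.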
### Hosting / Deployment

| Model | Technology | When |
|-------|------------|------|
| Serverless (auto-scale) | Vercel / Cloudflare Workers / Lambda | Variable traffic, pay-per-use |
| Container (predictable) | Cloud Run / Render / Railway / Fly.io | Steady traffic, simple ops |
| Kubernetes (large scale) | EKS / GKE / AKS | 10+ services, team has K8s expertise |
| VPS (full control) | DigitalOcean / Hetzner / EC2 | Predictable workload, cost-sensitive |

---

## Phase 4: Decision Documentation

### ADR (Architecture Decision Record) Template

```markdown
# ADR-{NNN}: {Title}

## Status: Proposed | Accepted | Deprecated | Superseded by ADR-{NNN}

## Context
What problem are we solving? What forces are at play?

## Decision
What did we choose and why?

## Evaluation
| Criterion | Weight | Chosen | Runner-up |
|-----------|--------|--------|-----------|

## Consequences
- Positive: ...
- Negative: ...
- Risks: ...

## Alternatives Rejected
- Option B: rejected because...
- Option C: rejected because...
```

---

## Common Stack Templates

### A: Startup / MVP (Speed)

| Layer | Choice | Why |
|-------|--------|-----|
| Language | TypeScript | One language front + back |
| Framework | Next.js (full-stack) or NestJS (API) | Fast iteration |
| Database | PostgreSQL (Supabase / Neon) | Managed, generous free tier |
| Auth | Better Auth / Clerk | No auth code to maintain |
| Cache | Redis (Upstash) | Serverless-friendly |
| Hosting | Vercel / Railway | Zero-config deploys |

### B: SaaS / Business App (Balance)

| Layer | Choice | Why |
|-------|--------|-----|
| Language | TypeScript or Python | Team preference |
| Framework | NestJS or FastAPI | Structured, testable |
| Database | PostgreSQL | Reliable, feature-rich |
| Queue | BullMQ (Redis) | Simple background jobs |
| Auth | OAuth 2.0 + JWT | Standard, flexible |
| Hosting | AWS ECS / Cloud Run | Scalable containers |
| Monitoring | Datadog / Grafana + Prometheus | Full observability |

### C: High-Performance (Scale)

| Layer | Choice | Why |
|-------|--------|-----|
| Language | Go or Rust | Max throughput, low latency |
| Database | PostgreSQL + Redis + ClickHouse | OLTP + cache + analytics |
| Queue | Kafka / Redpanda | High-throughput streaming |
| Hosting | Kubernetes (EKS/GKE) | Fine-grained scaling |
| Monitoring | Prometheus + Grafana + Jaeger | Metrics + tracing |

### D: AI / ML Application

| Layer | Choice | Why |
|-------|--------|-----|
| Language | Python (API) + TypeScript (frontend) | ML libs + modern UI |
| Framework | FastAPI + Next.js | Async + SSR |
| Database | PostgreSQL + pgvector | Relational + embeddings |
| Queue | Celery + Redis | ML job processing |
| Hosting | Modal / AWS GPU / Replicate | GPU access |

---

## Anti-Patterns

| # | ❌ Don't | ✅ Do Instead |
|---|---------|--------------|
| 1 | "X is trending on HN" | Evaluate against YOUR requirements |
| 2 | Resume-Driven Development | Choose what the team can maintain |
| 3 | "Must scale to 1M users" (day 1) | Build for 10× current need, not 1000× |
| 4 | Evaluate for weeks | Time-box to 3-5 days, then decide |
| 5 | No decision documentation | Write an ADR for every major choice |
| 6 | Ignore operational cost | Include deploy, monitor, debug cost |
| 7 | "We'll rewrite later" | Assume you won't. Choose carefully. |
| 8 | Microservices by default | Start monolith, extract when needed |
| 9 | Different DB per service (day 1) | One database, split when justified |
| 10 | "It worked at Google" | You're not Google. Scale to YOUR context. |

---

## Common Issues

### Issue 1: "Team can't agree on a framework"

**Fix:** Time-box the decision to 3 days. Fill in the evaluation matrix. If scores are within 10%, pick what the majority knows. Document the choice in an ADR. Move on.

### Issue 2: "We picked X but it doesn't fit"

**Fix:** Run a sunk-cost check. If less than 2 weeks are invested, switch now. If more, document the pain points and plan a phased migration.

### Issue 3: "Do we need microservices?"

**Fix:** Almost certainly not. Start with a well-structured monolith. Extract services only when there are: (a) different scaling needs, (b) different team ownership, or (c) different deployment cadences.