Metal Shader Reference
Expert reference for Metal shaders, real-time rendering, and Apple's Tile-Based Deferred Rendering (TBDR) architecture.
Core Principles
Half precision first → Leverage TBDR → Function constant specialization → Use Intersector API
When to Use
- Metal Shading Language (MSL) development
- Apple GPU optimization (TBDR architecture)
- PBR rendering pipelines
- Compute shaders and parallel processing
- Apple Silicon ray tracing
- GPU profiling and debugging
When NOT to Use
- WebGL/GLSL (different architecture)
- CUDA (NVIDIA only)
- OpenGL (deprecated on Apple)
- CPU-side optimization
Expert vs Novice
| Topic |
Novice |
Expert |
| Data types |
float everywhere |
Default half, float only for position/depth |
| Branching |
Runtime conditionals |
Function constants for compile-time elimination |
| Memory |
Everything in device |
Know constant/device/threadgroup tradeoffs |
| Architecture |
Treat as desktop GPU |
Understand TBDR: tile memory is free, bandwidth is expensive |
| Ray tracing |
intersection queries |
intersector API (hardware-aligned) |
| Debugging |
print debugging |
GPU capture, shader profiler, occupancy analysis |
Common Anti-Patterns
| Anti-Pattern |
Problem |
Solution |
| 32-bit floats |
Wastes registers, reduces occupancy, doubles bandwidth |
Default half, float only for position/depth |
| Ignoring TBDR |
Not using free tile memory |
Use [[color(n)]], memoryless targets |
| Runtime constant branches |
Warp divergence, wastes ALU |
Function constants + pipeline specialization |
| intersection queries |
Not hardware-aligned |
Use intersector API |
Metal Evolution
| Era |
Key Development |
| Metal 2.x |
OpenGL migration, basic compute |
| Apple Silicon |
Unified memory, tile shaders critical |
| Metal 3 |
Mesh shaders, hardware-accelerated ray tracing |
| Latest |
Neural Engine + GPU cooperation, Vision Pro foveated rendering |
Apple Family 9 Note: Threadgroup memory less advantageous vs direct device access.
Shader Types
| Type |
Purpose |
Key Attributes |
| Vertex |
Vertex transformation |
[[stage_in]], [[buffer(n)]] |
| Fragment |
Pixel shading |
[[color(n)]], [[texture(n)]] |
| Compute/Kernel |
General computation |
[[thread_position_in_grid]] |
| Tile |
TBDR-specific |
[[imageblock]] |
| Mesh |
Metal 3 geometry |
[[mesh_id]] |
Rendering Techniques
| Technique |
Description |
| Fullscreen quad |
4 vertex triangle strip, no MVP, post-processing basis |
| PBR Cook-Torrance |
Fresnel Schlick + GGX Distribution + Smith Geometry |
| Blinn-Phong |
Simple specular, half-vector calculation |
Procedural Generation
| Technique |
Use Case |
| Hash functions |
Pseudo-random basis for noise, random sampling |
| Voronoi |
Cell textures, stones, cracks |
| Value/Perlin Noise |
Continuous random fields |
| FBM |
Multi-octave layering, fractal terrain, clouds |
| Domain Warping |
Coordinate distortion, organic shapes |
Numerical Techniques
| Technique |
Formula |
| Central difference gradient |
(f(x+h) - f(x-h)) / (2h) |
| Smoothstep |
x * x * (3 - 2 * x) |
| SDF operations |
min/max/smooth_min boolean ops |
SwiftUI + MTKView Integration
Architecture Pattern
Uniform Alignment Rules
| Swift Type |
Metal Type |
Alignment |
Float |
float |
4 bytes |
SIMD2<Float> |
float2 |
8 bytes |
SIMD3<Float> |
float3 |
16 bytes |
SIMD4<Float> |
float4 |
16 bytes |
Key: float3 aligns to 16 bytes. Use MemoryLayout<T>.size to verify.
Command Line Tools
| Command |
Purpose |
xcrun metal -c shader.metal -o shader.air |
Compile to AIR |
xcrun metallib shader.air -o shader.metallib |
Link to metallib |
xcrun metal shader.metal -o shader.metallib |
One-step compile & link |
xcrun metal -Weverything -c shader.metal |
Syntax check |
xcrun metal-objdump --disassemble shader.metallib |
Disassemble |
GPU Debugging
Xcode Workflow
- GPU Capture: ⌘⇧⌥G
- Shader Profiler: Select draw call → View Shader
- Memory Viewer: Inspect buffer/texture
- Performance HUD: Enable in device options
Key Metrics
| Metric |
Healthy Value |
Low Value Cause |
| GPU Occupancy |
> 80% |
Memory bandwidth bottleneck |
| ALU Utilization |
> 60% |
Waiting on memory |
| Bandwidth |
As low as possible |
TBDR should minimize store |
Debug Utility Functions
| Function |
Purpose |
| heatmap |
Value visualization (blue→green→red) |
| debugNaN |
NaN/Inf detection (magenta marker) |
| visualizeDepth |
Linearized depth visualization |
Performance Optimization Checklist
Data Types
Memory Management
Branch Optimization
Rendering Tips
Compute Optimization
Metal, Apple Silicon, and Xcode are trademarks of Apple Inc.