# Metal Shader Reference Expert reference for Metal shaders, real-time rendering, and Apple's Tile-Based Deferred Rendering (TBDR) architecture. ## Core Principles **Half precision first → Leverage TBDR → Function constant specialization → Use Intersector API** ### When to Use - Metal Shading Language (MSL) development - Apple GPU optimization (TBDR architecture) - PBR rendering pipelines - Compute shaders and parallel processing - Apple Silicon ray tracing - GPU profiling and debugging ### When NOT to Use - WebGL/GLSL (different architecture) - CUDA (NVIDIA only) - OpenGL (deprecated on Apple) - CPU-side optimization ## Expert vs Novice | Topic | Novice | Expert | |-------|--------|--------| | Data types | `float` everywhere | Default `half`, `float` only for position/depth | | Branching | Runtime conditionals | Function constants for compile-time elimination | | Memory | Everything in device | Know constant/device/threadgroup tradeoffs | | Architecture | Treat as desktop GPU | Understand TBDR: tile memory is free, bandwidth is expensive | | Ray tracing | intersection queries | intersector API (hardware-aligned) | | Debugging | print debugging | GPU capture, shader profiler, occupancy analysis | ## Common Anti-Patterns | Anti-Pattern | Problem | Solution | |--------------|---------|----------| | 32-bit floats | Wastes registers, reduces occupancy, doubles bandwidth | Default `half`, `float` only for position/depth | | Ignoring TBDR | Not using free tile memory | Use `[[color(n)]]`, memoryless targets | | Runtime constant branches | Warp divergence, wastes ALU | Function constants + pipeline specialization | | intersection queries | Not hardware-aligned | Use intersector API | ## Metal Evolution | Era | Key Development | |-----|-----------------| | Metal 2.x | OpenGL migration, basic compute | | Apple Silicon | Unified memory, tile shaders critical | | Metal 3 | Mesh shaders, hardware-accelerated ray tracing | | Latest | Neural Engine + GPU cooperation, Vision Pro foveated rendering | **Apple Family 9 Note**: Threadgroup memory less advantageous vs direct device access. ## Shader Types | Type | Purpose | Key Attributes | |------|---------|----------------| | Vertex | Vertex transformation | `[[stage_in]]`, `[[buffer(n)]]` | | Fragment | Pixel shading | `[[color(n)]]`, `[[texture(n)]]` | | Compute/Kernel | General computation | `[[thread_position_in_grid]]` | | Tile | TBDR-specific | `[[imageblock]]` | | Mesh | Metal 3 geometry | `[[mesh_id]]` | ## Rendering Techniques | Technique | Description | |-----------|-------------| | Fullscreen quad | 4 vertex triangle strip, no MVP, post-processing basis | | PBR Cook-Torrance | Fresnel Schlick + GGX Distribution + Smith Geometry | | Blinn-Phong | Simple specular, half-vector calculation | ## Procedural Generation | Technique | Use Case | |-----------|----------| | Hash functions | Pseudo-random basis for noise, random sampling | | Voronoi | Cell textures, stones, cracks | | Value/Perlin Noise | Continuous random fields | | FBM | Multi-octave layering, fractal terrain, clouds | | Domain Warping | Coordinate distortion, organic shapes | ## Numerical Techniques | Technique | Formula | |-----------|---------| | Central difference gradient | `(f(x+h) - f(x-h)) / (2h)` | | Smoothstep | `x * x * (3 - 2 * x)` | | SDF operations | `min/max/smooth_min` boolean ops | ## SwiftUI + MTKView Integration ### Architecture Pattern ``` MetalView (UIViewRepresentable) └── Coordinator = Renderer (MTKViewDelegate) ├── MTLDevice ├── MTLCommandQueue ├── MTLRenderPipelineState └── MTLBuffer (vertices, uniforms) ``` ### Uniform Alignment Rules | Swift Type | Metal Type | Alignment | |------------|------------|-----------| | `Float` | `float` | 4 bytes | | `SIMD2` | `float2` | 8 bytes | | `SIMD3` | `float3` | **16 bytes** | | `SIMD4` | `float4` | 16 bytes | **Key**: `float3` aligns to 16 bytes. Use `MemoryLayout.size` to verify. ## Command Line Tools | Command | Purpose | |---------|---------| | `xcrun metal -c shader.metal -o shader.air` | Compile to AIR | | `xcrun metallib shader.air -o shader.metallib` | Link to metallib | | `xcrun metal shader.metal -o shader.metallib` | One-step compile & link | | `xcrun metal -Weverything -c shader.metal` | Syntax check | | `xcrun metal-objdump --disassemble shader.metallib` | Disassemble | ## GPU Debugging ### Xcode Workflow 1. **GPU Capture**: ⌘⇧⌥G 2. **Shader Profiler**: Select draw call → View Shader 3. **Memory Viewer**: Inspect buffer/texture 4. **Performance HUD**: Enable in device options ### Key Metrics | Metric | Healthy Value | Low Value Cause | |--------|---------------|-----------------| | GPU Occupancy | > 80% | Memory bandwidth bottleneck | | ALU Utilization | > 60% | Waiting on memory | | Bandwidth | As low as possible | TBDR should minimize store | ### Debug Utility Functions | Function | Purpose | |----------|---------| | heatmap | Value visualization (blue→green→red) | | debugNaN | NaN/Inf detection (magenta marker) | | visualizeDepth | Linearized depth visualization | ## Performance Optimization Checklist ### Data Types - [ ] Default `half`, `float` only for position/depth ### Memory Management - [ ] Constants in constant address space - [ ] Use `.storageModeShared` - [ ] Leverage tile memory (TBDR free reads) - [ ] Avoid unnecessary render target stores ### Branch Optimization - [ ] Function constants to eliminate branches - [ ] Fixed loop bounds (GPU unrolling) ### Rendering Tips - [ ] Fullscreen quad with 4 vertex triangle strip - [ ] Procedural textures to avoid sampling bandwidth - [ ] `[[early_fragment_tests]]` for early depth test - [ ] `setFragmentBytes` for small data ### Compute Optimization - [ ] Vectorize (SIMD) - [ ] Reduce register pressure --- *Metal, Apple Silicon, and Xcode are trademarks of Apple Inc.*