Developing Post-Processing Shaders for Games
The final frame in game rendering is not what comes out of the geometry pipeline. Between the G-Buffer and the screen sits a post-processing stack, and it is this stack that determines whether the image has character. Bloom with the wrong threshold smears highlights into white mush. Chromatic aberration with a fixed offset reads as an artifact rather than a stylistic choice. A custom depth of field shader that doesn't compute the circle of confusion (CoC) correctly produces a characteristic "muddiness" at foreground boundaries.
These are not abstract problems—they're what distinguishes a project with a proper visual stack from one where "something feels off, but it's not clear what."
Where Standard Post-Processing Stacks Break Down
Both Unity URP/HDRP and Unreal Engine provide volume-based post-processing frameworks—but each has zones where the standard effects don't cover the needs of a specific visual style.
In URP, the most common pain point is custom passes: adding one via a Renderer Feature requires a solid understanding of ScriptableRenderPass and of execution order via RenderPassEvent. Developers insert a custom Blit at AfterRenderingPostProcessing and get an effect that sits on top of the URP stack instead of integrating with it—as a result, TAA and FXAA either ignore the custom material or produce duplicated artifacts.
In HDRP, the situation is different: Custom Pass works through CustomPassVolume, and if the shader samples _CameraDepthTexture without an explicit declaration via the HDRP TEXTURE2D + SAMPLER macros, it simply renders a black screen on some platforms—with no console warnings.
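A minimal sketch of the kind of declaration HDRP expects in a Custom Pass shader; the exact macros and helper name vary between HDRP versions, so treat this as illustrative rather than copy-paste:

```hlsl
// Hedged sketch: explicit depth texture declaration for an HDRP Custom Pass shader.
// Without macro-based declarations like these, sampling _CameraDepthTexture can
// silently return zero on some platforms.
TEXTURE2D_X(_CameraDepthTexture);    // TEXTURE2D_X accounts for XR texture arrays
SAMPLER(sampler_CameraDepthTexture);

float SampleDepth(float2 uv)
{
    return SAMPLE_TEXTURE2D_X(_CameraDepthTexture, sampler_CameraDepthTexture, uv).r;
}
```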
Unreal specifics are different. There, post-processing via a Material with the Post Process domain works fine until the first collision with r.PostProcessAAQuality and the way TAA interacts with custom SceneTexture nodes. If a shader reads PostProcessInput0 without accounting for the jitter offset, a moving camera produces ghosting that looks like a renderer bug—but the cause is in the shader.
What Actually Gets Done Within This Service
Developing a post-processing shader is not "write HLSL." It's a cycle through several stages, each with its own tools.
Analysis of visual references and task definition. Before writing the first line of code—understand what exactly needs to be reproduced. Cel-shading with edge detection via Sobel on depth and normals differs from cel-shading via Step in lighting. These are different shaders with different performance trade-offs.
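As an illustration of the first variant, a hedged sketch of Sobel edge detection on depth; the texture and parameter names are placeholders, not tied to a specific pipeline:

```hlsl
// Hedged sketch: Sobel edge detection on a depth buffer (9 fetches).
// depthTex and texel are illustrative: in a real pipeline they map to the
// camera depth texture and its texel size.
float SobelDepthEdge(Texture2D depthTex, SamplerState s, float2 uv, float2 texel)
{
    // Sample the 3x3 neighborhood
    float tl = depthTex.Sample(s, uv + texel * float2(-1,  1)).r;
    float t  = depthTex.Sample(s, uv + texel * float2( 0,  1)).r;
    float tr = depthTex.Sample(s, uv + texel * float2( 1,  1)).r;
    float l  = depthTex.Sample(s, uv + texel * float2(-1,  0)).r;
    float r  = depthTex.Sample(s, uv + texel * float2( 1,  0)).r;
    float bl = depthTex.Sample(s, uv + texel * float2(-1, -1)).r;
    float b  = depthTex.Sample(s, uv + texel * float2( 0, -1)).r;
    float br = depthTex.Sample(s, uv + texel * float2( 1, -1)).r;

    // Horizontal and vertical Sobel kernels
    float gx = (tr + 2.0 * r + br) - (tl + 2.0 * l + bl);
    float gy = (tl + 2.0 * t + tr) - (bl + 2.0 * b + br);

    // Edge magnitude; thresholded against a style-driven value downstream
    return sqrt(gx * gx + gy * gy);
}
```

The same kernel can be run on the normals buffer and the two magnitudes combined; that is exactly the performance trade-off versus a Step-based cel-shader that the stage above is meant to surface.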
Prototyping in Shader Graph / Material Editor. Shader Graph in Unity allows rapid hypothesis testing without manual compilation; for Unreal it's the Material Graph. The prototype takes 1–2 days, after which a decision is made: keep it in the node graph or rewrite it in HLSL (or a Custom HLSL node) for optimization.
Writing and optimizing HLSL/GLSL. The final shader is written with the platform in mind. A mobile target means mediump floats, a minimum of texture samples, and no discard. PC/console allows fullscreen passes with multiple Blits and compute shaders for the separable passes of a Gaussian blur.
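For instance, a separable Gaussian blur splits one NxN pass into two cheap 1D passes. A hedged sketch of the horizontal pass, using binomial weights (1 4 6 4 1)/16 as an illustrative Gaussian approximation:

```hlsl
// Hedged sketch: horizontal pass of a separable Gaussian blur (5 taps).
// The vertical pass is identical with the offset along texel.y instead.
// Weights are binomial coefficients / 16 and sum to 1.0.
static const float kWeights[3] = { 0.375, 0.25, 0.0625 }; // center, +-1, +-2

float4 BlurHorizontal(Texture2D src, SamplerState s, float2 uv, float texelX)
{
    float4 c = src.Sample(s, uv) * kWeights[0];
    [unroll]
    for (int i = 1; i < 3; i++)
    {
        float2 offset = float2(texelX * i, 0.0);
        c += src.Sample(s, uv + offset) * kWeights[i];
        c += src.Sample(s, uv - offset) * kWeights[i];
    }
    return c;
}
```

Two such passes cost 10 fetches per pixel instead of 25 for the equivalent single 5x5 kernel, which is why the split is worth a second Blit on PC/console.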
Pipeline integration. ScriptableRenderPass for URP, CustomPassVolume for HDRP, Post Process Material for Unreal. Including setting Injection Point, execution order, and compatibility with Motion Vectors if TAA is needed.
Profiling. RenderDoc for frame capture, Unity Frame Debugger / Unreal RenderDoc plugin for pass analysis. GPU Profiler to measure specific pass time. On mobile—Snapdragon Profiler or Mali Graphics Debugger.
On one project—a mobile RPG in URP—the client wanted a custom "ink outline" effect over geometry. The first prototype, a ScriptableRenderPass with Roberts Cross on depth, cost 4 ms on an Adreno 650 at 1080p. After rewriting it as a single pass with a simplified kernel, using _CameraDepthNormalsTexture instead of two separate textures—1.1 ms. The difference is not in the algorithm but in the number of texture fetches.
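The optimization described above boils down to halving the fetches: Roberts Cross only needs a 2x2 neighborhood, and a packed depth+normals texture delivers both signals in one fetch. A hedged sketch along those lines (texture and parameter names are illustrative; _EdgeStrength is a hypothetical material parameter):

```hlsl
// Hedged sketch: Roberts Cross edge detection with 4 fetches total,
// reading packed depth+normals instead of two separate textures.
float InkOutline(Texture2D depthNormalsTex, SamplerState s,
                 float2 uv, float2 texel, float _EdgeStrength)
{
    // 2x2 neighborhood; Roberts Cross differences the two diagonals
    float4 s00 = depthNormalsTex.Sample(s, uv);
    float4 s11 = depthNormalsTex.Sample(s, uv + texel);
    float4 s10 = depthNormalsTex.Sample(s, uv + float2(texel.x, 0.0));
    float4 s01 = depthNormalsTex.Sample(s, uv + float2(0.0, texel.y));

    // Differencing the encoded values directly is a cheap proxy for a
    // combined depth + normal discontinuity, good enough for an outline mask.
    float4 d1 = s11 - s00;
    float4 d2 = s10 - s01;
    float edge = dot(d1, d1) + dot(d2, d2);
    return saturate(edge * _EdgeStrength);
}
```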
Shader Development Stages
| Stage | What's Done | Approximate Timeline |
|---|---|---|
| Reference analysis and specification | Visual style analysis, algorithm determination | 1–2 days |
| Shader Graph / Material Graph prototype | Rapid hypothesis testing, coordination | 1–3 days |
| HLSL implementation | Shader writing for target platform | 2–7 days |
| Pipeline integration | ScriptableRenderPass / CustomPassVolume / PP Material | 1–3 days |
| Profiling and optimization | Frame capture, GPU timing, platform optimization | 1–4 days |
| Documentation and handoff | Code comments, setup instructions | 0.5–1 day |
Complex effects with multiple passes (such as volumetric fog via ray marching in post-process or screen-space subsurface scattering) can take 3–4 weeks including iterations.
Common Mistakes in Self-Directed Development
Most problems are not in the algorithm, but in integration.
Incorrect RenderPassEvent. Inserting a custom pass at BeforeRenderingPostProcessing without realizing that URP hasn't applied Color Grading yet at that stage—as a result, the LUT is layered over the custom effect instead of under it.
Ignoring camera stacking. In URP with an Overlay Camera, a custom ScriptableRenderPass registered for the Base Camera doesn't execute for the Overlay one—the Renderer Feature has to be registered in the Renderer Assets used by both cameras.
Hardcoded resolution. float2(1.0/1920.0, 1.0/1080.0) in the shader instead of _ScreenParams.zw - 1.0 or the texture's _TexelSize—on devices with a non-standard resolution, or with dynamic resolution, the effect breaks.
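A resolution-independent texel size, assuming a Unity-style shader where the built-in _ScreenParams is available:

```hlsl
// Hedged sketch: derive texel size from the actual render resolution
// instead of hardcoding 1/1920 and 1/1080.
// Unity packs _ScreenParams as (width, height, 1 + 1/width, 1 + 1/height).
float2 GetTexelSize()
{
    return _ScreenParams.zw - 1.0; // = (1/width, 1/height)
}

// Or, for a specific texture, use its automatically populated _TexelSize:
// float4 _MainTex_TexelSize; // (1/width, 1/height, width, height)
```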
HDR loss. If the effect is applied after tonemapping but calculated for a linear HDR buffer, the colors will be wrong. It's important to explicitly define the application point in the pipeline.
Cost is calculated after analyzing the technical specification: platform, target render pipeline, algorithm complexity, and performance requirements.