Performance Optimization
The project runs flawlessly on developers' flagship devices. On a mid-range Android from 2020—20 fps and overheating after 5 minutes. On iPhone 11—stable 60, but on iPhone XR—frame drops in heavy scenes. This is a standard situation when optimization isn't baked into the architecture from the start, but done "later"—which is always more expensive and painful.
Profiling Tools
Before optimizing—measure. Optimization without profiling is guessing.
| Tool | Purpose |
|---|---|
| Unity Profiler | CPU/GPU time by system, GC allocations, audio |
| Frame Debugger | Inspect each draw call in a frame |
| Memory Profiler | Memory snapshot, asset dependency graph |
| RenderDoc | Deep GPU state analysis, relevant for PC/Console |
| Android GPU Inspector | GPU profiling on real Android device |
| Xcode Instruments | GPU + memory on iOS (Metal Performance HUD) |
| Snapdragon Profiler | Qualcomm GPU—detailed shader stats |
Rule one: profile on target hardware, not in the editor. The editor adds significant overhead, so Play Mode numbers aren't representative of builds.
Rule two: look at GPU and CPU time separately. The bottleneck can be the CPU (too many draw calls, heavy logic), the GPU (complex shaders, overdraw, fill rate), or memory (GC allocations, texture streaming). The treatment differs for each.
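To make a suspect system show up as its own entry in the Unity Profiler's CPU view on a device build, it can be wrapped in a named marker. The sketch below is a minimal example; the component name, the marker label, and the `UpdateEnemies` method are hypothetical placeholders, not part of any existing project.

```csharp
using Unity.Profiling;
using UnityEngine;

// Hypothetical component: wraps one suspect system in a named profiler marker.
public class EnemyUpdateProbe : MonoBehaviour
{
    // The label is an assumption; it is the name shown in the CPU Profiler timeline.
    static readonly ProfilerMarker UpdateEnemiesMarker = new ProfilerMarker("Game.UpdateEnemies");

    void Update()
    {
        // Auto() opens the marker and closes it when the using scope ends.
        using (UpdateEnemiesMarker.Auto())
        {
            UpdateEnemies();
        }
    }

    void UpdateEnemies()
    {
        // Placeholder for the game logic being measured.
    }
}
```

Markers like this are visible when profiling a development build attached to the Profiler, which keeps the measurement on target hardware rather than in the editor.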
Draw Call Optimization: Batching in Detail
A draw call is a command the CPU issues to the GPU: "render this." Each call carries CPU overhead regardless of geometry complexity. On mobile, 200–300 draw calls per frame is a common rule-of-thumb ceiling; beyond that, problems tend to emerge. Goal: minimize draw calls by merging geometry that shares a material.
Static Batching
Combines static (immobile) meshes into one large mesh at build time or scene start. Requirements:
- Static flag on the object (or at least Batching Static)
- Identical material
Pros: zero CPU overhead at runtime, works on all platforms. Cons: increases memory consumption (merged mesh stored separately) and scene load time. For scenes with thousands of static objects—be cautious with memory; check via Memory Profiler.
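For objects that are spawned at load time and therefore cannot be flagged in the editor, the "at scene start" path can be sketched with `StaticBatchingUtility.Combine`. The component and the `environmentRoot` field are hypothetical; the sketch assumes everything under that root shares materials where batching is wanted and never moves afterwards.

```csharp
using UnityEngine;

// Hypothetical component: batches a runtime-built environment once at scene start.
public class RuntimeStaticBatcher : MonoBehaviour
{
    // Assumed parent of all immobile environment objects (assigned in the Inspector).
    [SerializeField] GameObject environmentRoot;

    void Start()
    {
        // Combines the meshes of all children under the root into static batches.
        // Children can no longer be moved individually after this call.
        // Runtime combining generally needs the source meshes imported with Read/Write enabled.
        StaticBatchingUtility.Combine(environmentRoot);
    }
}
```

The memory cost described above still applies, so it is worth comparing Memory Profiler snapshots before and after adding a call like this.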
Dynamic Batching
Merges meshes at runtime each frame. Requirements are stricter:
- No more than 900 vertex attributes and 300 vertices per mesh (Unity limit)
- Identical material
- No mirrored transforms (an object with negative scale on an axis doesn't batch with non-mirrored ones)
In practice, Dynamic Batching works only for small objects (particles, UI, small debris). For characters and environments it usually doesn't fit because of the vertex limits. In URP, Dynamic Batching is disabled by default, replaced by SRP Batcher.
SRP Batcher
SRP Batcher isn't classic geometry batching; it reduces the CPU overhead of preparing draw calls. Instead of re-uploading shader uniform data each frame (matrices, material properties), SRP Batcher keeps it in persistent GPU buffers and updates only what changed.
Result: the draw call count stays the same, but each call takes less CPU time. In scenes with many unique materials (as long as they share shader variants), SRP Batcher gives noticeable gains, sometimes 2–3x on render CPU time.
Requirement: the shader must be SRP Batcher compatible, with built-in per-object properties declared in the UnityPerDraw CBUFFER and all material properties in the UnityPerMaterial CBUFFER. Standard URP Lit/Unlit shaders are compatible. For custom shaders, check the shader's Inspector, which shows whether it is SRP Batcher compatible.
GPU Instancing
For many copies of the same mesh with one material (trees, grass, same-type NPCs, projectiles). GPU Instancing sends one draw call with an array of per-instance data (transform matrices, color)—GPU renders all copies at once.
Enabled on the material with the Enable GPU Instancing checkbox. The shader must support it via the UNITY_INSTANCING_BUFFER_START/END macros (standard URP shaders do). Limitation: all instances in one batch must share the same material and mesh. Note that for GameObjects rendered with an SRP Batcher compatible shader, the SRP Batcher takes priority over GPU Instancing.
Graphics.DrawMeshInstanced / Graphics.DrawMeshInstancedIndirect: for procedural rendering without GameObject overhead (grass, particles, procedural content). Indirect lets you build the instance list on the GPU with a Compute Shader.
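A minimal sketch of the non-indirect path: one `Graphics.DrawMeshInstanced` call per frame draws every copy from an array of transform matrices, with no GameObject per instance. The component name, field names, and scatter logic are hypothetical; the material is assumed to have Enable GPU Instancing checked, and the count is capped at 1023, the per-call limit of this API.

```csharp
using UnityEngine;

// Hypothetical component: scatters copies of one mesh and draws them in a single instanced call.
public class GrassInstancer : MonoBehaviour
{
    const int MaxInstancesPerCall = 1023; // limit of Graphics.DrawMeshInstanced per call

    [SerializeField] Mesh mesh;
    [SerializeField] Material material;    // must have "Enable GPU Instancing" checked
    [SerializeField] int count = 1000;     // assumed instance count
    [SerializeField] float areaSize = 50f; // assumed half-size of the scatter area

    Matrix4x4[] matrices;

    void Start()
    {
        count = Mathf.Min(count, MaxInstancesPerCall);
        matrices = new Matrix4x4[count]; // allocated once, reused every frame

        for (int i = 0; i < count; i++)
        {
            Vector3 offset = new Vector3(Random.Range(-areaSize, areaSize), 0f, Random.Range(-areaSize, areaSize));
            Quaternion rotation = Quaternion.Euler(0f, Random.Range(0f, 360f), 0f);
            matrices[i] = Matrix4x4.TRS(transform.position + offset, rotation, Vector3.one);
        }
    }

    void Update()
    {
        // Draw requests last for one frame only, so the call is repeated every frame.
        Graphics.DrawMeshInstanced(mesh, 0, material, matrices, count);
    }
}
```

For more than 1023 instances, the same array approach can be split across several calls, or replaced with the Indirect variant when the instance list is produced on the GPU.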
Comparison of approaches:
| Method | Best For | Limitation |
|---|---|---|
| Static Batching | Static environment | Memory |
| SRP Batcher | Many unique materials | Cuts CPU cost per call only; draw call count unchanged |
| GPU Instancing | Many copies of one object | Identical material/mesh |
| Dynamic Batching | Small objects (disabled by default in URP) | Vertex limit |
Memory Optimization for Mobile
Mobile platforms have harsh RAM constraints. iOS terminates apps that exceed their memory limit, often with little warning. Android does the same, signaling pressure first through the onLowMemory callback. Target budgets:
- iOS: < 1 GB for modern devices, < 512 MB for iPhone 8/X support
- Android: < 800 MB for broad compatibility, accounting for Android itself using ~400–600 MB
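Unity surfaces the platform's low-memory signal through the `Application.lowMemory` event, which gives the game one chance to shed memory before the OS steps in. The sketch below is one possible handler; the component name is hypothetical, and what counts as a droppable cache depends entirely on the project.

```csharp
using UnityEngine;

// Hypothetical component: reacts to the OS low-memory signal.
public class LowMemoryHandler : MonoBehaviour
{
    void OnEnable()
    {
        Application.lowMemory += OnLowMemory;
    }

    void OnDisable()
    {
        Application.lowMemory -= OnLowMemory;
    }

    void OnLowMemory()
    {
        // Project-specific caches (pooled objects, preloaded audio, etc.) would be dropped here.
        // Then ask Unity to unload assets that nothing references any more.
        Resources.UnloadUnusedAssets();
    }
}
```

This is a last line of defense; staying inside the budgets above is still the primary goal.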
Addressables and Asset Bundles
Loading everything at startup is unacceptable for large projects. Addressables (a wrapper over Asset Bundles) is a system for asynchronous asset loading by address.
Key principles:
Explicit unloading: Addressables.ReleaseInstance / Addressables.Release. Addressables don't auto-unload on object destruction. Typical mistake: Addressables.InstantiateAsync in a loop without Release—memory grows until crash.
Reference counting: asset unloads only when all handles are freed. Architectural pattern: a service/manager holds the asset handle, releases when switching scenes or on explicit call.
Groups and Bundle Strategy: organize assets by load logic:
- Pack Together: all group assets in one bundle (one load request)
- Pack Separately: each asset in its own bundle (granular loading)
- Pack Together by Label: by labels (flexible)
For levels: put all of one level's assets in one bundle. Shared assets (common UI textures, fonts) go into a separate group with Prevent Updates for a stable cache.
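The explicit-release and reference-counting principles above can be sketched as a small service that owns one asset handle and frees it when the service goes away. The class name and the address key `"Enemies/Grunt"` are hypothetical; the calls used (`LoadAssetAsync`, `Release`) are the standard Addressables entry points.

```csharp
using UnityEngine;
using UnityEngine.AddressableAssets;
using UnityEngine.ResourceManagement.AsyncOperations;

// Hypothetical service: owns one Addressables handle for the lifetime of a level.
public class LevelAssetService : MonoBehaviour
{
    [SerializeField] string prefabKey = "Enemies/Grunt"; // assumed address, not a real project key

    AsyncOperationHandle<GameObject> prefabHandle;
    GameObject spawned;

    void Start()
    {
        // Loading takes a reference on the asset; it stays in memory until the handle is released.
        prefabHandle = Addressables.LoadAssetAsync<GameObject>(prefabKey);
        prefabHandle.Completed += OnLoaded;
    }

    void OnLoaded(AsyncOperationHandle<GameObject> handle)
    {
        if (handle.Status == AsyncOperationStatus.Succeeded)
            spawned = Instantiate(handle.Result);
    }

    void OnDestroy()
    {
        // Destroying the instance does not unload the asset; the handle must be released explicitly.
        if (spawned != null)
            Destroy(spawned);

        if (prefabHandle.IsValid())
        {
            prefabHandle.Completed -= OnLoaded;
            Addressables.Release(prefabHandle);
        }
    }
}
```

Objects created with `Addressables.InstantiateAsync` follow the same rule, with `Addressables.ReleaseInstance` as the matching release call.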
Texture Memory
Textures consume most memory in most games. Analyze via Memory Profiler: tab All Of Memory → Texture2D—immediately see heaviest textures.
Practical measures:
- Mipmap: for 3D textures, enable; for UI, disable (Advanced > Generate Mip Maps: false). UI textures render at fixed screen size; mipmaps waste memory
- Max Size: check that Max Size in Import Settings isn't too high. 4096 for a mobile icon is a typical mistake
- Cross-reference issue: textures referenced by unused Materials stay in memory; Memory Profiler shows the reference chain
- Streaming Mipmaps: for open world, enable Texture Streaming in Quality Settings. Loads mip levels as the camera approaches
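The mipmap and Max Size rules above can be enforced at import time instead of being checked by hand. The sketch below is an editor-only `AssetPostprocessor`; the folder `Assets/UI/` and the 1024 cap are hypothetical project conventions chosen for illustration.

```csharp
using System;
using UnityEditor;

// Editor-only sketch (place in an Editor folder): applies import rules to UI textures.
public class UiTextureImportRules : AssetPostprocessor
{
    const string UiFolder = "Assets/UI/"; // assumed location of UI textures
    const int UiMaxSize = 1024;           // assumed size cap for UI textures

    // Called by Unity before each texture is imported.
    void OnPreprocessTexture()
    {
        if (!assetPath.StartsWith(UiFolder, StringComparison.Ordinal))
            return;

        var importer = (TextureImporter)assetImporter;

        // UI renders at a fixed screen size, so the mip chain would only cost memory.
        importer.mipmapEnabled = false;

        // Clamp oversized sources such as a 4096 icon.
        if (importer.maxTextureSize > UiMaxSize)
            importer.maxTextureSize = UiMaxSize;
    }
}
```

A rule like this only affects textures as they are imported or reimported, so existing assets need a reimport to pick it up.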
GC Allocations
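Unity's C# garbage collector pauses managed code while it collects (incremental mode spreads the work over several frames but doesn't eliminate it). Heavy per-frame heap allocation triggers GC pauses that show up as visible hitches. Goal: zero allocations in the hot path (Update, FixedUpdate, render).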
Typical allocation sources found in Profiler:
- string concatenation in Update ("Score: " + score → a reused StringBuilder, rebuilt only when the value changes; string.Format allocates too)
- LINQ in hot path (Where, Select, ToList → manual loops with pre-allocated lists)
- GetComponent<T>() every frame → cache in Awake/Start
- new Vector3() and other value types in certain patterns (for example when captured by a lambda or stored in a freshly allocated array); check Profiler
- Boxing value types when passing to object parameters
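Several of the fixes listed above fit in one small component: a cached component reference, a reused StringBuilder that only runs when the value changes, and a manual loop over a pre-allocated list in place of LINQ. All names here (`ScoreHud`, `enemies`, the 10-unit radius) are hypothetical and only serve the illustration.

```csharp
using System.Collections.Generic;
using System.Text;
using UnityEngine;
using UnityEngine.UI;

// Hypothetical HUD component illustrating allocation-free per-frame code.
[RequireComponent(typeof(Rigidbody))]
public class ScoreHud : MonoBehaviour
{
    [SerializeField] Text label;              // assumed uGUI Text component
    [SerializeField] List<Transform> enemies; // assumed list maintained elsewhere

    readonly StringBuilder builder = new StringBuilder(32);
    readonly List<Transform> nearby = new List<Transform>(64); // reused, never re-created
    Rigidbody body;
    int shownScore = int.MinValue;

    void Awake()
    {
        body = GetComponent<Rigidbody>(); // looked up once, not every frame
    }

    public void SetScore(int score)
    {
        if (score == shownScore) return; // no string work unless the value changed
        shownScore = score;

        builder.Clear();
        builder.Append("Score: ").Append(score);
        label.text = builder.ToString(); // one new string per change, not per frame
    }

    void Update()
    {
        // Manual loop instead of enemies.Where(...).ToList(), which allocates every call.
        nearby.Clear();
        Vector3 origin = body.position;
        const float sqrRadius = 10f * 10f; // assumed detection radius, squared

        for (int i = 0; i < enemies.Count; i++)
        {
            if ((enemies[i].position - origin).sqrMagnitude <= sqrRadius)
                nearby.Add(enemies[i]);
        }
    }
}
```

The GC Alloc column in the Profiler's CPU view is the way to confirm that a change like this actually brought the per-frame allocation to zero.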
LOD and Culling
LOD Group — switches to simplified geometry when objects are far from camera. Standard for 3D environments: LOD0 (100%), LOD1 (30–50% triangles), LOD2 (10–15%), Culled (invisible). For mobile, set Culled threshold more aggressively—render less per frame.
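LOD levels are usually authored in the Inspector, but the same setup can be expressed in code, which makes the thresholds easy to vary per platform. In this sketch the component name, the three renderer fields, and the threshold values are hypothetical; each threshold is the fraction of screen height below which the next level takes over, and the last one is where the object is culled.

```csharp
using UnityEngine;

// Hypothetical component: builds a three-level LOD Group from code.
public class LodSetup : MonoBehaviour
{
    [SerializeField] Renderer lod0; // full-detail mesh
    [SerializeField] Renderer lod1; // reduced mesh
    [SerializeField] Renderer lod2; // lowest-detail mesh

    void Start()
    {
        var group = gameObject.AddComponent<LODGroup>();

        // Thresholds must decrease from one level to the next.
        var lods = new LOD[]
        {
            new LOD(0.6f, new[] { lod0 }),
            new LOD(0.3f, new[] { lod1 }),
            new LOD(0.1f, new[] { lod2 }), // assumed aggressive mobile value: culled below 10% of screen height
        };

        group.SetLODs(lods);
        group.RecalculateBounds();
    }
}
```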
Occlusion Culling: Unity skips objects hidden behind other geometry such as walls. Requires baked occlusion data (Window > Rendering > Occlusion Culling > Bake). In open spaces the effect is minimal; in indoor scenes it is significant.
Frustum Culling happens automatically: objects outside the camera frustum aren't rendered. The visibility test itself still runs for every object, though. For scenes with thousands of objects, custom spatial partitioning (Quadtree, Octree) speeds up the culling tests.
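As an illustration of the spatial-partitioning idea, here is a minimal point quadtree over the XZ ground plane in plain C# (no Unity types, so it also runs outside the engine). It is a sketch under simplifying assumptions: objects are treated as points, the query region is an axis-aligned rectangle standing in for the frustum's footprint, and the `Capacity` and `MaxDepth` values are hypothetical tuning numbers.

```csharp
using System.Collections.Generic;

// Point quadtree over the XZ plane; stores object ids at 2D positions.
public sealed class Quadtree
{
    const int Capacity = 8; // items per leaf before it splits (assumed tuning value)
    const int MaxDepth = 8; // stops endless splitting when many points coincide

    readonly float minX, minZ, maxX, maxZ;
    readonly int depth;
    readonly List<(float x, float z, int id)> items = new List<(float x, float z, int id)>();
    Quadtree[] children;

    public Quadtree(float minX, float minZ, float maxX, float maxZ, int depth = 0)
    {
        this.minX = minX; this.minZ = minZ; this.maxX = maxX; this.maxZ = maxZ;
        this.depth = depth;
    }

    // Returns false when the point lies outside this node's bounds.
    public bool Insert(float x, float z, int id)
    {
        if (x < minX || x > maxX || z < minZ || z > maxZ) return false;

        if (children == null)
        {
            if (items.Count < Capacity || depth >= MaxDepth)
            {
                items.Add((x, z, id));
                return true;
            }
            Split();
        }

        foreach (var child in children)
            if (child.Insert(x, z, id)) return true;
        return false;
    }

    // Creates four children and pushes the stored items down into them.
    void Split()
    {
        float midX = (minX + maxX) * 0.5f, midZ = (minZ + maxZ) * 0.5f;
        children = new[]
        {
            new Quadtree(minX, minZ, midX, midZ, depth + 1),
            new Quadtree(midX, minZ, maxX, midZ, depth + 1),
            new Quadtree(minX, midZ, midX, maxZ, depth + 1),
            new Quadtree(midX, midZ, maxX, maxZ, depth + 1),
        };

        foreach (var item in items)
            foreach (var child in children)
                if (child.Insert(item.x, item.z, item.id)) break;
        items.Clear();
    }

    // Appends the ids of all points inside the query rectangle; whole subtrees are skipped early.
    public void Query(float qMinX, float qMinZ, float qMaxX, float qMaxZ, List<int> results)
    {
        if (qMaxX < minX || qMinX > maxX || qMaxZ < minZ || qMinZ > maxZ) return;

        foreach (var item in items)
            if (item.x >= qMinX && item.x <= qMaxX && item.z >= qMinZ && item.z <= qMaxZ)
                results.Add(item.id);

        if (children == null) return;
        foreach (var child in children)
            child.Query(qMinX, qMinZ, qMaxX, qMaxZ, results);
    }
}
```

The gain comes from the early return in `Query`: a node that doesn't overlap the query region rejects all of its objects with a single test instead of one test per object.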
VR Optimization
VR is a separate class of problems. A frame rate of 72/90 Hz is non-negotiable; dropping below it causes motion sickness. In addition to the standard methods:
- Single Pass Instanced Rendering — render both eyes in one pass (see VR section)
- Fixed Foveated Rendering (Quest) — reduced resolution at periphery
- Late Latching (Quest 3) — update controller position as late as possible before render, reduces perceived latency
- Dynamic Resolution in URP/HDRP — auto-reduce render resolution on fps drop
For Quest, profile via OVR Metrics Tool — shows CPU/GPU time in headset at runtime, more convenient than USB profiling.
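The dynamic resolution idea from the list above can be sketched with Unity's generic `FrameTimingManager` and `ScalableBufferManager` APIs: read the last GPU frame time and nudge the render scale down when it exceeds the budget. This is an assumption-heavy sketch, not a Quest-specific recipe: the component name, the 13.9 ms budget (roughly 72 Hz), the step, and the minimum scale are hypothetical, and it only has an effect where the platform and render pipeline support dynamic resolution, the camera allows it, and frame timing stats are enabled.

```csharp
using UnityEngine;

// Hypothetical component: lowers render scale when the GPU misses its frame budget.
public class SimpleDynamicResolution : MonoBehaviour
{
    [SerializeField] float targetFrameMs = 13.9f; // assumed budget, about 72 Hz
    [SerializeField] float minScale = 0.7f;       // assumed lower bound for the render scale
    [SerializeField] float step = 0.05f;          // assumed adjustment per frame

    readonly FrameTiming[] timings = new FrameTiming[1];
    float scale = 1f;

    void Update()
    {
        FrameTimingManager.CaptureFrameTimings();
        if (FrameTimingManager.GetLatestTimings(1, timings) == 0)
            return; // no timing data available yet

        float gpuMs = (float)timings[0].gpuFrameTime;
        if (gpuMs <= 0f)
            return; // the platform did not report GPU time

        float newScale = scale;
        if (gpuMs > targetFrameMs)
            newScale -= step;                      // over budget: render fewer pixels
        else if (gpuMs < targetFrameMs * 0.85f)
            newScale += step;                      // comfortable headroom: restore quality
        newScale = Mathf.Clamp(newScale, minScale, 1f);

        if (!Mathf.Approximately(newScale, scale))
        {
            scale = newScale;
            ScalableBufferManager.ResizeBuffers(scale, scale);
        }
    }
}
```

The 0.85 factor leaves a dead zone between scaling down and scaling back up, so the resolution doesn't oscillate when the GPU time sits right at the budget.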





