tags: [vr-ar]
Voice chat implementation in VR games
Voice in VR is not just push-to-talk: it is spatial audio emitted from the avatar's position. Voice must fade with distance, reflect off geometry, and come from the right when the avatar stands to your right. Without this, a multiplayer VR experience loses much of its presence.
There are two main voice SDKs for VR: Vivox (part of Unity Gaming Services, integrates well with Lobby and Relay) and Dissonance Voice Chat (an independent Unity package that works with Photon, Mirror, and NGO). Each has its own integration model, and the choice depends on the rest of the network stack.
Dissonance: integration with Photon Fusion
Dissonance runs over the existing network transport: voice packets travel through the same channel as game data. For Photon Fusion there is a ready-made integration, DissonanceComms + PhotonFusionCommsNetwork. Both components are installed on a separate GameObject in the scene and connected to the NetworkRunner.
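A minimal bootstrap sketch of that wiring. DissonanceComms and PhotonFusionCommsNetwork are the component names mentioned above; their exact namespaces and any required initialization order should be checked against the installed Dissonance version, and in most projects both components are simply placed on a scene GameObject in the editor instead:

```csharp
// Sketch: creating the Dissonance comms object for Photon Fusion at runtime.
// The integration component discovers the active NetworkRunner in the scene.
using Dissonance;
using UnityEngine;

public class DissonanceSetup : MonoBehaviour
{
    void Awake()
    {
        var go = new GameObject("DissonanceComms");
        go.AddComponent<PhotonFusionCommsNetwork>(); // from the Fusion integration package
        go.AddComponent<DissonanceComms>();
    }
}
```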
A critical setting that is often skipped: VoicePlaybackOrder. By default Dissonance buffers incoming voice with roughly 100 ms of delay to smooth out jitter. In VR this is noticeable: the avatar's lips move (if there is lipsync) while the voice arrives late. Lower MinJitterBuffer to 20–30 ms; on a good connection jitter is minimal, so a smaller buffer is affordable.
Spatial audio in Dissonance: on each VoiceReceiptTrigger (a component on the avatar), enable Use Positional Data so Dissonance feeds the source position into the Unity AudioSource. From there, standard Unity 3D audio applies: AudioRolloffMode.Logarithmic, MinDistance, MaxDistance. One VR-specific point: the AudioListener sits on the HMD, not on Camera.main — make sure it moves with the user's head.
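The AudioSource side of this is plain Unity API. A sketch of the rolloff configuration (the 1 m / 20 m distances are example values to tune per game):

```csharp
// Configure a voice playback AudioSource for VR spatial audio.
// Attach to the avatar object that emits the remote player's voice.
using UnityEngine;

[RequireComponent(typeof(AudioSource))]
public class VoiceAudioSetup : MonoBehaviour
{
    void Awake()
    {
        var src = GetComponent<AudioSource>();
        src.spatialBlend = 1f;                          // fully 3D, no stereo bleed
        src.rolloffMode = AudioRolloffMode.Logarithmic; // natural distance falloff
        src.minDistance = 1f;   // full volume within 1 m
        src.maxDistance = 20f;  // effectively inaudible beyond ~20 m
    }
}
```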
Vivox and spatial audio: nuances
Vivox is a cloud-hosted SaaS. Voice does not go through the game server but through Vivox's hosted infrastructure, operated as part of Unity Gaming Services. This reduces load on your own game infrastructure, but adds a dependency on an external service and latency you do not control.
For spatial audio in Vivox, use VivoxUnity.IAudioSource3D: the SDK transmits position and orientation to the cloud, and HRTF (Head-Related Transfer Function) processing is applied on the receiving side. This is more computationally expensive, but the 3D positioning quality is better than a Unity AudioSource with linear rolloff.
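With the current Unity Gaming Services Vivox package, positional audio is set up by joining a positional channel and streaming the listener's transform to the service. A sketch, assuming recent SDK versions; the method and type names (JoinPositionalChannelAsync, Channel3DProperties, Set3DPosition) and the distance values should be verified against the version you ship:

```csharp
// Sketch: joining a Vivox positional channel and updating the 3D position.
using Unity.Services.Vivox;
using UnityEngine;

public class VivoxPositionalVoice : MonoBehaviour
{
    [SerializeField] GameObject listenerHead; // the HMD transform, not Camera.main

    async void Start()
    {
        // audibleDistance, conversationalDistance, fade intensity, fade model
        var props = new Channel3DProperties(32, 1, 1f, AudioFadeModel.InverseByDistance);
        await VivoxService.Instance.JoinPositionalChannelAsync(
            "world", ChatCapability.AudioOnly, props);
    }

    void Update()
    {
        // Push the listener's position/orientation to Vivox
        // (throttle to ~10 Hz in production rather than every frame).
        VivoxService.Instance.Set3DPosition(listenerHead, "world");
    }
}
```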
A Vivox caveat on standalone Quest: the SDK requires an active internet connection and adds 50–150 ms of overhead compared to P2P or a local Photon transport. For games where voice is a core mechanic (negotiations, team commands), this is noticeable.
Noise suppression and codecs
Quest users speak into the headset's built-in microphone in a home environment: household noise, kids, a TV in the background. Without noise suppression, in-game voice sounds muddy.
Dissonance supports a WebRTC noise suppressor (VAD + NS), wired in via DissonanceComms.MicrophoneCapture. WebRTC NS handles steady noise well (fridge hum, background hiss) but copes less well with sharp transient sounds.
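Dissonance exposes the suppression strength through its global voice settings. A sketch, assuming the VoiceSettings / DenoiseAmount / NoiseSuppressionLevels names found in recent Dissonance releases; verify against your installed version:

```csharp
// Sketch: raising WebRTC noise suppression strength in Dissonance at runtime.
using Dissonance;
using Dissonance.Audio.Capture;

public static class VoiceNoiseConfig
{
    public static void Apply()
    {
        // Higher levels suppress more steady background noise at some
        // cost in voice naturalness.
        VoiceSettings.Instance.DenoiseAmount = NoiseSuppressionLevels.High;
    }
}
```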
Voice codec: Opus is the de facto standard. Dissonance uses Opus by default at 16–32 kbps, which is sufficient for clear speech; raising it to 64 kbps gives no significant quality improvement for voice. Vivox also uses Opus, but its bitrate and parameters are not manually configurable.
Voice zoning: teams and whisper
Games with multiple teams or large spaces need zoning: a player should hear only those nearby or on their own team. Dissonance implements this via Rooms: each player subscribes to a set of rooms and speaks into a specific one.
For a "whisper" mechanic (heard only up close) and a "shout" (heard across the map), create three rooms: Proximity (5 m radius, joined automatically by distance), Team (own team only), and Broadcast (everyone). The player subscribes to all three as a listener but speaks into only one, switching the active room via UI or a gesture.
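The scheme above can be sketched with Dissonance's rooms API. Rooms.Join and RoomChannels.Open are Dissonance calls (RoomChannel is disposable, so closing the previous channel stops speaking into it); the room names are this example's convention, and exact signatures should be checked against your Dissonance version:

```csharp
// Sketch: listen in three rooms, speak into exactly one at a time.
using Dissonance;
using UnityEngine;

public class VoiceZones : MonoBehaviour
{
    DissonanceComms comms;
    RoomChannel? activeChannel;

    void Start()
    {
        comms = FindObjectOfType<DissonanceComms>();
        // Subscribe as a listener to all three rooms...
        comms.Rooms.Join("Proximity");
        comms.Rooms.Join("Team");
        comms.Rooms.Join("Broadcast");
        // ...but start speaking only in Proximity.
        Speak("Proximity");
    }

    public void Speak(string room)
    {
        if (activeChannel.HasValue)
            activeChannel.Value.Dispose(); // stop transmitting to the old room
        activeChannel = comms.RoomChannels.Open(room);
    }
}
```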
Switching via gesture is VR-specific — there is no keyboard. "Whisper": hand to mouth, detected as the distance from the HMD to the right controller dropping below 15 cm. "Radio": pressing an icon on the avatar's forearm with a ray interactor.
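The hand-to-mouth check is a plain distance test between tracked transforms. A sketch; `hmd` and `rightController` are assumed to be the XR rig's tracked transforms, assigned in the inspector:

```csharp
// Whisper gesture detection: is the right controller within 15 cm of the head?
using UnityEngine;

public class WhisperGesture : MonoBehaviour
{
    [SerializeField] Transform hmd;
    [SerializeField] Transform rightController;
    const float WhisperDistance = 0.15f; // 15 cm

    public bool IsWhispering =>
        Vector3.Distance(hmd.position, rightController.position) < WhisperDistance;
}
```

In practice you would also debounce this (require the pose to hold for a few hundred milliseconds) to avoid flickering between rooms as the hand passes the face.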
Timelines and evaluation
| Integration option | Estimated timeline |
|---|---|
| Dissonance + Photon Fusion, basic spatial audio | 3–7 days |
| Vivox + Unity Gaming Services, zoning | 1–2 weeks |
| Custom zoning + gesture control + lipsync | 2–4 weeks |
The cost is estimated after analyzing the current network stack and the voice chat feature requirements.





