VoIP Voice Chat Implementation in VR Games

tags: [vr-ar]

Voice in VR is not just push-to-talk. It is spatial audio emitted from the avatar's position: the voice must fade with distance, reflect off geometry, and come from the right if the avatar stands to your right. Without this, a multiplayer VR experience loses much of its presence.

The two main VR voice SDKs are Vivox (part of Unity Gaming Services; integrates well with Lobby and Relay) and Dissonance Voice Chat (an independent Unity package that works with Photon, Mirror, and NGO). Each has its own working model, and the choice depends on the rest of your network stack.

Dissonance: integration with Photon Fusion

Dissonance runs on top of your existing network transport: voice packets travel through the same channel as game data. For Photon Fusion there is a ready-made integration, DissonanceComms + PhotonFusionCommsNetwork. Both components are installed on a separate GameObject in the scene and connected to the NetworkRunner.
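As a rough illustration, the setup above can be expressed as a bootstrap script. This is a sketch, not the official workflow (the components are normally added to a prefab in the editor), and it assumes the component names from the Dissonance-for-Fusion integration package:

```csharp
// Sketch: one scene object carrying the Dissonance core plus the Fusion transport
// adapter. Assumes DissonanceComms (core Dissonance component) and
// PhotonFusionCommsNetwork (the name used by the Fusion integration package).
using UnityEngine;
using Dissonance;

public class VoiceChatBootstrap : MonoBehaviour
{
    void Awake()
    {
        // Core voice-chat component: capture, playback, channel management.
        gameObject.AddComponent<DissonanceComms>();

        // Transport adapter: routes Dissonance packets over the Fusion session,
        // picking up the active NetworkRunner once the session starts.
        gameObject.AddComponent<PhotonFusionCommsNetwork>();
    }
}
```

In practice the two components are configured in the inspector; the point is simply that voice rides on the same Fusion session as game state, so no second connection is opened.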

A critical setting that is often skipped: VoicePlaybackOrder. By default, Dissonance buffers incoming voice with roughly 100 ms of delay to smooth out network jitter. In VR this is noticeable: the avatar's lips move (if there is lipsync) while the voice arrives later. Lower MinJitterBuffer to 20–30 ms; with a good connection, jitter is minimal and a smaller buffer is affordable.

Spatial audio in Dissonance: on each VoiceReceiptTrigger (a component on the avatar), enable Use Positional Data, and Dissonance will feed the source position into a Unity AudioSource. From there, standard Unity 3D audio applies: AudioRolloffMode.Logarithmic, MinDistance, MaxDistance. One VR-specific detail: the AudioListener sits on the HMD rig, not on Camera.main, so make sure it moves with the user's head.
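The Unity side of this can be sketched with standard AudioSource properties (the distance values here are illustrative tuning, not recommended constants):

```csharp
// Sketch: configuring the AudioSource that voice playback feeds, so a remote
// player's voice behaves like an ordinary 3D sound in the scene.
using UnityEngine;

[RequireComponent(typeof(AudioSource))]
public class VoiceSourceSettings : MonoBehaviour
{
    void Start()
    {
        var source = GetComponent<AudioSource>();
        source.spatialBlend = 1f;                           // fully 3D, no 2D mix
        source.rolloffMode = AudioRolloffMode.Logarithmic;  // natural distance falloff
        source.minDistance = 1f;                            // full volume within 1 m
        source.maxDistance = 20f;                           // effectively silent beyond ~20 m
        source.spatialize = true;                           // hand off to the XR spatializer plugin, if one is configured
    }
}
```

With `spatialize` enabled and an Oculus/Meta or Steam Audio spatializer selected in the project's audio settings, the same AudioSource gets HRTF processing instead of plain panning.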

Vivox and spatial audio: nuances

Vivox is a cloud-hosted SaaS: voice does not pass through the game server but through Vivox's own cloud. This offloads the game infrastructure, but adds a dependency on an external service and latency you do not control.

For spatial audio in Vivox, use VivoxUnity.IAudioSource3D: the SDK transmits position and orientation to the cloud, and HRTF (Head-Related Transfer Function) processing is applied on the receiving side. This is more computationally expensive, but the 3D positioning quality is better than a plain Unity AudioSource with linear rolloff.
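A minimal sketch of feeding the HMD pose into a positional channel, assuming the older VivoxUnity SDK where IChannelSession exposes Set3DPosition (newer Unity.Services.Vivox releases expose a similar call on the VivoxService singleton):

```csharp
// Sketch: throttled position updates into a Vivox positional channel.
// Vivox recommends periodic updates rather than sending every frame.
using UnityEngine;
using VivoxUnity;

public class VivoxPositionalUpdater : MonoBehaviour
{
    public Transform hmd;                   // the head transform on the XR rig, not Camera.main
    public IChannelSession channelSession;  // an already-connected positional channel

    float _nextUpdate;

    void Update()
    {
        if (channelSession == null || Time.time < _nextUpdate) return;
        _nextUpdate = Time.time + 0.1f;  // ~10 updates per second

        channelSession.Set3DPosition(
            hmd.position,   // where this player's voice is emitted from
            hmd.position,   // where this player listens from
            hmd.forward,    // listener facing direction
            hmd.up);        // listener up vector
    }
}
```

For a VR avatar, speaker and listener positions coincide at the head, which is why the same transform is passed twice.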

A Vivox caveat on standalone Quest: the SDK requires an active internet connection and adds 50–150 ms of overhead compared to P2P or a local Photon transport. For games where voice communication is a core mechanic (negotiations, team commands), this is noticeable.

Noise suppression and codecs

Quest users speak into the headset's built-in microphone in a home environment full of household noise: kids, a TV in the background. Without noise suppression, in-game voice sounds muddy.

Dissonance supports a WebRTC-based preprocessor (VAD + noise suppression), wired up via DissonanceComms.MicrophoneCapture. The WebRTC noise suppressor handles steady noise well (a fridge hum, background hiss), but copes less well with sharp, transient sounds.

Voice codec: Opus is the de facto standard. Dissonance uses Opus by default at 16–32 kbps, which is sufficient for clear speech; raising it to 64 kbps gives no significant quality improvement for voice. Vivox also uses Opus, but its bitrate and parameters are not manually configurable.

Voice zoning: teams and whisper

Games with multiple teams or large spaces need zoning: a player should hear only those nearby or on their own team. Dissonance implements this via Rooms: each player subscribes to a set of rooms and speaks into a specific one.

For a "whisper" mechanic (heard only up close) and a "shout" (heard across the map), create three rooms: Proximity (5 m radius, automatic by distance), Team (own team only), and Broadcast (everyone). The player subscribes to all three as a listener but speaks into only one at a time, switching the active room via UI or gesture.
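The three-room setup can be sketched as follows. This assumes Dissonance's VoiceReceiptTrigger component and the scripting channel API (DissonanceComms.RoomChannels); the room names are this project's convention, not SDK defaults:

```csharp
// Sketch: listen to three rooms, speak into exactly one at a time.
using UnityEngine;
using Dissonance;

public class VoiceZones : MonoBehaviour
{
    DissonanceComms _comms;
    RoomChannel? _speaking;  // the single room currently being spoken into

    void Start()
    {
        _comms = FindObjectOfType<DissonanceComms>();

        // Subscribe as a listener to all three zones at once.
        foreach (var room in new[] { "Proximity", "Team", "Broadcast" })
            gameObject.AddComponent<VoiceReceiptTrigger>().RoomName = room;
    }

    // Called from UI or a gesture handler to switch the active speaking zone.
    public void SpeakInto(string room)
    {
        if (_speaking.HasValue)
            _speaking.Value.Dispose();  // close the previous speaking channel

        // Positional transmission only matters for the distance-based zone.
        _speaking = _comms.RoomChannels.Open(room, positional: room == "Proximity");
    }
}
```

Only the Proximity channel is opened as positional, since Team and Broadcast are meant to be heard at full volume regardless of distance.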

Switching via gesture is VR-specific: there is no keyboard. "Whisper" is hand-to-mouth (detected when the distance from the HMD to the right controller drops below 15 cm). "Radio" is a press on an icon on the avatar's forearm via a ray interactor.
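The hand-to-mouth detection is plain transform math; a self-contained sketch (the 15 cm threshold and the hysteresis via events are this project's tuning, not fixed values):

```csharp
// Sketch: fire events when the right controller enters/leaves a 15 cm radius
// around the head, which is treated as the "whisper" gesture.
using UnityEngine;
using UnityEngine.Events;

public class WhisperGesture : MonoBehaviour
{
    public Transform hmd;              // head pose from the XR rig
    public Transform rightController;  // right hand pose
    public UnityEvent onWhisperStart;  // e.g. switch speaking to the Proximity room
    public UnityEvent onWhisperEnd;    // e.g. switch back to the Team room

    const float Threshold = 0.15f;     // 15 cm
    bool _whispering;

    void Update()
    {
        bool close = Vector3.Distance(hmd.position, rightController.position) < Threshold;

        if (close && !_whispering) { _whispering = true; onWhisperStart.Invoke(); }
        else if (!close && _whispering) { _whispering = false; onWhisperEnd.Invoke(); }
    }
}
```

In a real project the threshold would be measured against the mouth rather than the HMD center (a small downward offset), and debounced so brief hand passes do not toggle the mode.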

Timelines and evaluation

Integration option                                  Estimated timeline
Dissonance + Photon Fusion, basic spatial audio     3–7 days
Vivox + Unity Gaming Services, zoning               1–2 weeks
Custom zoning + gesture control + lipsync           2–4 weeks

The cost is calculated after analyzing the current network stack and the voice chat functionality requirements.