AI Virtual Characters for VR/AR

We design and deploy artificial intelligence systems, from prototype to production-ready solution. Our team combines expertise in machine learning, data engineering, and MLOps to make AI work not just in the lab, but in real business.

AI Characters for VR/AR

Static NPCs are a bottleneck for any immersive VR/AR experience: the user presses a trigger, the character plays one of a handful of pre-recorded phrases, and the dialogue is over. AI characters hold real conversations: they understand scene context, remember previous interactions, adapt their behavior to the user, and drive animations in real time.

AI Character Architecture

[STT] User voice → text (Whisper)
         ↓
[Context Manager] History + scene state + character personality
         ↓
[LLM] GPT-4o / Claude 3.5 → response text + action commands
         ↓
[TTS] ElevenLabs → audio stream
         ↓
[Animation Controller] Unity/Unreal → lip sync + gestures + emotions

The dialogue core of this pipeline, built on the OpenAI SDK with structured outputs (STT and TTS run as separate services):

import asyncio
from openai import AsyncOpenAI
from dataclasses import dataclass, field
import json

@dataclass
class CharacterState:
    character_id: str
    name: str
    personality: str       # system prompt with character traits
    scene_context: dict    # current VR scene state
    history: list = field(default_factory=list)
    emotional_state: str = "neutral"
    relationship_score: float = 0.5  # 0=hostile, 1=friendly

class VRCharacterEngine:
    ACTION_SCHEMA = {
        "type": "json_schema",
        "json_schema": {
            "name": "character_response",
            "schema": {
                "type": "object",
                "properties": {
                    "speech": {"type": "string"},
                    "emotion": {"type": "string",
                                "enum": ["neutral", "happy", "angry", "scared",
                                         "surprised", "sad", "suspicious"]},
                    "animation": {"type": "string",
                                  "enum": ["idle", "walk_towards", "walk_away",
                                           "point", "nod", "shake_head",
                                           "hand_gesture", "look_around"]},
                    "scene_action": {"type": "string",
                                     "description": "Scene action: open_door, pick_up_item, etc."},
                    "relationship_delta": {"type": "number",
                                           "description": "Change in relationship_score [-0.2, 0.2]"}
                },
                "required": ["speech", "emotion", "animation"]
            }
        }
    }

    def __init__(self):
        self.client = AsyncOpenAI()

    async def process_interaction(
        self,
        user_input: str,
        state: CharacterState
    ) -> dict:
        messages = [
            {"role": "system", "content": self._build_system_prompt(state)},
            *state.history[-10:],  # last 5 exchanges
            {"role": "user", "content": user_input}
        ]

        response = await self.client.chat.completions.create(
            model="gpt-4o-mini",  # mini is sufficient, latency is critical
            messages=messages,
            response_format=self.ACTION_SCHEMA,
            max_tokens=300,
            temperature=0.7
        )

        action = json.loads(response.choices[0].message.content)

        # Update character state
        state.emotional_state = action["emotion"]
        state.relationship_score = max(0, min(1,
            state.relationship_score + action.get("relationship_delta", 0)
        ))
        state.history.append({"role": "user", "content": user_input})
        state.history.append({"role": "assistant", "content": action["speech"]})

        return action
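The state update at the end of process_interaction can be exercised without any API call. Below is a self-contained sketch with a stubbed action dict; DemoState is a trimmed-down stand-in for CharacterState, and the dialogue values are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class DemoState:
    # Trimmed-down stand-in for CharacterState, enough for the update step
    relationship_score: float = 0.5
    emotional_state: str = "neutral"
    history: list = field(default_factory=list)

def apply_action(state: DemoState, user_input: str, action: dict) -> None:
    # Same update logic as the engine: clamp the score to [0, 1], log the exchange
    state.emotional_state = action["emotion"]
    state.relationship_score = max(0.0, min(1.0,
        state.relationship_score + action.get("relationship_delta", 0)))
    state.history.append({"role": "user", "content": user_input})
    state.history.append({"role": "assistant", "content": action["speech"]})

state = DemoState()
apply_action(state, "Open the gate!",
             {"speech": "Not a chance.", "emotion": "suspicious",
              "relationship_delta": -0.2})
print(state.relationship_score)  # 0.3
print(len(state.history))        # 2
```

Clamping the delta on the engine side matters: the LLM occasionally emits values outside the documented [-0.2, 0.2] range, and an unclamped score drifts out of [0, 1] over long sessions.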

Lip Sync and Animation Synchronization

// Unity: lip sync synchronization with ElevenLabs audio stream
using UnityEngine;

public class AICharacterAnimator : MonoBehaviour
{
    // OVRLipSyncContext comes from the Oculus Lipsync plugin (global namespace)
    private OVRLipSyncContext lipSyncContext;
    private Animator animator;
    private AudioSource audioSource;

    private void Awake()
    {
        // Cache required components; all three must sit on this GameObject
        lipSyncContext = GetComponent<OVRLipSyncContext>();
        animator = GetComponent<Animator>();
        audioSource = GetComponent<AudioSource>();
    }

    public async void PlayCharacterResponse(string speechText, string emotion, string animation)
    {
        // 1. Request audio from TTS service
        byte[] audioData = await TTSService.Synthesize(speechText, voiceId: "character_voice");

        // 2. Set emotion via Blend Shapes
        SetEmotionBlendShape(emotion);

        // 3. Start body animation
        animator.SetTrigger(animation);

        // 4. Play audio with lip sync
        AudioClip clip = AudioService.BytesToClip(audioData);
        audioSource.clip = clip;
        audioSource.Play();

        // An OVRLipSyncContext attached to the same GameObject analyzes the
        // playing AudioSource through Unity's audio filter chain and drives
        // the viseme blend shapes automatically
    }

    private void SetEmotionBlendShape(string emotion)
    {
        var face = GetComponent<SkinnedMeshRenderer>();
        // Reset all emotions
        for (int i = 0; i < face.sharedMesh.blendShapeCount; i++)
            face.SetBlendShapeWeight(i, 0);

        // Set the needed emotion
        int shapeIndex = face.sharedMesh.GetBlendShapeIndex($"emotion_{emotion}");
        if (shapeIndex >= 0)
            face.SetBlendShapeWeight(shapeIndex, 100f);
    }
}

Latency: The Main VR Character Challenge

In VR, a pause longer than roughly 800 ms between the end of the user's speech and the start of the character's response breaks immersion. Pipeline optimization:

Step                 | Without optimization | With optimization
STT (Whisper large)  | 800–1200 ms          | 200–400 ms (Whisper medium + streaming)
LLM (GPT-4o)         | 1000–2000 ms         | 400–700 ms (GPT-4o-mini + short context)
TTS (ElevenLabs)     | 600–1000 ms          | 200–400 ms (streaming TTS)
Total                | 2400–4200 ms         | 800–1500 ms

Solution: stream the LLM output and hand it to TTS in parallel as soon as the first tokens arrive, then begin audio playback before synthesis of the full response is complete.
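The overlap can be sketched with asyncio: sentence-sized chunks are flushed to a TTS worker as soon as they complete, so synthesis starts while the LLM is still generating. Both fake_llm_tokens and tts_worker are stand-ins for the real GPT-4o-mini and ElevenLabs streams:

```python
import asyncio

async def fake_llm_tokens():
    # Stand-in for the streaming LLM response
    for tok in ["Halt! ", "Who ", "goes ", "there? ", "State ", "your ", "business."]:
        await asyncio.sleep(0.01)
        yield tok

async def tts_worker(queue: asyncio.Queue, played: list):
    # Stand-in for streaming TTS: handles each chunk as soon as it arrives
    while True:
        chunk = await queue.get()
        if chunk is None:  # sentinel: the LLM stream is finished
            break
        played.append(chunk)  # real code would synthesize and play audio here

async def respond():
    queue, played, buf = asyncio.Queue(), [], ""
    tts = asyncio.create_task(tts_worker(queue, played))
    async for tok in fake_llm_tokens():
        buf += tok
        # Flush at sentence boundaries so TTS starts before the LLM finishes
        if buf.rstrip().endswith((".", "!", "?")):
            await queue.put(buf.strip())
            buf = ""
    if buf.strip():
        await queue.put(buf.strip())
    await queue.put(None)
    await tts
    return played

chunks = asyncio.run(respond())
print(chunks)  # ['Halt!', 'Who goes there?', 'State your business.']
```

Flushing on sentence boundaries rather than fixed token counts keeps TTS prosody natural; the first audible chunk is ready after only a few LLM tokens instead of after the whole response.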

Case study: VR sales training simulator. 4 characters with different personalities (aggressive client, loyal client, skeptic, neutral). Average latency after optimization: 920 ms. Realism assessment (survey of 50 users): 4.1/5 vs 2.3/5 for scripted NPCs.

Timeframes: a single AI character with basic animations takes 3–5 weeks; a complete training simulator with multiple characters and analytics takes 2–3 months.