Development of Social Space in Metaverse
A social space in a metaverse is a virtual venue for human interaction: conferences, hangouts, concerts, business meetings. Unlike game metaverses, the focus here is on communication rather than gameplay. Technically, it is a real-time multiplayer 3D environment with voice/video communication and tools for social interaction.
Technical Architecture
Server Topology
For a social space, geographic proximity of servers to users is critical, because high latency in voice chat is unacceptable:
Global CDN (static assets, 3D models)
│
Regional Game Servers (AWS eu-west-1, us-east-1, ap-southeast-1)
│
├── Room Manager (create/destroy rooms)
├── State Sync Service (avatar positions, objects)
├── Voice Server (SFU: Selective Forwarding Unit)
└── Chat Service (text, reactions, file sharing)
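As a minimal sketch of how a client could choose among those regions: probe each regional endpoint and connect to the one with the lowest round-trip time. The RegionPing shape and pickRegion helper below are illustrative, not part of any real SDK.

```typescript
// Hypothetical region picker: choose the regional game server with the
// lowest measured round-trip time for a connecting client.
type RegionPing = { region: string; rttMs: number };

function pickRegion(pings: RegionPing[]): string {
  if (pings.length === 0) throw new Error('no regions probed');
  // reduce keeps whichever probe reported the smaller RTT
  return pings.reduce((best, p) => (p.rttMs < best.rttMs ? p : best)).region;
}
```

In practice the probe itself can be as simple as timing a small HTTPS request to each region before opening the WebSocket.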
Spatial Audio and Proximity Chat
In physical space, sound grows quieter with distance. A metaverse should imitate this; otherwise, in a room with 50 people, everyone talks at once and it is unclear who is addressing whom.
import * as THREE from 'three';

class SpatialAudioManager {
  private audioContext = new AudioContext();
  private panners: Map<string, PannerNode> = new Map();

  addParticipant(userId: string, stream: MediaStream) {
    const source = this.audioContext.createMediaStreamSource(stream);
    const panner = this.audioContext.createPanner();

    // Web Audio API spatialization settings
    panner.panningModel = 'HRTF';   // Head-Related Transfer Function: binaural sound
    panner.distanceModel = 'inverse';
    panner.refDistance = 3;         // full volume within 3 meters
    panner.maxDistance = 20;        // note: maxDistance only caps attenuation in the 'linear' model
    panner.rolloffFactor = 2;

    source.connect(panner);
    panner.connect(this.audioContext.destination);
    this.panners.set(userId, panner);
  }

  updateParticipantPosition(userId: string, position: THREE.Vector3) {
    const panner = this.panners.get(userId);
    if (panner) {
      panner.positionX.value = position.x;
      panner.positionY.value = position.y;
      panner.positionZ.value = position.z;
    }
  }

  updateListenerPosition(position: THREE.Vector3, orientation: THREE.Quaternion) {
    const listener = this.audioContext.listener;
    listener.positionX.value = position.x;
    listener.positionY.value = position.y;
    listener.positionZ.value = position.z;

    // Direction the listener is facing
    const forward = new THREE.Vector3(0, 0, -1).applyQuaternion(
      new THREE.Quaternion(orientation.x, orientation.y, orientation.z, orientation.w)
    );
    listener.forwardX.value = forward.x;
    listener.forwardY.value = forward.y;
    listener.forwardZ.value = forward.z;
  }
}
WebRTC SFU Architecture
For scalable voice with dozens of participants, an SFU (Selective Forwarding Unit) is needed. Each client sends a stream once to SFU, SFU forwards to needed recipients:
[Participant A] ──send──►                 ──forward──► [B], [C]
[Participant B] ──send──►  [SFU Server]   ──forward──► [A], [C]
[Participant C] ──send──►                 ──forward──► [A], [B]
Recommended open-source SFUs: mediasoup (Node.js, high performance), LiveKit (Go, cloud-native), Janus (C, mature project).
// LiveKit client integration
import { Room, RoomEvent, Track } from 'livekit-client';

class MetaverseVoiceClient {
  private room: Room;
  private spatialAudio: SpatialAudioManager;

  async connect(roomToken: string) {
    this.room = new Room({
      adaptiveStream: true, // adapt stream quality to network conditions
      dynacast: true,       // pause video layers no one is subscribed to
    });

    this.room.on(RoomEvent.TrackSubscribed, (track, publication, participant) => {
      if (track.kind === Track.Kind.Audio) {
        this.spatialAudio.addParticipant(
          participant.identity,
          new MediaStream([track.mediaStreamTrack])
        );
      }
    });

    await this.room.connect('wss://your-livekit-server.com', roomToken);

    // Publish the microphone
    await this.room.localParticipant.setMicrophoneEnabled(true);
  }
}
State Synchronization
Avatar positions must be synchronized in real time for all room participants:
import asyncio
from dataclasses import dataclass

import msgpack  # binary serialization, more compact and faster than JSON


@dataclass
class AvatarState:
    user_id: str
    position: tuple   # (x, y, z)
    rotation: tuple   # quaternion (x, y, z, w)
    animation: str    # 'idle', 'walk', 'wave', etc.
    timestamp: float


class StateSync:
    def __init__(self, room_id: str):
        self.room_id = room_id
        self.states: dict[str, AvatarState] = {}
        self.clients: dict[str, object] = {}  # user_id -> websocket connection

    async def update_position(self, user_id: str, state_data: dict):
        self.states[user_id] = AvatarState(**state_data)
        await self.broadcast_delta(user_id, state_data)

    async def broadcast_delta(self, updated_user_id: str, delta: dict):
        """Send only the change, not the full state."""
        message = msgpack.packb({
            'type': 'position_update',
            'user_id': updated_user_id,
            'state': delta,
        })
        # Send to everyone except the sender
        tasks = [
            client.send(message)
            for client_id, client in self.clients.items()
            if client_id != updated_user_id
        ]
        await asyncio.gather(*tasks, return_exceptions=True)
Client-side interpolation: because of network latency, position updates arrive at discrete intervals. The client interpolates between received states for smooth motion:
class AvatarInterpolator {
  private stateBuffer: Array<{ time: number, state: AvatarState }> = [];
  private readonly INTERPOLATION_DELAY_MS = 100; // render slightly in the past for smoothing

  update(state: AvatarState) {
    this.stateBuffer.push({ time: Date.now(), state });
    // Keep only the last 10 states
    if (this.stateBuffer.length > 10) this.stateBuffer.shift();
  }

  getInterpolatedState(): AvatarState | null {
    const renderTime = Date.now() - this.INTERPOLATION_DELAY_MS;
    const before = this.stateBuffer.filter(s => s.time <= renderTime).at(-1);
    const after = this.stateBuffer.find(s => s.time > renderTime);
    if (!before || !after) return before?.state ?? null;
    const t = (renderTime - before.time) / (after.time - before.time);
    return this.lerp(before.state, after.state, t);
  }

  private lerp(a: AvatarState, b: AvatarState, t: number): AvatarState {
    // Linear interpolation of position; rotation would use quaternion slerp
    const position = a.position.map((v, i) => v + (b.position[i] - v) * t);
    return { ...b, position };
  }
}
Social Interaction Tools
Emotes and gestures: a set of quick commands (wave, clap, dance). Animations play synchronously for all participants.
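One way to keep those animations in sync is to have the server stamp a shared start time; each client then waits until that wall-clock moment before playing. The message shape and helper below are illustrative, not a real protocol.

```typescript
// Sketch of a synchronized emote: the server stamps a start time so all
// clients begin the same animation at the same wall-clock moment.
interface EmoteMessage {
  userId: string;
  emote: 'wave' | 'clap' | 'dance';
  startAt: number; // server timestamp, ms
}

function emoteDelayMs(msg: EmoteMessage, nowMs: number): number {
  // Clients that receive the message early wait; late ones start immediately
  return Math.max(0, msg.startAt - nowMs);
}
```

This assumes clients keep a rough offset to server time, e.g. estimated during the connection handshake.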
Collaborative whiteboard: a shared drawing surface backed by a CRDT (Conflict-free Replicated Data Type) for conflict-free collaborative editing.
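A grow-only set of strokes is about the simplest CRDT that fits a whiteboard. The sketch below is illustrative: union-merge is commutative and idempotent, so replicas converge regardless of message order (a production board would also need deletion, typically via tombstones or a two-phase set).

```typescript
// Minimal whiteboard CRDT sketch: a grow-only set of strokes keyed by a
// globally unique stroke id. Merging is a plain union, so no conflicts arise.
interface Stroke {
  id: string;
  points: [number, number][];
  color: string;
}

type StrokeSet = Map<string, Stroke>;

function merge(a: StrokeSet, b: StrokeSet): StrokeSet {
  const out = new Map(a);
  for (const [id, stroke] of b) out.set(id, stroke); // same id => same stroke
  return out;
}
```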
Presentation mode: one participant broadcasts a screen or slides to everyone in the room, implemented via WebRTC screen sharing forwarded through the SFU.
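On the SFU side this reduces to fanning the presenter's track out to everyone except the publisher; a toy sketch of that rule (capture itself would use the browser's getDisplayMedia API in a real client):

```typescript
// Illustrative SFU fan-out rule for a presenter's screen track:
// forward to every participant in the room except the publisher.
function forwardTargets(participants: string[], presenter: string): string[] {
  return participants.filter(p => p !== presenter);
}
```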
Spatial objects: placing 3D objects, NFT artwork, and embedded video/audio in the space. Objects are persisted in a database and loaded on room entry.
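A possible shape for such a persisted record, assuming a simple JSON store; the field names are illustrative, not a fixed schema.

```typescript
// Hypothetical persisted spatial-object record, serialized to JSON on save
// and parsed back when a participant enters the room.
interface SpatialObject {
  id: string;
  kind: 'model' | 'nft' | 'video';
  assetUrl: string;
  position: [number, number, number];
  rotation: [number, number, number, number]; // quaternion (x, y, z, w)
  scale: number;
}

function serializeRoomObjects(objects: SpatialObject[]): string {
  return JSON.stringify(objects);
}

function loadRoomObjects(json: string): SpatialObject[] {
  return JSON.parse(json) as SpatialObject[];
}
```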
Access Control
interface RoomConfig {
  id: string;
  name: string;
  maxParticipants: number;
  accessControl: {
    type: 'public' | 'token_gated' | 'invite_only' | 'nft_holders';
    nftContract?: string;    // for nft_holders
    minNFTBalance?: number;  // minimum number of NFTs required
    inviteList?: string[];   // wallet addresses
  };
  spatialAudio: boolean;
  recordingEnabled: boolean;
}

// nftContract / erc20Contract: contract instances initialized elsewhere (e.g. ethers.js)
async function checkRoomAccess(userWallet: string, roomConfig: RoomConfig): Promise<boolean> {
  const { accessControl } = roomConfig;
  if (accessControl.type === 'public') return true;
  if (accessControl.type === 'invite_only') {
    return accessControl.inviteList?.includes(userWallet) ?? false;
  }
  if (accessControl.type === 'nft_holders') {
    const balance = await nftContract.balanceOf(userWallet);
    return balance >= (accessControl.minNFTBalance || 1);
  }
  if (accessControl.type === 'token_gated') {
    const tokenBalance = await erc20Contract.balanceOf(userWallet);
    return tokenBalance > 0;
  }
  return false;
}
Token-gated social spaces are a powerful tool for DAO community calls, holder-only events, and VIP networking. An NFT becomes not just a collectible, but a key to exclusive social spaces.
Performance: WebGL Optimizations
Rendering 50+ avatars in a browser requires serious optimizations:
- LOD (Level of Detail): distant avatars rendered with fewer polygons
- Instanced rendering: identical objects rendered with one GPU draw call
- Frustum culling: don't render what's outside field of view
- Occlusion culling: don't render what's hidden behind other objects
- Asset streaming: load 3D resources as you approach them
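To make the LOD idea concrete, here is a minimal distance-based level selector. The thresholds and triangle counts are illustrative; engines such as three.js offer the same mechanism via THREE.LOD.

```typescript
// Minimal LOD sketch: pick a mesh variant by camera-to-avatar distance.
interface LODLevel {
  maxDistance: number; // use this level up to this distance (meters)
  triangles: number;   // polygon budget of the variant
}

const AVATAR_LODS: LODLevel[] = [
  { maxDistance: 5, triangles: 20000 },      // full detail up close
  { maxDistance: 15, triangles: 5000 },      // medium detail
  { maxDistance: Infinity, triangles: 800 }, // far away: low-poly impostor
];

function selectLOD(distance: number, levels: LODLevel[] = AVATAR_LODS): LODLevel {
  // Levels are sorted by maxDistance, so the first match is the right one
  return levels.find(l => distance <= l.maxDistance)!;
}
```

The same selector can drive animation throttling: distant avatars can also update their skeletons at a lower rate.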
A social space in a metaverse is technically the most complex class of real-time web application. Getting the architecture right from day one saves months of rework when scaling.