Deepgram Real-Time Transcription Integration for Mobile App

TRUETECH develops, supports, and maintains iOS, Android, and PWA mobile applications, with extensive experience publishing apps to popular marketplaces such as Google Play, the App Store, Amazon Appstore, and AppGallery.


Deepgram Integration for Real-Time Transcription in Mobile Applications

Deepgram Nova-2 is one of the few streaming providers with genuinely low latency: a median of around 300 ms from the end of a phrase to text on screen. Whisper cannot do this in principle: it is a synchronous batch model that needs the complete audio segment. If the task is "the user speaks and text appears on screen" with latency under one second, Deepgram is the right tool.

Connection Protocol

Deepgram works via WebSocket. Endpoint:

wss://api.deepgram.com/v1/listen?model=nova-2&language=ru&encoding=linear16&sample_rate=16000&channels=1&interim_results=true

The parameters are critical: encoding=linear16 means raw 16-bit little-endian PCM. Any other format sent without an explicit codec parameter risks a 1008 Policy Violation close. interim_results=true enables partial results, which are what create the real-time feel.
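Besides binary audio frames, the same socket accepts JSON text frames for control. Per Deepgram's streaming API, KeepAlive prevents the server from closing an idle connection (it drops sockets after roughly 10 seconds without data), and CloseStream tells the server to flush the remaining final results before disconnecting. A minimal sketch of the payloads (the helper names are ours):

```kotlin
// Control frames for Deepgram's streaming socket. Send them as *text*
// frames over the same WebSocket that carries the binary audio.
fun keepAliveFrame(): String = """{"type":"KeepAlive"}"""     // during long silences
fun closeStreamFrame(): String = """{"type":"CloseStream"}""" // graceful shutdown
```

With OkHttp that is simply webSocket.send(keepAliveFrame()); with URLSessionWebSocketTask, send it as a .string message.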

iOS: AVAudioEngine + URLSessionWebSocketTask

import AVFoundation
import Foundation

class DeepgramStreamer {
    private let audioEngine = AVAudioEngine()
    private var webSocket: URLSessionWebSocketTask?
    private let apiKey: String

    init(apiKey: String) {
        self.apiKey = apiKey
    }

    func start() throws {
        let session = URLSession(configuration: .default)
        var request = URLRequest(url: URL(string: "wss://api.deepgram.com/v1/listen?model=nova-2&language=ru&encoding=linear16&sample_rate=16000&channels=1&interim_results=true")!)
        request.setValue("Token \(apiKey)", forHTTPHeaderField: "Authorization")
        webSocket = session.webSocketTask(with: request)
        webSocket?.resume()

        receiveLoop()

        let inputNode = audioEngine.inputNode
        // The tap format must match the hardware format (typically 48 kHz
        // Float32); requesting 16 kHz Int16 directly makes installTap throw.
        // Convert to 16 kHz Int16 PCM with AVAudioConverter before sending.
        let inputFormat = inputNode.outputFormat(forBus: 0)
        let targetFormat = AVAudioFormat(commonFormat: .pcmFormatInt16,
                                         sampleRate: 16000, channels: 1, interleaved: true)!
        guard let converter = AVAudioConverter(from: inputFormat, to: targetFormat) else { return }

        inputNode.installTap(onBus: 0, bufferSize: 4096, format: inputFormat) { [weak self] buffer, _ in
            let ratio = targetFormat.sampleRate / inputFormat.sampleRate
            let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio) + 1
            guard let converted = AVAudioPCMBuffer(pcmFormat: targetFormat, frameCapacity: capacity) else { return }
            var error: NSError?
            converter.convert(to: converted, error: &error) { _, outStatus in
                outStatus.pointee = .haveData
                return buffer
            }
            guard error == nil, let channelData = converted.int16ChannelData else { return }
            // 2 bytes per 16-bit sample, already little-endian on iOS
            let data = Data(bytes: channelData[0], count: Int(converted.frameLength) * 2)
            self?.webSocket?.send(.data(data)) { _ in }
        }
        try audioEngine.start()
    }

    private func receiveLoop() {
        webSocket?.receive { [weak self] result in
            if case .success(let message) = result, case .string(let text) = message {
                self?.handleTranscript(text) // parse the Deepgram JSON response here
            }
            self?.receiveLoop()
        }
    }

    private func handleTranscript(_ json: String) {
        // Decode channel.alternatives[0].transcript and is_final from the JSON.
    }
}

Important detail: AVAudioEngine.inputNode requires microphone permission (NSMicrophoneUsageDescription in Info.plist plus a runtime AVAudioSession.sharedInstance().requestRecordPermission call), and the audio session must be configured with AVAudioSession.setCategory(.record, mode: .measurement). The .measurement mode disables system processing such as AEC and AGC, which can otherwise distort the signal for transcription.

Android: AudioRecord + OkHttp WebSocket

import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import okhttp3.*
import okio.ByteString.Companion.toByteString
import org.json.JSONObject
import java.nio.ByteBuffer
import java.nio.ByteOrder

class DeepgramStreamer(private val apiKey: String) {
    private val client = OkHttpClient()
    private var webSocket: WebSocket? = null
    private var audioRecord: AudioRecord? = null

    fun start(onTranscript: (String, Boolean) -> Unit) {
        val request = Request.Builder()
            .url("wss://api.deepgram.com/v1/listen?model=nova-2&language=ru&encoding=linear16&sample_rate=16000&channels=1&interim_results=true")
            .header("Authorization", "Token $apiKey")
            .build()

        webSocket = client.newWebSocket(request, object : WebSocketListener() {
            override fun onMessage(webSocket: WebSocket, text: String) {
                val json = JSONObject(text)
                // Deepgram also sends Metadata frames without a "channel" field
                if (!json.has("channel")) return
                val transcript = json.getJSONObject("channel")
                    .getJSONArray("alternatives")
                    .getJSONObject(0)
                    .getString("transcript")
                val isFinal = json.optBoolean("is_final", false)
                if (transcript.isNotEmpty()) onTranscript(transcript, isFinal)
            }
        })

        // Requires android.permission.RECORD_AUDIO granted at runtime
        val bufferSize = AudioRecord.getMinBufferSize(16000, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT)
        audioRecord = AudioRecord(MediaRecorder.AudioSource.MIC, 16000, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bufferSize)
        audioRecord?.startRecording()

        Thread {
            val buffer = ShortArray(bufferSize / 2)
            while (audioRecord?.recordingState == AudioRecord.RECORDSTATE_RECORDING) {
                val read = audioRecord?.read(buffer, 0, buffer.size) ?: break
                if (read > 0) {
                    // Deepgram expects 16-bit little-endian PCM
                    val bytes = ByteBuffer.allocate(read * 2).order(ByteOrder.LITTLE_ENDIAN)
                    bytes.asShortBuffer().put(buffer, 0, read)
                    webSocket?.send(bytes.array().toByteString())
                }
            }
        }.start()
    }
}

ByteOrder.LITTLE_ENDIAN is mandatory: Deepgram expects little-endian PCM. If you send big-endian data, transcription still runs, but quality is noticeably worse.
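The byte order is easy to sanity-check on the JVM without any Android dependencies; a little-endian buffer must put the low byte first:

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Packs one 16-bit sample the way the streaming loop above does;
// for 0x1234 the little-endian layout is [0x34, 0x12].
fun packSampleLE(sample: Short): ByteArray =
    ByteBuffer.allocate(2).order(ByteOrder.LITTLE_ENDIAN).putShort(sample).array()
```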

What to Do with interim_results

Deepgram returns two types of results: is_final: false (interim) and is_final: true (final). The right UI pattern:

  • Display interim results in gray or italics, so the user sees recognition happening
  • On is_final: true, replace all previous interim text of the utterance with the final text
  • speech_final: true signals the end of a pause, a good moment to start processing the phrase

A common mistake is to accumulate every interim result as a separate line, which causes duplication. Keep a single interim buffer and update it in place instead.
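The in-place update pattern can be sketched as a small buffer class (pure Kotlin; the class and method names are ours):

```kotlin
// Keeps finalized utterances plus a single mutable interim line.
// Interim results overwrite each other; a final result replaces the
// interim text of its utterance and is appended permanently.
class TranscriptBuffer {
    private val finals = mutableListOf<String>()
    private var interim = ""

    fun onResult(transcript: String, isFinal: Boolean) {
        if (isFinal) {
            if (transcript.isNotEmpty()) finals.add(transcript)
            interim = ""          // the final replaces the interim text
        } else {
            interim = transcript  // update in place, never append
        }
    }

    // What the UI should render right now.
    fun display(): String =
        (finals + listOf(interim).filter { it.isNotEmpty() }).joinToString(" ")
}
```

Wire onResult straight into the onTranscript callback from the Android code above (or its iOS equivalent) and re-render display() on every call.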

Nova-2 Parameters Affecting Quality

utterance_end_ms: 1000 makes Deepgram itself finalize the utterance after one second of silence. Useful for dictation without an explicit "stop" command.

diarize: true enables speaker separation and adds a speaker field to each word.

punctuate: true enables auto-punctuation; without it the text arrives with no periods or commas.

smart_format: true formats numbers, dates, and phone numbers: "twenty-fifth March" → "25 March".
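Putting the parameters together, the connection URL can be assembled from a map; a small sketch (the helper function is ours, the parameter names are as above):

```kotlin
// Builds the /v1/listen URL; a LinkedHashMap keeps parameter order stable.
fun listenUrl(params: Map<String, String>): String =
    "wss://api.deepgram.com/v1/listen?" +
        params.entries.joinToString("&") { (k, v) -> "$k=$v" }

val url = listenUrl(linkedMapOf(
    "model" to "nova-2", "language" to "ru",
    "encoding" to "linear16", "sample_rate" to "16000", "channels" to "1",
    "interim_results" to "true", "punctuate" to "true",
    "smart_format" to "true", "utterance_end_ms" to "1000"
))
```

None of these values need URL-encoding, so plain string joining is enough here.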

Timeline

Basic integration (WebSocket plus AudioRecord/AVAudioEngine plus text output) takes 4–7 days. Adding diarization, reconnect handling for network switches, background mode, and result export takes 8–14 days.
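The reconnect handling mentioned above usually boils down to exponential backoff around the connect call. A sketch under the assumption that audio captured while offline is buffered and re-sent separately (the function and parameter names are ours):

```kotlin
// Retries `connect` with exponential backoff (base, 2x, 4x ... capped
// at 30 s). Returns true as soon as a connection attempt succeeds.
fun connectWithBackoff(maxAttempts: Int, baseDelayMs: Long = 1000, connect: () -> Boolean): Boolean {
    for (attempt in 0 until maxAttempts) {
        if (connect()) return true
        if (attempt < maxAttempts - 1)
            Thread.sleep(minOf(baseDelayMs shl attempt, 30_000L))
    }
    return false
}
```

In practice the connect lambda would rebuild the OkHttp WebSocket (or URLSessionWebSocketTask) and re-subscribe the audio loop; a ConnectivityManager network callback is a good trigger for calling it.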