AI Context Window and Dialog History Management for Mobile App

TRUETECH develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience publishing mobile applications in popular stores such as Google Play, the App Store, Amazon, AppGallery, and others.

Implementing Context Window and Dialog History Management in a Mobile Application

Context is what turns scattered questions into a coherent conversation. Sending the entire conversation history with every request is the most naive solution: it works until the first context overflow or the first complaint about the API bill. History management is a separate engineering task that needs to be designed early.

How Context Grows and Why It's a Problem

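Each exchange adds tokens: the user request plus the model response. With an average message of 50–100 tokens, 20 pairs already amount to 2,000–4,000 tokens of history alone, plus the system prompt. At the GPT-4o rate of $5 per 1M input tokens, that is trivial for a single conversation. At 1,000 active users sending 50 messages a day, it is 50,000 requests; if each carries roughly 1,000 tokens of history, that comes to 50M input tokens, or about $250/day spent on history that could be more compact.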
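The arithmetic behind the $250/day figure can be reproduced with a small helper. This is a sketch: the function name and the average of 1,000 history tokens per request are assumptions chosen to match the figure in the text, not measured values.

```swift
/// Estimated daily spend on history tokens alone.
/// All inputs are illustrative assumptions, not measurements.
func dailyHistoryCost(users: Int,
                      messagesPerUserPerDay: Int,
                      avgHistoryTokensPerRequest: Int,
                      pricePerMillionInputTokens: Double) -> Double {
    let requests = users * messagesPerUserPerDay
    let tokens = Double(requests * avgHistoryTokensPerRequest)
    return tokens / 1_000_000 * pricePerMillionInputTokens
}

// 1,000 users x 50 messages x 1,000 history tokens = 50M tokens per day.
let cost = dailyHistoryCost(users: 1000,
                            messagesPerUserPerDay: 50,
                            avgHistoryTokensPerRequest: 1000,
                            pricePerMillionInputTokens: 5.0)
print(cost) // 250.0
```

Halving the average history per request halves the bill, which is the whole argument for compacting history.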

The second problem is that different models have different limits: GPT-4o has 128K, Claude 200K, YandexGPT 8K. An app that worked fine with GPT-4o can break after switching to another model.
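One way to avoid hard-coding a single limit is to derive the history budget from the active model. The sketch below uses the limits quoted above; the dictionary keys, the function name, and the idea of reserving room for the reply are assumptions made for illustration.

```swift
/// Context limits quoted in the text, in tokens.
/// The keys are hypothetical labels, not official API model identifiers.
let contextLimits: [String: Int] = [
    "gpt-4o": 128_000,
    "claude": 200_000,
    "yandexgpt": 8_000
]

/// Tokens left for history after the system prompt and the expected reply.
/// Returns nil for a model that is not in the table.
func historyBudget(model: String,
                   systemPromptTokens: Int,
                   reservedForReply: Int) -> Int? {
    guard let limit = contextLimits[model] else { return nil }
    return max(0, limit - systemPromptTokens - reservedForReply)
}
```

With a 500-token system prompt and 1,000 tokens reserved for the reply, the same app has 126,500 tokens of history on the 128K model but only 6,500 on the 8K one.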

Three History Management Strategies

1. Sliding Window

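The simplest approach: keep the last N messages (or, as in the code below, as many recent messages as fit a token budget) and discard earlier ones. It is fast and predictable. The downside is that the model "forgets" the start of the conversation: the user's name, agreements made in early messages.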

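// Sliding window: keep the most recent messages that fit the token budget.
// Assumes `Message` exposes `content: String` and `countTokens(_:)` is defined elsewhere.
func buildMessages(history: [Message], systemPrompt: String, maxTokens: Int = 3000) -> [Message] {
    var window: [Message] = []
    // The system prompt is sent with every request, so it counts against the budget.
    var tokenCount = countTokens(systemPrompt)

    // Walk from the newest message to the oldest.
    for message in history.reversed() {
        let msgTokens = countTokens(message.content)
        if tokenCount + msgTokens > maxTokens { break }
        window.append(message)
        tokenCount += msgTokens
    }
    // Restore chronological order; append + reverse avoids repeated inserts at index 0.
    return Array(window.reversed())
}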

2. Summarization

When the history exceeds a threshold, send the accumulated messages to a cheaper model (gpt-4o-mini, claude-haiku, mistral-small) for summarization. Take the summary, save it as a system message or a special assistant block, and remove the summarized messages from the active history.

The problem is that specific facts get lost ("the user said they are allergic to penicillin"). For medical, legal, and financial assistants, summarization without explicit fact preservation is risky.
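The compaction step described in this section can be sketched as follows. The types `ChatMessage` and `Conversation`, the `keepRecent` parameter, and the injected `summarize` closure are all hypothetical; in a real app the closure would be an asynchronous network call to the cheaper model, and it is synchronous here only to keep the sketch self-contained.

```swift
struct ChatMessage {
    let role: String
    let content: String
    let tokenCount: Int
}

struct Conversation {
    var summary: String?
    var messages: [ChatMessage]
}

/// Folds older messages into the summary once the history exceeds `threshold` tokens.
/// `summarize` receives the previous summary (if any) and the messages to fold in.
func compact(_ conversation: Conversation,
             threshold: Int,
             keepRecent: Int,
             summarize: (String?, [ChatMessage]) -> String) -> Conversation {
    let total = conversation.messages.reduce(0) { $0 + $1.tokenCount }
    // Nothing to do while under the threshold or when there is nothing old to fold.
    guard total > threshold, conversation.messages.count > keepRecent else {
        return conversation
    }
    let splitIndex = conversation.messages.count - keepRecent
    let old = Array(conversation.messages[..<splitIndex])
    let recent = Array(conversation.messages[splitIndex...])
    // Summarized messages leave the active history; only the summary and the tail remain.
    return Conversation(summary: summarize(conversation.summary, old), messages: recent)
}
```

Passing the previous summary into `summarize` lets each compaction build on the last one instead of re-reading the full history.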

3. Hybrid with Memory

The most reliable option for long-term assistants:

  • Short-term memory: the last 10–15 messages, always in context
  • Long-term memory: structured facts about the user and the conversation, stored separately
  • Semantic search: relevant facts fetched from long-term memory via embeddings on each request

Long-term memory is updated via an additional call: after each model response, ask the model to extract facts for memory ("What new facts about the user can be extracted from this dialog?").
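The semantic search step can be illustrated with a minimal sketch that ranks stored facts by cosine similarity to the query embedding. `MemoryFact` and `relevantFacts` are hypothetical names, and the sketch assumes embeddings were already obtained from some embedding model; a production app would likely use a vector index rather than a linear scan.

```swift
struct MemoryFact {
    let text: String
    let embedding: [Double]
}

/// Cosine similarity of two vectors; returns 0 for mismatched, empty, or zero vectors.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    guard a.count == b.count, !a.isEmpty else { return 0 }
    var dot = 0.0, normA = 0.0, normB = 0.0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

/// Returns the `topK` facts most similar to the query, best match first.
func relevantFacts(for queryEmbedding: [Double],
                   in facts: [MemoryFact],
                   topK: Int) -> [MemoryFact] {
    let ranked = facts
        .map { (fact: $0, score: cosineSimilarity(queryEmbedding, $0.embedding)) }
        .sorted { $0.score > $1.score }
    return ranked.prefix(max(0, topK)).map { $0.fact }
}
```

The selected facts would then be placed in the prompt next to the short-term window, so a fact such as an allergy survives even after the message that stated it has left the context.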

Storing History on Mobile

SQLite is the standard choice. Structure:

CREATE TABLE conversations (
    id TEXT PRIMARY KEY,
    created_at INTEGER,
    title TEXT,
    model TEXT,
    summary TEXT  -- summarization of old messages
);

CREATE TABLE messages (
    id TEXT PRIMARY KEY,
    conversation_id TEXT REFERENCES conversations(id),
    role TEXT CHECK(role IN ('user', 'assistant', 'system')),
    content TEXT,
    token_count INTEGER,
    created_at INTEGER
);

CREATE INDEX idx_messages_conversation ON messages(conversation_id, created_at);

token_count is calculated once on save, not on every load. This matters for performance with long histories.
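To show why the stored count helps, here is a sketch that selects the recent rows fitting a budget without calling a tokenizer at load time. `MessageRow` is a hypothetical in-memory mirror of part of the messages table, and the rows are assumed to arrive in ascending created_at order.

```swift
/// Hypothetical mirror of a row in the messages table (subset of columns).
struct MessageRow {
    let role: String
    let content: String
    let tokenCount: Int   // filled in once, when the row is saved
    let createdAt: Int
}

/// Returns the newest rows whose stored token counts fit within `budget`,
/// in chronological order. No tokenization happens here.
func recentRows(_ rows: [MessageRow], fitting budget: Int) -> [MessageRow] {
    var used = 0
    var kept = 0
    for row in rows.reversed() {
        if used + row.tokenCount > budget { break }
        used += row.tokenCount
        kept += 1
    }
    return Array(rows.suffix(kept))
}
```

Loading a conversation then costs one indexed query plus integer additions, regardless of how long the message texts are.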

Token Counting on Mobile

Accurate counting requires the tokenizer for the specific model. On the server that means tiktoken for OpenAI models and HuggingFace tokenizers for others. On mobile, heuristics are the usual choice:

  • English text: ~4 characters ≈ 1 token
  • Russian text: ~2–2.5 characters ≈ 1 token (Cyrillic encodes to more tokens)
  • Code: ~3 characters ≈ 1 token

For counts that carry consequences (billing, limits), validate on the server.
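The heuristics above translate into a few lines of code. This is a rough estimator only: the `TextKind` enum and the function name are hypothetical, 2.25 is simply the midpoint of the 2–2.5 range, and the caller is assumed to know which kind of text it is passing.

```swift
enum TextKind {
    case english, russian, code
}

/// Rough on-device token estimate from character count.
/// Not a tokenizer: use it for UI and trimming, not for billing.
func estimateTokens(_ text: String, kind: TextKind) -> Int {
    let charsPerToken: Double
    switch kind {
    case .english: charsPerToken = 4.0
    case .russian: charsPerToken = 2.25   // midpoint of the 2–2.5 range
    case .code:    charsPerToken = 3.0
    }
    // Round up so the estimate errs on the side of a larger count.
    return Int((Double(text.count) / charsPerToken).rounded(.up))
}
```

Rounding up is a deliberate choice: overestimating trims history slightly early, while underestimating risks a context overflow.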

UI: Displaying History

The message list is a UITableView in reverse order (newest at the bottom) or, in Compose, a LazyColumn with reverseLayout = true. During streaming, the last message updates in place without a scroll jump.

Context window indicator: show the user how much "memory" is used with a visual bar or a token counter. It is not essential, but apps that add it get fewer complaints about assistant "forgetfulness."
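The value behind such an indicator is a clamped ratio of used tokens to the model's limit. The sketch below covers only that calculation; the function names and the label wording are assumptions, and the result would be bound to whatever progress view the app uses.

```swift
/// Fraction of the context window in use, clamped to 0...1.
func contextUsage(usedTokens: Int, contextLimit: Int) -> Double {
    guard contextLimit > 0 else { return 0 }
    return min(1.0, max(0.0, Double(usedTokens) / Double(contextLimit)))
}

/// Short label for the indicator, e.g. next to a progress bar.
func usageLabel(usedTokens: Int, contextLimit: Int) -> String {
    let fraction = contextUsage(usedTokens: usedTokens, contextLimit: contextLimit)
    return "\(Int((fraction * 100).rounded()))% of memory used"
}
```

Clamping keeps the bar sensible even when a heuristic token estimate overshoots the real limit.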

Timeline Estimates

A sliding window with SQLite storage takes 3–4 days. A hybrid system with summarization and long-term memory takes 1.5–2.5 weeks.