Vector Search for AI Knowledge Base in Mobile App

TRUETECH is engaged in the development, support and maintenance of iOS, Android, PWA mobile applications. We have extensive experience and expertise in publishing mobile applications in popular markets like Google Play, App Store, Amazon, AppGallery and others.
Development and support of all types of mobile applications:
Information and entertainment mobile applications
News apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
E-commerce mobile applications
Online stores, B2B apps, marketplaces, online exchanges, cashback services, dropshipping platforms, loyalty programs, food and goods delivery, payment systems.
Business process management mobile applications
CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
Electronic services mobile applications
Classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each of them may have its own specific features and functionality, tailored to the specific needs and goals of the client.


Implementing Vector Search for AI Knowledge Base in a Mobile Application

Vector search finds semantically similar documents, not just keyword matches. The query "how to restore access" will surface the article "password reset" even though the word "restore" never appears in it. This is the foundation of any AI search over a knowledge base.

How It Works at Code Level

Each text fragment becomes a vector: an array of numbers (1536 or 3072 values for OpenAI models, 768 for typical local models). Semantically similar texts produce nearby vectors, so search becomes a matter of finding the nearest vectors to the query (Approximate Nearest Neighbor, ANN).
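As a toy illustration of "nearby vectors," here is cosine similarity in plain Python. The 3-dimensional vectors are made up for the demo; real embeddings have 384–3072 dimensions:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (hypothetical values, not model output)
reset_password = [0.9, 0.1, 0.2]
restore_access = [0.85, 0.15, 0.25]  # semantically close -> nearby vector
weather_today = [0.1, 0.9, 0.3]      # unrelated topic -> distant vector

print(cosine_similarity(reset_password, restore_access))  # close to 1.0
print(cosine_similarity(reset_password, weather_today))   # much lower
```

The closer the similarity is to 1.0, the closer the meanings; ANN indices simply make this nearest-vector lookup fast at scale.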

In practice, for a mobile app:

  1. User enters query
  2. Client sends query to backend
  3. Backend creates query embedding via API (OpenAI, Cohere) or local model
  4. Vector DB returns top-K nearest chunks
  5. Results passed to LLM or returned directly

The whole pipeline through step 4 takes 50–300 ms, which is acceptable for mobile UX.
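The steps above can be sketched end-to-end. This is a minimal sketch: the `embed()` stub stands in for a real embedding API (OpenAI, Cohere, or a local model), and a brute-force scan stands in for the vector DB; all names here are hypothetical:

```python
import math

def embed(text: str) -> list[float]:
    # Stub: a real backend calls an embedding API or a local model here.
    # We hash characters into a tiny fixed-size vector just for the demo.
    vec = [0.0] * 8
    for i, ch in enumerate(text.lower()):
        vec[i % 8] += ord(ch)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # vectors are already normalized

# Ingestion: knowledge-base chunks with precomputed embeddings
chunks = ["password reset instructions", "VPN setup guide", "billing FAQ"]
index = [(chunk, embed(chunk)) for chunk in chunks]

def search(query: str, top_k: int = 2) -> list[str]:
    q = embed(query)                                    # step 3
    ranked = sorted(index, key=lambda item: -cosine(q, item[1]))
    return [chunk for chunk, _ in ranked[:top_k]]       # step 4

results = search("how to restore access")
print(results)  # chunks to pass to an LLM or return directly (step 5)
```

The stub embedding has no semantic meaning; the point is the shape of the pipeline, which stays the same when `embed()` becomes a real API call and the brute-force scan becomes an ANN query.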

Vector Indices: What to Choose

pgvector is a PostgreSQL extension. If you already use PostgreSQL, it means zero additional infrastructure. It supports HNSW and IVFFlat indices.

-- HNSW index for fast ANN search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Search top-5 nearest
SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
ORDER BY embedding <=> $1
LIMIT 5;

<=> is the cosine-distance operator in pgvector. For normalized vectors, ranking by cosine distance is equivalent to ranking by inner product (<#>), but <=> also works without normalization.
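That equivalence is easy to check numerically. A sketch in plain Python with random unit vectors (toy data, not real embeddings): for normalized vectors, cosine distance is 1 − dot, and pgvector's <#> returns the negated inner product, −dot; both are decreasing in the dot product, so they produce the same ranking:

```python
import math
import random

random.seed(0)

def rand_unit_vec(dim: int = 8) -> list[float]:
    v = [random.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]  # L2-normalized

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

query = rand_unit_vec()
docs = {name: rand_unit_vec() for name in ("doc1", "doc2", "doc3")}

# cosine distance (<=>) for unit vectors: 1 - dot
by_cosine = sorted(docs, key=lambda n: 1 - dot(query, docs[n]))
# negative inner product (<#>): -dot
by_inner = sorted(docs, key=lambda n: -dot(query, docs[n]))

print(by_cosine == by_inner)  # True: identical ranking for unit vectors
```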

Index choice:

  • IVFFlat builds fast and uses less memory, but is slightly less accurate
  • HNSW gives the best accuracy and fast search, at the cost of more memory during index building

For databases up to roughly 1M documents, pgvector with HNSW handles the load fine. Above 10M, consider dedicated vector databases such as Pinecone, Weaviate, or Qdrant.

Metadata Filtering

Vector search without filters scans the entire index. If you need to search only one company's documents, a specific department, or a single language, add metadata filtering.

SELECT id, content, 1 - (embedding <=> $1) AS similarity
FROM documents
WHERE
    language = 'ru'
    AND category = 'installation'
    AND updated_at > NOW() - INTERVAL '1 year'
ORDER BY embedding <=> $1
LIMIT 10;

Important: with HNSW/IVFFlat indices, pgvector applies the filter AFTER the vector search. For highly selective filters (matching under 10% of rows) this gives poor results: you either need to build separate indices per subset or use partitioned HNSW.
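A common application-level workaround is to over-fetch from the ANN index and filter the candidates in code. A sketch with hypothetical helper names and a brute-force stand-in for the index:

```python
def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def ann_top_k(query_vec, index, k):
    """Stand-in for an HNSW scan: top-k rows by inner product."""
    return sorted(index, key=lambda row: -dot(query_vec, row["embedding"]))[:k]

def filtered_search(query_vec, index, k, predicate, overfetch=4):
    # Fetch overfetch*k candidates, then filter in application code.
    # A real implementation would retry with a larger factor if too few survive.
    candidates = ann_top_k(query_vec, index, k * overfetch)
    return [row for row in candidates if predicate(row)][:k]

# Toy index: 2-dimensional embeddings with a language attribute
index = [
    {"embedding": [1.0, 0.0], "language": "ru"},
    {"embedding": [0.9, 0.1], "language": "en"},
    {"embedding": [0.8, 0.2], "language": "ru"},
    {"embedding": [0.1, 0.9], "language": "ru"},
]
hits = filtered_search([1.0, 0.0], index, k=2,
                       predicate=lambda row: row["language"] == "ru")
print([h["embedding"] for h in hits])  # two nearest Russian-language rows
```

This trades extra fetched candidates for correctness; for filters more selective than the over-fetch factor can cover, separate or partitioned indices remain the robust option.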

Embeddings: Client vs Server

Should you generate the query embedding on the client (with a local ML model) or on the server? For a mobile app, the server is preferable: embedding models weigh 80–500 MB, local inference drains device resources, and the API key doesn't ship inside the APK.

The exception is a fully offline scenario. In that case, use Core ML on iOS (converting the model via coremltools) or ONNX Runtime on Android. For example, all-MiniLM-L6-v2 in ONNX format weighs ~22 MB and produces 384-dimensional vectors, which is sufficient for searching corporate documentation.
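Models like all-MiniLM-L6-v2 output one vector per token; the sentence embedding is obtained by mean-pooling the token vectors and L2-normalizing the result. A sketch of that post-processing step in plain Python (toy 4-dimensional token vectors; the real model outputs 384 dimensions per token):

```python
import math

def mean_pool_normalize(token_embeddings: list[list[float]]) -> list[float]:
    """Average token vectors element-wise, then L2-normalize the result."""
    dim = len(token_embeddings[0])
    n = len(token_embeddings)
    pooled = [sum(tok[i] for tok in token_embeddings) / n for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]

# Toy output for a 3-token sentence (real shape: num_tokens x 384)
tokens = [[0.2, 0.4, 0.1, 0.3],
          [0.1, 0.5, 0.0, 0.2],
          [0.3, 0.3, 0.2, 0.1]]
sentence_vec = mean_pool_normalize(tokens)
print(round(sum(x * x for x in sentence_vec), 6))  # 1.0 -> unit length
```

On-device, the token vectors come from the ONNX Runtime or Core ML inference call; this pooling code is what turns them into the single query vector you send to the search index.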

Displaying Search Results on Mobile

Each result contains a text excerpt, the document name/section, a similarity score, and the update date. On mobile, display:

  • The score as a visual relevance indicator (dots or a bar, not a raw number; the number means nothing to the user)
  • Breadcrumbs for the source: "User Guide → Installation → iOS"
  • Highlighted matching words (even with semantic search, words still often overlap)
  • An "Open full document" button
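Mapping the raw similarity score to a dot indicator might look like this; the 5-dot scale and the linear mapping are arbitrary assumptions to tune against your own corpus:

```python
def relevance_dots(similarity: float, max_dots: int = 5) -> str:
    """Map a cosine similarity in [0, 1] to a filled/empty dot indicator."""
    clamped = min(max(similarity, 0.0), 1.0)  # guard against out-of-range scores
    filled = round(clamped * max_dots)
    return "●" * filled + "○" * (max_dots - filled)

print(relevance_dots(0.87))  # ●●●●○
```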

Stages and Timeline

Inventory and normalize knowledge base → choose embedding model → configure vector DB and indices → develop ingestion pipeline → search API with filtering → mobile search UI with results → test quality (precision@K, recall@K) → iterate.
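The quality-testing step at the end is mechanical once you have labeled data: precision@K is the fraction of the top-K results that are relevant, recall@K is the fraction of all relevant documents that made it into the top-K. A sketch (you supply the relevance judgments; the document IDs here are hypothetical):

```python
def precision_recall_at_k(retrieved: list[str], relevant: set[str], k: int):
    """precision@K and recall@K for one query."""
    top_k = retrieved[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

retrieved = ["d1", "d7", "d3", "d9", "d2"]  # ranked search output
relevant = {"d1", "d2", "d3"}               # human-labeled ground truth
print(precision_recall_at_k(retrieved, relevant, k=5))  # (0.6, 1.0)
```

Average these over a test set of a few dozen real user queries per iteration to see whether an index, model, or chunking change actually helped.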

Vector search over a corpus of up to 50 thousand documents with pgvector takes 2–4 weeks. With a custom embedding model, reranking, and multilingual support, plan for 5–8 weeks.