Typo-Tolerant Search Implementation for Web Application

Our company is engaged in the development, support and maintenance of sites of any complexity. From simple one-page sites to large-scale cluster systems built on micro services. Experience of developers is confirmed by certificates from vendors.
Development and maintenance of all types of websites:
Informational websites or web applications
Business card websites, landing pages, corporate websites, online catalogs, quizzes, promo websites, blogs, news resources, informational portals, forums, aggregators
E-commerce websites or web applications
Online stores, B2B portals, marketplaces, online exchanges, cashback websites, exchanges, dropshipping platforms, product parsers
Business process management web applications
CRM systems, ERP systems, corporate portals, production management systems, information parsers
Electronic service websites or web applications
Classified ads platforms, online schools, online cinemas, website builders, portals for electronic services, video hosting platforms, thematic portals

These are just some of the technical types of websites we work with, and each of them can have its own specific features and functionality, as well as be customized to meet the specific needs and goals of the client.

Showing 1 of 1 servicesAll 2065 services
Typo-Tolerant Search Implementation for Web Application
Complex
~2-3 business days
FAQ
Our competencies:
Development stages
Latest works
  • image_website-b2b-advance_0.png
    B2B ADVANCE company website development
    1252
  • image_web-applications_feedme_466_0.webp
    Development of a web application for FEEDME
    1170
  • image_websites_belfingroup_462_0.webp
    Website development for BELFINGROUP
    873
  • image_ecommerce_furnoro_435_0.webp
    Development of an online store for the company FURNORO
    1092
  • image_crm_enviok_479_0.webp
    Development of a web application for Enviok
    830
  • image_bitrix-bitrix-24-1c_fixper_448_0.png
    Website development for FIXPER company
    849

Typo-Tolerant Search Implementation for Web Applications

Users make mistakes: "hedphones", "wireles", "samsnge". Search without fuzzy matching returns empty results, losing conversions. Implement typo-tolerant search three ways — depending on scale and requirements.

Distance Metrics: Levenshtein vs Damerau-Levenshtein

Levenshtein distance: minimum insertions, deletions, substitutions to transform one string into another.

Damerau-Levenshtein adds transposition (adjacent character swap): "haedphones" → "headphones" — is 1 transposition, not 2 operations. Better for search.

PostgreSQL: pg_trgm

pg_trgm — PostgreSQL extension for similarity search based on trigrams. Works without external services.

CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Index for similarity search
CREATE INDEX idx_products_title_trgm ON products USING GIN (title gin_trgm_ops);
CREATE INDEX idx_products_description_trgm ON products USING GIN (description gin_trgm_ops);

-- Search with similarity threshold
SET pg_trgm.similarity_threshold = 0.3;

SELECT id, title, similarity(title, 'hedphones') AS sim
FROM products
WHERE title % 'hedphones'         -- similarity operator
ORDER BY sim DESC
LIMIT 10;

-- Or combine FTS + fuzzy:
SELECT
    p.id,
    p.title,
    p.price,
    greatest(
        similarity(p.title, 'wireles hedphones'),
        ts_rank(p.search_vector, plainto_tsquery('russian', 'wireles hedphones'))
    ) AS relevance
FROM products p
WHERE
    p.title % 'wireles hedphones'
    OR p.search_vector @@ plainto_tsquery('russian', 'wireles')
ORDER BY relevance DESC
LIMIT 20;

% — similarity operator, uses GIN index. Without index degrades to seq scan.

Threshold tuning: 0.3 — liberal (lots of noise), 0.5 — strict (few typos). For short queries (1–2 words) threshold should be lower.

Meilisearch: Dedicated Fuzzy Search Engine

Meilisearch written in Rust, supports typo tolerance out-of-the-box, simple to configure.

# Docker
docker run -p 7700:7700 getmeili/meilisearch:latest

# Or binary
curl -L https://install.meilisearch.com | sh
./meilisearch --master-key="your-master-key"

Index configuration:

import meilisearch

client = meilisearch.Client('http://localhost:7700', 'your-master-key')
index = client.index('products')

# Search settings
index.update_settings({
    'searchableAttributes': ['title', 'brand', 'description', 'tags'],
    'filterableAttributes': ['category_id', 'status', 'price', 'brand'],
    'sortableAttributes': ['price', 'created_at', 'popularity'],
    'rankingRules': [
        'words',
        'typo',
        'proximity',
        'attribute',
        'sort',
        'exactness',
    ],
    'typoTolerance': {
        'enabled': True,
        'minWordSizeForTypos': {
            'oneTypo': 5,     # words >= 5 chars allow 1 typo
            'twoTypos': 9,    # words >= 9 chars allow 2 typos
        },
        'disableOnWords': ['iPhone', 'iPad', 'MacBook'],  # brands without fuzzy
        'disableOnAttributes': ['sku', 'barcode'],
    },
    'pagination': {
        'maxTotalHits': 10000,
    },
})

Data indexing:

import asyncio
from typing import Any

async def sync_products_to_meilisearch(products: list[dict[str, Any]]) -> None:
    """Batch product sync."""
    documents = [
        {
            'id': p['id'],
            'title': p['title'],
            'brand': p.get('brand', ''),
            'description': p.get('description', ''),
            'category_id': p['category_id'],
            'price': float(p['price']),
            'status': p['status'],
            'tags': [t['name'] for t in p.get('tags', [])],
            'created_at': p['created_at'].timestamp(),
            'popularity': p.get('view_count', 0),
        }
        for p in products
        if p['status'] == 'published'
    ]

    # Meilisearch accepts batches up to 100MB
    batch_size = 1000
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        task = index.add_documents(batch)
        index.wait_for_task(task.task_uid)

Search:

from dataclasses import dataclass

@dataclass
class SearchParams:
    query: str
    category_id: int | None = None
    price_min: float | None = None
    price_max: float | None = None
    page: int = 1
    hits_per_page: int = 20
    sort: str = 'relevance'  # relevance | price:asc | price:desc | created_at:desc

def build_filter(params: SearchParams) -> str | None:
    filters = ['status = "published"']

    if params.category_id:
        filters.append(f'category_id = {params.category_id}')
    if params.price_min is not None:
        filters.append(f'price >= {params.price_min}')
    if params.price_max is not None:
        filters.append(f'price <= {params.price_max}')

    return ' AND '.join(filters) if filters else None

def search_products(params: SearchParams) -> dict:
    sort_map = {
        'price:asc':   ['price:asc'],
        'price:desc':  ['price:desc'],
        'created_at:desc': ['created_at:desc'],
        'relevance':   [],  # default ranking rules
    }

    results = index.search(params.query, {
        'filter':       build_filter(params),
        'sort':         sort_map.get(params.sort, []),
        'page':         params.page,
        'hitsPerPage':  params.hits_per_page,
        'attributesToHighlight': ['title', 'description'],
        'highlightPreTag':  '<mark>',
        'highlightPostTag': '</mark>',
        'attributesToCrop': {'description': 200},
        'showMatchesPosition': False,
    })

    return results

Elasticsearch: Fuzzy Query

If Elasticsearch already used for FTS — fuzzy built-in:

{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "hedphones",
            "fields": ["title^3", "brand^2", "description"],
            "fuzziness": "AUTO",
            "prefix_length": 2,
            "max_expansions": 50
          }
        },
        {
          "match_phrase": {
            "title": {
              "query": "hedphones",
              "slop": 2
            }
          }
        }
      ]
    }
  }
}

prefix_length: 2 — first 2 characters must match exactly. Reduces false positives and speeds query.

AUTO fuzziness: 0 typos for ≤2 chars, 1 for 3–5 chars, 2 for 6+ chars.

Approach Selection

For small catalog (up to 100k records) with PostgreSQL — pg_trgm sufficient. For large catalog with facets, filters and <10ms requirement — Meilisearch. For analytics platform with aggregations — Elasticsearch.

Timelines

pg_trgm (extension, indexes, queries, threshold tuning): 1 day. Meilisearch (deploy, index config, sync, API): 2–3 days. Fuzzy in existing Elasticsearch cluster: 1 day.