Typo-Tolerant Search Implementation for Web Applications
Users make mistakes: "hedphones", "wireles", "samsnge". Search without fuzzy matching returns empty results, losing conversions. Implement typo-tolerant search three ways — depending on scale and requirements.
Distance Metrics: Levenshtein vs Damerau-Levenshtein
Levenshtein distance: minimum insertions, deletions, substitutions to transform one string into another.
Damerau-Levenshtein adds transposition (adjacent character swap): "haedphones" → "headphones" — is 1 transposition, not 2 operations. Better for search.
PostgreSQL: pg_trgm
pg_trgm — PostgreSQL extension for similarity search based on trigrams. Works without external services.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
-- Index for similarity search
CREATE INDEX idx_products_title_trgm ON products USING GIN (title gin_trgm_ops);
CREATE INDEX idx_products_description_trgm ON products USING GIN (description gin_trgm_ops);
-- Search with similarity threshold
SET pg_trgm.similarity_threshold = 0.3;
SELECT id, title, similarity(title, 'hedphones') AS sim
FROM products
WHERE title % 'hedphones' -- similarity operator
ORDER BY sim DESC
LIMIT 10;
-- Or combine FTS + fuzzy:
SELECT
p.id,
p.title,
p.price,
greatest(
similarity(p.title, 'wireles hedphones'),
ts_rank(p.search_vector, plainto_tsquery('russian', 'wireles hedphones'))
) AS relevance
FROM products p
WHERE
p.title % 'wireles hedphones'
OR p.search_vector @@ plainto_tsquery('russian', 'wireles')
ORDER BY relevance DESC
LIMIT 20;
% — similarity operator, uses GIN index. Without index degrades to seq scan.
Threshold tuning: 0.3 — liberal (lots of noise), 0.5 — strict (few typos). For short queries (1–2 words) threshold should be lower.
Meilisearch: Dedicated Fuzzy Search Engine
Meilisearch written in Rust, supports typo tolerance out-of-the-box, simple to configure.
# Docker
docker run -p 7700:7700 getmeili/meilisearch:latest
# Or binary
curl -L https://install.meilisearch.com | sh
./meilisearch --master-key="your-master-key"
Index configuration:
import meilisearch
client = meilisearch.Client('http://localhost:7700', 'your-master-key')
index = client.index('products')
# Search settings
index.update_settings({
'searchableAttributes': ['title', 'brand', 'description', 'tags'],
'filterableAttributes': ['category_id', 'status', 'price', 'brand'],
'sortableAttributes': ['price', 'created_at', 'popularity'],
'rankingRules': [
'words',
'typo',
'proximity',
'attribute',
'sort',
'exactness',
],
'typoTolerance': {
'enabled': True,
'minWordSizeForTypos': {
'oneTypo': 5, # words >= 5 chars allow 1 typo
'twoTypos': 9, # words >= 9 chars allow 2 typos
},
'disableOnWords': ['iPhone', 'iPad', 'MacBook'], # brands without fuzzy
'disableOnAttributes': ['sku', 'barcode'],
},
'pagination': {
'maxTotalHits': 10000,
},
})
Data indexing:
import asyncio
from typing import Any
async def sync_products_to_meilisearch(products: list[dict[str, Any]]) -> None:
"""Batch product sync."""
documents = [
{
'id': p['id'],
'title': p['title'],
'brand': p.get('brand', ''),
'description': p.get('description', ''),
'category_id': p['category_id'],
'price': float(p['price']),
'status': p['status'],
'tags': [t['name'] for t in p.get('tags', [])],
'created_at': p['created_at'].timestamp(),
'popularity': p.get('view_count', 0),
}
for p in products
if p['status'] == 'published'
]
# Meilisearch accepts batches up to 100MB
batch_size = 1000
for i in range(0, len(documents), batch_size):
batch = documents[i:i + batch_size]
task = index.add_documents(batch)
index.wait_for_task(task.task_uid)
Search:
from dataclasses import dataclass
@dataclass
class SearchParams:
query: str
category_id: int | None = None
price_min: float | None = None
price_max: float | None = None
page: int = 1
hits_per_page: int = 20
sort: str = 'relevance' # relevance | price:asc | price:desc | created_at:desc
def build_filter(params: SearchParams) -> str | None:
filters = ['status = "published"']
if params.category_id:
filters.append(f'category_id = {params.category_id}')
if params.price_min is not None:
filters.append(f'price >= {params.price_min}')
if params.price_max is not None:
filters.append(f'price <= {params.price_max}')
return ' AND '.join(filters) if filters else None
def search_products(params: SearchParams) -> dict:
sort_map = {
'price:asc': ['price:asc'],
'price:desc': ['price:desc'],
'created_at:desc': ['created_at:desc'],
'relevance': [], # default ranking rules
}
results = index.search(params.query, {
'filter': build_filter(params),
'sort': sort_map.get(params.sort, []),
'page': params.page,
'hitsPerPage': params.hits_per_page,
'attributesToHighlight': ['title', 'description'],
'highlightPreTag': '<mark>',
'highlightPostTag': '</mark>',
'attributesToCrop': {'description': 200},
'showMatchesPosition': False,
})
return results
Elasticsearch: Fuzzy Query
If Elasticsearch already used for FTS — fuzzy built-in:
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "hedphones",
"fields": ["title^3", "brand^2", "description"],
"fuzziness": "AUTO",
"prefix_length": 2,
"max_expansions": 50
}
},
{
"match_phrase": {
"title": {
"query": "hedphones",
"slop": 2
}
}
}
]
}
}
}
prefix_length: 2 — first 2 characters must match exactly. Reduces false positives and speeds query.
AUTO fuzziness: 0 typos for ≤2 chars, 1 for 3–5 chars, 2 for 6+ chars.
Approach Selection
For small catalog (up to 100k records) with PostgreSQL — pg_trgm sufficient. For large catalog with facets, filters and <10ms requirement — Meilisearch. For analytics platform with aggregations — Elasticsearch.
Timelines
pg_trgm (extension, indexes, queries, threshold tuning): 1 day. Meilisearch (deploy, index config, sync, API): 2–3 days. Fuzzy in existing Elasticsearch cluster: 1 day.







