AI Semantic Matching System for Candidates and Job Postings
Keyword matching in recruitment fails when a resume says "Python developer" but the posting asks for a "Software Engineer." Semantic matching understands that "machine learning," "ML," and "predictive modeling" refer to the same competency, and that "5 years with Django" is relevant to a "Flask developer" position.
Two-Stage Matching System
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from anthropic import Anthropic
import json
import re
class ResumeJDEncoder:
    """Encodes resumes and job descriptions into comparable sentence embeddings.

    Produces per-aspect embeddings (full text, skills, title) so a matcher can
    weight the aspects independently. All embeddings are L2-normalized.
    """

    def __init__(self):
        # Multilingual model: handles mixed Russian + English resumes.
        self.model = SentenceTransformer('paraphrase-multilingual-mpnet-base-v2')

    @staticmethod
    def _as_text(value) -> str:
        """Coerce a field that may be a string, a list of strings, or None
        into plain text.

        Fix: elsewhere in this pipeline 'skills'/'requirements' arrive as
        lists; the previous code called .strip() directly and crashed on them.
        """
        if value is None:
            return ''
        if isinstance(value, (list, tuple)):
            return ' '.join(str(item) for item in value)
        return str(value)

    def extract_resume_sections(self, resume_text: str) -> dict:
        """Divide a raw resume into semantic blocks.

        Simplified pattern-based extraction; in production use an ML resume
        parser (Affinda, Sovren, or custom).
        """
        sections = {
            'skills': '',
            'experience': '',
            'education': '',
            'full_text': resume_text
        }
        # Capture up to 6 lines following a skills-section header (RU or EN).
        skills_pattern = r'(?:навыки|skills|технологии|technologies|стек)[:\s]*([^\n]+(?:\n[^\n]+){0,5})'
        match = re.search(skills_pattern, resume_text, re.IGNORECASE)
        if match:
            sections['skills'] = match.group(1)
        return sections

    def _encode_texts(self, texts: dict) -> dict:
        """Encode each non-empty text; return {aspect: normalized embedding}."""
        embeddings = {}
        for key, text in texts.items():
            if text.strip():
                embeddings[key] = self.model.encode(text, normalize_embeddings=True)
        return embeddings

    def encode_resume(self, resume: dict) -> dict:
        """Multi-aspect resume encoding: full text, skills, current title."""
        texts_to_encode = {
            'full': self._as_text(resume.get('full_text', '')),
            'skills': self._as_text(resume.get('skills', '')),
            'title': self._as_text(resume.get('current_title', '')),
        }
        return self._encode_texts(texts_to_encode)

    def encode_job(self, job: dict) -> dict:
        """Job description encoding: full description, requirements, title."""
        requirements_text = self._as_text(job.get('requirements', []))
        texts = {
            'full': self._as_text(job.get('description', '')),
            'requirements': requirements_text,
            # Fix: also expose requirements under 'skills' so they line up
            # with the resume-side 'skills' aspect during weighted scoring
            # (previously the skills weight never fired — no common key).
            'skills': requirements_text,
            'title': self._as_text(job.get('title', '')),
        }
        return self._encode_texts(texts)
class SemanticMatcher:
    """Two-stage matching: fast embedding scoring over the whole pool,
    then precise (and expensive) LLM analysis for the short list only."""

    def __init__(self):
        self.encoder = ResumeJDEncoder()
        self.llm = Anthropic()

    def compute_embedding_score(self, resume_embs: dict,
                                job_embs: dict) -> float:
        """Weighted cosine similarity across matching embedding aspects.

        Fix: only aspects present in BOTH dicts contribute, and the result is
        re-normalized by the weights actually used. The previous version
        normalized by the resume-side keys alone, so a missing job-side
        embedding silently deflated the score.

        Returns 0.0 when no aspect is comparable.
        """
        weights = {'full': 0.4, 'skills': 0.4, 'title': 0.2}
        weighted_sum = 0.0
        used_weight = 0.0
        for key, weight in weights.items():
            r_emb = resume_embs.get(key)
            j_emb = job_embs.get(key)
            if r_emb is None or j_emb is None:
                continue
            norm_product = float(np.linalg.norm(r_emb) * np.linalg.norm(j_emb))
            if norm_product == 0.0:
                # Degenerate zero vector: skip rather than divide by zero.
                continue
            # Explicit cosine similarity; no sklearn 2-D array round-trip.
            weighted_sum += float(np.dot(r_emb, j_emb)) / norm_product * weight
            used_weight += weight
        return weighted_sum / used_weight if used_weight else 0.0

    def deep_match(self, resume: dict, job: dict) -> dict:
        """Detailed LLM compatibility analysis (run only for top candidates).

        Returns the parsed JSON assessment; falls back to a neutral 'maybe'
        verdict when the response cannot be parsed.
        """
        response = self.llm.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": f"""Analyze candidate-job match. Return detailed assessment.
JOB:
Title: {job.get('title', '')}
Requirements: {', '.join(job.get('requirements', [])[:10])}
Nice-to-have: {', '.join(job.get('nice_to_have', [])[:5])}
Seniority: {job.get('seniority', 'mid')}
CANDIDATE:
Title: {resume.get('current_title', '')}
Years of experience: {resume.get('years_experience', 0)}
Skills: {', '.join(resume.get('skills', [])[:15])}
Summary: {resume.get('summary', '')[:300]}
Return JSON:
{{
"match_score": 0-100,
"strengths": ["..."],
"gaps": ["..."],
"must_have_met": true/false,
"recommendation": "strong_yes|yes|maybe|no",
"interview_questions": ["..."]
}}"""
            }]
        )
        raw = response.content[0].text
        # Fix: models often wrap JSON in prose or markdown fences; pull out
        # the outermost object before parsing instead of failing outright.
        json_blob = re.search(r'\{.*\}', raw, re.DOTALL)
        try:
            return json.loads(json_blob.group(0) if json_blob else raw)
        except Exception:
            # Best-effort fallback: neutral verdict rather than a crash.
            return {'match_score': 50, 'recommendation': 'maybe', 'strengths': [], 'gaps': []}

    def rank_candidates(self, job: dict,
                        candidates: list[dict],
                        top_k_deep: int = 10) -> list[dict]:
        """
        Two-stage pipeline:
        1. Fast embedding scoring for the entire candidate pool.
        2. Deep LLM analysis for the top-K finalists.

        NOTE: annotates each candidate dict in place with 'embedding_score'.
        """
        job_embs = self.encoder.encode_job(job)

        # Stage 1: cheap score for everyone.
        for candidate in candidates:
            resume_embs = self.encoder.encode_resume(candidate)
            candidate['embedding_score'] = self.compute_embedding_score(resume_embs, job_embs)

        # Shortlist by embedding score (previous `* 3` over-slice was dead code:
        # it was immediately re-sliced to top_k_deep below).
        shortlist = sorted(candidates, key=lambda c: -c['embedding_score'])[:top_k_deep]

        # Stage 2: expensive LLM analysis for finalists only.
        results = []
        for candidate in shortlist:
            deep_result = self.deep_match(candidate, job)
            llm_score = deep_result.get('match_score', 50)
            results.append({
                **candidate,
                'embedding_score': candidate['embedding_score'],
                'llm_match_score': llm_score,
                # Blend: 40% embedding score (0-1) + 60% LLM score (rescaled 0-100 -> 0-1).
                'final_score': (candidate['embedding_score'] * 0.4 +
                                llm_score / 100 * 0.6),
                'strengths': deep_result.get('strengths', []),
                'gaps': deep_result.get('gaps', []),
                'recommendation': deep_result.get('recommendation', 'maybe'),
                'interview_questions': deep_result.get('interview_questions', [])
            })
        return sorted(results, key=lambda r: -r['final_score'])
class BiasAuditor:
    """Audits matching results for demographic bias across protected groups."""

    def audit_demographic_bias(self, match_results: pd.DataFrame) -> dict:
        """Check for differential selection across protected characteristics.

        For each protected column present in `match_results`, computes the
        disparate-impact ratio (lowest group-mean final score divided by the
        highest) and flags whether it clears the 0.8 four-fifths threshold.

        Returns a dict keyed by column name; columns that are absent or have
        fewer than two groups are skipped.
        """
        report = {}
        for column in ('gender', 'age_group', 'university_tier'):
            if column not in match_results.columns:
                continue
            stats = match_results.groupby(column)['final_score'].agg(
                ['mean', 'count', 'std']
            )
            if len(stats) < 2:
                continue
            means = stats['mean']
            highest = means.max()
            # Four-fifths rule: a min/max ratio of at least 0.8 is acceptable.
            ratio = means.min() / highest if highest > 0 else 1.0
            report[column] = {
                'disparate_impact': round(ratio, 3),
                'passes_threshold': ratio >= 0.8,
                'group_means': means.round(3).to_dict(),
            }
        return report
Semantic matching increases quality-of-hire: the percentage of candidates who pass probation grows by 20-30%. Time to fill a position is reduced by 35-40% through accurate primary screening. Bias auditing is a mandatory component for compliance with Fair Hiring requirements and EEOC standards.







