Knowledge Graph Construction from Texts


Implementation of Knowledge Graph Construction from Text

A knowledge graph is a structured representation of knowledge as triples (subject, predicate, object) stored as a graph. Automatic construction from text turns unstructured corpora into a navigable, queryable knowledge base.

What is a Knowledge Graph and When Is It Needed

Unlike a relational database, a knowledge graph flexibly represents highly interconnected data: "Ivan Petrov → works_in → Gazprom → located_in → Moscow → is_capital_of → Russia". Graph queries ("find all employees of companies located in Moscow") are awkward in SQL, typically requiring chains of self-joins, but natural in a graph.

A knowledge graph is needed when:

  • Data is highly interconnected with many relation types
  • Multi-level queries are needed (graph traversal)
  • Integration of data from different sources is planned
  • Explainability is required: "why did the system decide this" — because A is connected to B through C

Architecture of Automatic Construction

Four key components work sequentially:

Entity Extraction — NER with an expanded set of types. For corporate graphs: PERSON, ORGANIZATION, LOCATION, PRODUCT, EVENT, DATE, MONEY, ROLE.

Relation Extraction — determining the relationship type between entity pairs within a sentence or paragraph. REBEL (Babelscape) is one of the strongest open-source tools for end-to-end triple extraction.

Coreference Resolution — mapping different mentions to a single entity: "Gazprom... the company... it..." all refer to the same node. Common tools are NeuralCoref (now unmaintained) or the experimental coreference component in spacy-experimental.

Entity Linking — linking mentioned entities to canonical records in the base (Wikidata, DBpedia): "VTB", "Bank VTB", "VTB Bank" → one graph node.
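The simplest form of entity linking is alias normalization. The sketch below is a toy stand-in for real linking against Wikidata/DBpedia: the `canonicalize` function and the `VTB_ALIASES` table are illustrative names, not part of any library.

```python
def canonicalize(mention: str, aliases: dict[str, str]) -> str:
    """Map a surface mention to its canonical entity name.

    A toy stand-in for real entity linking (Wikidata/DBpedia lookup):
    normalize case and whitespace, then check an alias dictionary,
    falling back to the original mention when no canonical record exists.
    """
    key = " ".join(mention.split()).lower()
    return aliases.get(key, mention)


# Illustrative alias table; production systems resolve to Wikidata IDs instead
VTB_ALIASES = {
    "vtb": "VTB Bank",
    "bank vtb": "VTB Bank",
    "vtb bank": "VTB Bank",
}
```

In practice the alias table is replaced by a candidate-generation step (search over the knowledge base) plus a disambiguation model, but the contract is the same: many surface forms in, one graph node out.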

Technical Stack

# REBEL for triple extraction
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Babelscape/rebel-large")
model = AutoModelForSeq2SeqLM.from_pretrained("Babelscape/rebel-large")

def extract_triplets(text: str) -> list[tuple]:
    inputs = tokenizer(text, return_tensors="pt", max_length=512, truncation=True)
    outputs = model.generate(**inputs, max_length=256)
    decoded = tokenizer.batch_decode(outputs, skip_special_tokens=False)[0]
    # Parse REBEL's linearized output: <triplet> head <subj> tail <obj> relation
    return parse_rebel_output(decoded)
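The `parse_rebel_output` helper above is left undefined. A minimal implementation, following the linearized format documented on the Babelscape/rebel-large model card (`<triplet> head <subj> tail <obj> relation`), might look like this:

```python
def parse_rebel_output(decoded: str) -> list[tuple]:
    """Parse REBEL's linearized output into (head, relation, tail) tuples.

    Format per triple: <triplet> head text <subj> tail text <obj> relation text
    """
    triplets = []
    head, tail, relation, current = "", "", "", None
    cleaned = decoded.replace("<s>", "").replace("</s>", "").replace("<pad>", "")
    for token in cleaned.split():
        if token == "<triplet>":
            # A new triple begins; flush the previous one if complete
            if head and relation and tail:
                triplets.append((head.strip(), relation.strip(), tail.strip()))
            head, tail, relation, current = "", "", "", "head"
        elif token == "<subj>":
            current = "tail"
        elif token == "<obj>":
            current = "relation"
        elif current == "head":
            head += " " + token
        elif current == "tail":
            tail += " " + token
        elif current == "relation":
            relation += " " + token
    if head and relation and tail:
        triplets.append((head.strip(), relation.strip(), tail.strip()))
    return triplets
```

Note the decoding above uses `skip_special_tokens=False` precisely so that the `<triplet>`, `<subj>` and `<obj>` markers survive and can be parsed here.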

Graph Storage

Neo4j — de facto standard for graph databases:

from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def add_triplet(tx, subject, predicate, obj, source_doc):
    tx.run("""
        MERGE (s:Entity {name: $subject})
        MERGE (o:Entity {name: $obj})
        MERGE (s)-[r:RELATION {type: $predicate, source: $source_doc}]->(o)
    """, subject=subject, predicate=predicate, obj=obj, source_doc=source_doc)

Queries in Cypher:

// Find all colleagues of a person (working in the same company)
MATCH (p:Entity {name: "Ivan Petrov"})-[:RELATION {type: "works_in"}]->
      (org:Entity)<-[:RELATION {type: "works_in"}]-(colleague:Entity)
WHERE colleague <> p
RETURN colleague.name

Integration with LLM (GraphRAG)

Knowledge graph + LLM = GraphRAG: instead of semantic search over text chunks, the LLM receives context from a connected subgraph. Microsoft's GraphRAG project (with comparable implementations in LangChain and LlamaIndex) reports significantly better results on questions about relations between entities compared to classic RAG.

Workflow:

  1. User question → entity extraction
  2. Graph traversal from these entities (2–3 levels)
  3. Subgraph → text representation → LLM context
  4. LLM generates answer
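Step 3 of the workflow above (subgraph → text) can be sketched in pure Python. Here the subgraph is assumed to arrive as a list of (subject, predicate, object) triples from the graph query; the function name and the arrow formatting are illustrative choices, not a fixed GraphRAG convention.

```python
def subgraph_to_context(triples: list[tuple[str, str, str]]) -> str:
    """Serialize a subgraph into plain text for an LLM prompt.

    Each (subject, predicate, object) triple becomes one line;
    duplicates (the same fact reached via different paths) are dropped
    while preserving the original order.
    """
    seen = set()
    lines = []
    for subj, pred, obj in triples:
        fact = f"{subj} --{pred}--> {obj}"
        if fact not in seen:
            seen.add(fact)
            lines.append(fact)
    return "\n".join(lines)


# Example: a 2-hop neighborhood of "Ivan Petrov"
context = subgraph_to_context([
    ("Ivan Petrov", "works_in", "Gazprom"),
    ("Gazprom", "located_in", "Moscow"),
    ("Ivan Petrov", "works_in", "Gazprom"),  # duplicate from a second path
])
```

The resulting text goes into the LLM prompt alongside the user question, typically with an instruction to answer only from the listed facts.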

Maintaining Freshness