Implementation of Relation Extraction Between Entities
Relation Extraction (RE) determines the type of relationship between two named entities in text. For example, given "Sberbank" and "German Gref", the relation is is_ceo. This is a step beyond NER: from recognizing entities to understanding the relationships between them.
Formal Definition
Input: text + pair of entities (e1, e2) + their types. Output: relation type from a predefined set or NO_RELATION.
Example relation schema for corporate texts:
- works_in(PERSON, ORG)
- is_subsidiary(ORG, ORG)
- located_in(ORG, LOC)
- participates_in_deal(ORG, ORG, MONEY)
- appointed_to_position(PERSON, ORG, DATE)
Implementation Approaches
Prompt-based (LLM): the fastest path to a working system. The triple (e1, text, e2) is passed to the model:
In the following text, determine the type of relationship between entities [Sberbank] and [German Gref].
Available types: works_in, is_ceo, founded, left.
If there is no relationship, answer NO_RELATION.
Text: {text}
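The prompt-based approach can be sketched as two small helpers: one builds the prompt above, the other defensively maps a free-form LLM reply back onto the closed label set. The function names and the relation list are illustrative assumptions, not a specific library API; the actual LLM call is left out since any chat-completion client will do.

```python
# Toy relation schema (assumption for this sketch).
RELATION_TYPES = ["works_in", "is_ceo", "founded", "left"]

def build_re_prompt(text: str, e1: str, e2: str, relations=RELATION_TYPES) -> str:
    """Build a zero-shot relation-extraction prompt for an LLM."""
    return (
        f"In the following text, determine the type of relationship "
        f"between entities [{e1}] and [{e2}].\n"
        f"Available types: {', '.join(relations)}.\n"
        f"If there is no relationship, answer NO_RELATION.\n"
        f"Text: {text}"
    )

def parse_re_answer(answer: str, relations=RELATION_TYPES) -> str:
    """Map a free-form model reply onto the closed label set.

    LLMs often wrap the label in extra words ("The relation is is_ceo."),
    so we scan for a known label instead of comparing strings directly.
    """
    for rel in relations:
        if rel in answer:
            return rel
    return "NO_RELATION"
```

In production the parsing step matters as much as the prompt: without mapping back to the fixed schema, the downstream pipeline receives uncontrolled strings.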
Fine-tuned BERT: for high load and a fixed relation schema. Entity-marker approach: special tokens [E1]Sberbank[/E1] are inserted into the text, and classification is done on top of the [CLS], [E1], and [E2] token representations.
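The marker-insertion step can be illustrated with a short sketch (a hypothetical helper, not a specific library function). The only subtlety is inserting from the rightmost span first so character offsets of the earlier span stay valid:

```python
def insert_entity_markers(text: str, e1_span: tuple, e2_span: tuple) -> str:
    """Wrap two entity spans with [E1]..[/E1] and [E2]..[/E2] marker tokens.

    Spans are (start, end) character offsets into `text`.
    Insertion goes right-to-left so the left span's offsets are not shifted
    by markers inserted for the right span.
    """
    spans = [(e1_span[0], e1_span[1], "[E1]", "[/E1]"),
             (e2_span[0], e2_span[1], "[E2]", "[/E2]")]
    for start, end, open_tok, close_tok in sorted(spans, reverse=True):
        text = text[:start] + open_tok + text[start:end] + close_tok + text[end:]
    return text
```

In a real fine-tuning setup the marker strings would also be registered as special tokens in the tokenizer so they are not split into subwords.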
REBEL (Babelscape, built on Facebook's BART): end-to-end RE without a separate NER step; a seq2seq model directly generates triples (subject, relation, object) as linearized text.
Distant Supervision
The main problem with RE is that it needs labeled data. Distant supervision helps overcome this: a knowledge base (Wikidata, Freebase) is used to automatically label texts containing pairs of entities in known relationships. The result is noisy (lots of errors), but it yields 100K+ training examples without manual annotation.
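The core alignment heuristic fits in a few lines. This is a minimal sketch with a toy in-memory "knowledge base" (an assumption standing in for Wikidata/Freebase): if both entities of a known fact co-occur in a sentence, the sentence is labeled with that fact's relation, whether or not the sentence actually expresses it:

```python
# Toy KB of known facts, standing in for Wikidata/Freebase (assumption).
KB_FACTS = {("German Gref", "Sberbank"): "is_ceo"}

def distant_label(sentences: list, kb=KB_FACTS) -> list:
    """Auto-label sentences by KB entity-pair co-occurrence.

    Noisy by design: a sentence mentioning both entities is assumed to
    express their KB relation, which is often false (the source of the
    labeling errors distant supervision is known for).
    """
    examples = []
    for sent in sentences:
        for (e1, e2), rel in kb.items():
            if e1 in sent and e2 in sent:
                examples.append({"text": sent, "e1": e1, "e2": e2,
                                 "relation": rel})
    return examples
```

Note the failure mode: "German Gref spoke at a Sberbank event" would also get labeled is_ceo, which is exactly the noise that downstream denoising techniques (multi-instance learning, attention over bags) try to mitigate.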
Metrics
RE is scored strictly: a prediction counts as correct only if both entities, the relation direction, and the relation type all match. The standard metric is micro F1 across all relations. Typical fine-tuned BERT quality: 70–80% F1 on standard datasets (TACRED, DocRED).
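The strict scoring rule can be made concrete with a short sketch: treat each prediction as an exact (head, relation, tail) triple, so a wrong direction or wrong type yields no credit. The function name and triple encoding are illustrative assumptions:

```python
def micro_f1(gold: list, pred: list) -> float:
    """Strict micro F1 over (head, relation, tail) triples.

    A predicted triple is a true positive only if it matches a gold triple
    exactly: same entities, same direction, same relation type.
    """
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Swapping head and tail ("Sberbank is_ceo Gref" vs "Gref is_ceo Sberbank") scores zero under this rule, which is why direction errors hurt reported F1 even when the relation type is right.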