
What is Entity Linking? 

Entity linking is the process of identifying mentions of entities in text and connecting them to their unique identity in a knowledge base. It is the critical bridge that transforms unstructured text into structured, actionable insights for AI and knowledge graphs.

Understanding text is challenging because of the complexities of language and communication. Missing context, ambiguity, subtle contextual nuances, idioms, and figurative language can be hard to master for both humans and machines.

While humans can rely on intuition and lived experience, machines require sophisticated Natural Language Processing (NLP) algorithms and large datasets to approximate this understanding. A central part of that involves identifying distinct entities (such as people, places, organizations, and concepts) mentioned in a text and linking them to a unique identifier in a knowledge base.

This task is called entity linking and it is crucial for identifying the specific entities texts refer to, especially when they have ambiguous names or are mentioned in various forms.

Consider the sentence “Jordan played exceptionally well against Phoenix last night.” It illustrates several challenges:

  • Ambiguous labels: The name “Jordan” could refer to multiple entities, the most obvious being Michael Jordan (the basketball player), but it could also refer to another athlete or a non-public figure named Jordan. Without additional context, linking “Jordan” to the correct entity in a knowledge base is challenging. The same goes for the term “Phoenix”: taken out of context, it could refer to the capital city of Arizona rather than a sports club, for example.
  • Contextual clues for disambiguation: The mention of “played” provides a contextual clue, suggesting that the context might be related to sports, possibly basketball, if one assumes “Phoenix” refers to the Phoenix Suns, an NBA team. This clue is crucial for disambiguation but requires the entity linking system to understand the context and make connections between entities.

We see how entity linking must deal with ambiguity and leverage contextual clues for disambiguation to accurately identify and link entities to their corresponding entries in a knowledge base.

The entity linking process typically involves two main steps:

  • Named Entity Recognition (NER): Finding a name or phrase in the text that represents an entity (for example, person, location, or organization).
  • Entity disambiguation: Determining which specific entity a mention refers to when multiple options exist (for example, distinguishing between “Paris” the city and “Paris” Hilton). The disambiguated entities are then linked to a unique identifier in a knowledge base (such as Wikidata, DBpedia, or a domain-specific database). This linking provides access to a rich set of information about the entity, such as its attributes and relationships with other entities. A minimal code sketch of both steps follows this list.
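Here is one way the two steps can fit together in practice, shown as a minimal Python sketch. It assumes spaCy and its en_core_web_sm model are installed; the tiny candidate dictionary is a purely hypothetical stand-in for a real knowledge base, and the identifiers are illustrative placeholders.

```python
# Minimal sketch of the two-step pipeline: NER, then disambiguation.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

# Hypothetical mini "knowledge base": surface form -> candidate entities,
# each with a placeholder identifier and a few context keywords.
CANDIDATES = {
    "Jordan": [
        {"id": "ex:Michael_Jordan", "label": "Michael Jordan", "keywords": {"played", "basketball", "nba"}},
        {"id": "ex:Jordan_Country", "label": "Jordan (country)", "keywords": {"amman", "river", "country"}},
    ],
    "Phoenix": [
        {"id": "ex:Phoenix_Suns", "label": "Phoenix Suns", "keywords": {"played", "basketball", "nba"}},
        {"id": "ex:Phoenix_Arizona", "label": "Phoenix, Arizona", "keywords": {"city", "arizona", "capital"}},
    ],
}

nlp = spacy.load("en_core_web_sm")

def link_entities(text):
    doc = nlp(text)
    context = {token.lower_ for token in doc}
    links = []
    for ent in doc.ents:  # Step 1: Named Entity Recognition
        candidates = CANDIDATES.get(ent.text, [])
        if not candidates:
            continue
        # Step 2: naive disambiguation by context-keyword overlap
        best = max(candidates, key=lambda c: len(c["keywords"] & context))
        links.append((ent.text, best["label"], best["id"]))
    return links

print(link_entities("Jordan played exceptionally well against Phoenix last night."))
```

Real systems replace the keyword overlap with far richer signals, but the overall shape (detect a mention, gather candidates, pick the best one using context) stays the same.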

Entity linking is the core task in the process of semantic annotation of documents, which would typically also include further information extraction and generation of semantic metadata.

Why is entity linking important?

Entity linking is a crucial part of NLP, and it is especially important when large volumes of textual content need to be analyzed. To organize content, make it easily discoverable, and transform the information encoded in text into structured knowledge, we need to attribute the mentions in the text to actual known objects or instances in our database.

Examples of applications that entity linking contributes to are:

  • Enhancing search engines: By understanding the specific entities mentioned in queries and documents, search engines can provide more accurate and relevant results (for example, filtering out documents about Paris Hilton when searching for information about the capital of France). Entity linking can also help retrieve more complete results.
  • Information extraction and knowledge augmentation: Entity linking helps extract and transform information into structured form, such as identifying unknown properties of an entity or relationships between entities.
  • Semantic analysis and recommendations: Understanding the specific entities mentioned in texts enables deeper semantic analysis, which is essential for applications like sentiment analysis, content recommendation, and personalized services.
  • Natural language querying (NLQ) and retrieval augmented generation (RAG): Entity linking helps identify specific concepts to pass as parameters to template-based NLQ. It can also help compile more precise retrieval queries in some RAG implementations (see the sketch after this list).
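To make the NLQ point above a little more tangible, here is a small sketch of how a disambiguated identifier could be dropped into a query template. Both the mention-to-identifier mapping and the SPARQL template are hypothetical placeholders rather than any particular product API.

```python
# Hypothetical example: a mention that has already been linked to a
# knowledge-base identifier is used as a parameter in a query template.
LINKED = {"Paris": "ex:Paris_France"}  # placeholder result of a prior linking step

SPARQL_TEMPLATE = """
SELECT ?population WHERE {{
  {entity} ex:population ?population .
}}
"""

def build_query(mention):
    entity_id = LINKED[mention]  # use the disambiguated identifier, not the raw string
    return SPARQL_TEMPLATE.format(entity=entity_id)

print(build_query("Paris"))
```

Because the query is built from the disambiguated identifier rather than the ambiguous surface string, it cannot accidentally pull in facts about Paris Hilton.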

What are common entity linking approaches?

Early entity linking systems were rule-based, which means they often relied on hand-crafted rules to identify and disambiguate entities. These rules could include heuristic methods based on entity types, context keywords, and other linguistic features. While rule-based approaches can be highly accurate for specific domains or datasets, they tend to lack scalability and flexibility, especially across diverse or evolving datasets.

Machine learning approaches emerged as a more scalable and accurate alternative. Traditional machine learning approaches to entity linking involve feature engineering and labeled data to train models such as Support Vector Machines (SVMs), Random Forests, or Gradient Boosting, which learn to recognize and disambiguate entities based on features extracted from the text. A simplified sketch of this candidate-scoring setup is shown below.
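The following is a minimal sketch of the general shape of such a feature-based approach, assuming scikit-learn is installed; the features, candidates, and labels are toy examples, not a production feature set.

```python
# Toy sketch of feature-based disambiguation: each (mention, candidate) pair
# becomes a small feature vector, and a classifier scores whether the
# candidate is the correct link. All data here is illustrative only.
from difflib import SequenceMatcher
from sklearn.linear_model import LogisticRegression

def features(mention, context, candidate):
    name_similarity = SequenceMatcher(None, mention.lower(), candidate["label"].lower()).ratio()
    keyword_overlap = len(context & candidate["keywords"])
    return [name_similarity, keyword_overlap, candidate["popularity"]]

# Tiny hand-labeled training set: 1 = correct link, 0 = wrong link.
train = [
    ("Jordan", {"played", "basketball"}, {"label": "Michael Jordan", "keywords": {"basketball", "nba"}, "popularity": 0.9}, 1),
    ("Jordan", {"played", "basketball"}, {"label": "Jordan (country)", "keywords": {"amman", "river"}, "popularity": 0.7}, 0),
    ("Phoenix", {"city", "arizona"}, {"label": "Phoenix, Arizona", "keywords": {"city", "arizona"}, "popularity": 0.8}, 1),
    ("Phoenix", {"city", "arizona"}, {"label": "Phoenix Suns", "keywords": {"basketball", "nba"}, "popularity": 0.6}, 0),
]

X = [features(mention, context, candidate) for mention, context, candidate, _ in train]
y = [label for _, _, _, label in train]
classifier = LogisticRegression().fit(X, y)

# Score a new (mention, candidate) pair: higher probability = more likely correct.
example = features("Jordan", {"basketball", "nba"}, {"label": "Michael Jordan", "keywords": {"nba"}, "popularity": 0.9})
print(classifier.predict_proba([example])[0][1])
```

In practice such systems typically use many more features (popularity priors, type constraints, coherence with other mentions) and much larger labeled corpora.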

Using large language models for entity linking

The development of neural networks, transformer models, and large language models (LLMs) has revolutionized NLP including entity linking. These models excel at capturing deep contextual cues from text, making them highly effective for both detecting named entities and disambiguating them based on context. Still, language models will always require some customization for entity disambiguation – they need to be “educated” about the specific entity descriptions and identifiers from the reference knowledge base.

Transformers, such as RoBERTa, can be fine-tuned on entity linking tasks to achieve state-of-the-art performance. On the other hand, very large generative language models, such as ChatGPT, are less appropriate for entity linking, because they can be much more expensive to fine-tune and use.
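As a rough illustration of the embedding-based idea behind such approaches, the sketch below ranks candidate entities by how similar their descriptions are to the mention's sentence, using a small pre-trained sentence encoder as a stand-in for a model actually fine-tuned on entity linking. The candidate descriptions are hypothetical, and this is a generic bi-encoder-style ranking rather than any specific system.

```python
# Bi-encoder-style ranking: embed the mention's sentence context and each
# candidate's description, then rank candidates by cosine similarity.
# Assumes: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # small pre-trained encoder

context = "Jordan played exceptionally well against Phoenix last night."

# Hypothetical candidate descriptions, as they might appear in a knowledge base.
candidates = {
    "Michael Jordan": "Michael Jordan is a former American professional basketball player.",
    "Jordan (country)": "Jordan is a country in Western Asia, on the east bank of the Jordan River.",
}

context_embedding = model.encode(context, convert_to_tensor=True)
for label, description in candidates.items():
    score = util.cos_sim(context_embedding, model.encode(description, convert_to_tensor=True)).item()
    print(f"{label}: {score:.3f}")
# The basketball-related description should score higher for this sentence.
```

A model fine-tuned for entity linking would typically be trained so that mentions and the descriptions of their correct entities end up close together in the embedding space.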

How do knowledge graphs help entity linking?

Knowledge graphs prove to be a great foundation for entity linking, as they encode rich semantic information about entities, including their attributes, types, and relationships with other entities. This information can be leveraged by entity linking systems to disambiguate entities based on their context within the text and their semantic roles in the knowledge graph.

Knowledge graphs can fit directly into the machine learning model architecture or training process, enhancing the model’s ability to link entities to the specific entries in a graph. This can involve encoding graph information into the model’s embeddings or using graph neural networks to leverage the structure of knowledge graphs. Integrating entity linking with knowledge graphs creates a feedback loop where the entity linking system benefits from the structured knowledge in the graph, and the graph, in turn, is enriched by the newly linked entities, relationships and facts extracted from text. This mutually beneficial relationship facilitates continuous improvement and learning.
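As a very simplified illustration of how graph structure can support disambiguation, the sketch below prefers the candidate that is better connected, in a toy graph, to entities already identified in the same document. The graph fragment and identifiers are hypothetical.

```python
# Toy illustration: pick the candidate that is connected in the knowledge
# graph to other entities already linked in the same document.
# The graph below is a hypothetical, hand-made fragment.
GRAPH = {
    "ex:Michael_Jordan": {"ex:Chicago_Bulls", "ex:NBA"},
    "ex:Jordan_Country": {"ex:Amman", "ex:Middle_East"},
    "ex:Phoenix_Suns": {"ex:NBA", "ex:Phoenix_Arizona"},
}

def graph_score(candidate_id, other_entities):
    """Count direct links and shared neighbours between a candidate and the
    other entities mentioned in the same document."""
    neighbours = GRAPH.get(candidate_id, set())
    direct = len(neighbours & other_entities)
    shared = sum(1 for other in other_entities if neighbours & GRAPH.get(other, set()))
    return direct + shared

# Suppose "Phoenix" was already linked to the Suns; use that to resolve "Jordan".
already_linked = {"ex:Phoenix_Suns"}
for candidate in ("ex:Michael_Jordan", "ex:Jordan_Country"):
    print(candidate, graph_score(candidate, already_linked))
# ex:Michael_Jordan shares the ex:NBA neighbour with ex:Phoenix_Suns, so it wins.
```

Coherence signals of this kind are usually combined with the textual features and embeddings described earlier, rather than used on their own.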

How Graphwise powers entity linking

The Graphwise platform provides the infrastructure to perform entity linking at an enterprise scale, turning “dark data” into a trusted semantic backbone.

Advanced semantic annotation

Graphwise uses sophisticated text analysis to scan documents and identify entities. Unlike standard tools that just find “names,” Graphwise uses the rich context of your specific domain to ensure high accuracy.

Grounding AI with knowledge graphs

By linking text mentions to a Graphwise knowledge graph, the system provides “grounding” for large language models (LLMs). This prevents AI hallucinations by ensuring the model refers to verified facts within your database rather than guessing based on word patterns.

Industry-specific precision

Whether in Healthcare (linking symptoms to diseases), Finance (linking subsidiaries to parent companies), or Manufacturing, Graphwise allows you to use specialized ontologies to ensure your entity linking is tuned to the vocabulary of your business.

Conclusion

The shift from “strings” (text) to “things” (entities) is the core of modern information management. Entity linking is the technology that makes this shift possible.

By using Graphwise, organizations can bridge the gap between their unstructured content and their structured knowledge, creating an intelligent, connected environment where data is no longer just stored—it is understood.
