From Data Exchange to Knowledge Exchange: Why Context is the Missing Layer in Enterprise AI

This post is adapted from the CDO Matters podcast, hosted by Malcolm Hawker, CDO at Profisee. In this episode, Malcolm spoke with Andreas Blumauer, Senior Vice President of Growth at Graphwise, about the origins of the semantic web, the rise of enterprise knowledge graphs, their relationship to AI, and what it actually takes to get started.

Main Takeaways

  • The semantic web found its home in the enterprise — RDF, SPARQL, and OWL never transformed the public internet, but they're now the foundation of enterprise AI strategy at Microsoft, Google, and SAP.
  • Vector search finds similar text; knowledge graphs find the right answer — in regulated domains, that distinction determines whether an AI system can be trusted or merely tolerated.
  • Knowledge graphs don't replace your data systems — they connect them — sitting above MDM, databases, and pipelines as a context layer, adding the meaning no single system of record carries on its own.
  • The ontologist is becoming a knowledge steward — LLMs handle the creation work; human judgment decides what the graph should represent to generate business value.

The story of the Semantic Web begins in 1998, when Tim Berners-Lee published a note on the W3C server introducing the Resource Description Framework (RDF). The ambition was significant: as the web grew into a crowded, heterogeneous space of formats and interfaces, there needed to be a parallel layer that machines could read in a standardized way. Not just how data was serialized, but how its meaning was described and shared across domains. The keyword was interoperability at scale. Publishers across different domains would agree not just on data formats, but on the knowledge models — the ontologies and taxonomies — that gave that data meaning.

It was a compelling vision, and for a time it gained real momentum. But by the mid-2010s the standards process had slowed, and Web 2.0 had taken the internet in a very different direction. Platforms captured data and controlled access through proprietary APIs, effectively siloing information behind point-to-point interfaces. The open, self-describing web Berners-Lee had imagined never fully materialized. He has since said publicly that he is disappointed by how the web developed — and that enforcing the semantic layer earlier, before the document web calcified into the platform web, might have changed everything.

But the story does not end there. While the public semantic web stalled, enterprises quietly began adopting its standards. The full semantic web technology stack (RDF, SPARQL, SKOS, OWL) is now widely deployed inside organizations building enterprise knowledge graphs. What failed to transform the internet has found a very productive home in the enterprise. And the timing could not be better: Microsoft, Google, SAP, and others are now explicitly putting knowledge graphs at the center of their AI strategies. Gartner has also placed knowledge graphs at the plateau of productivity on its Hype Cycle. This is no longer a niche technology. It is a foundational one.

What knowledge graphs actually do differently

To understand why knowledge graphs matter, it helps to understand what is missing from conventional data exchange. When one organization sends another a file of records, it sends data — but not meaning. The receiving system may know it is looking at customer records, but the deeper context — what definitions were used, what rules govern the data, what relationships exist between entities — is absent. The recipient has to figure it out, usually manually, often inconsistently. This is the fundamental limitation of file-based and API-based data exchange: it moves data, but not knowledge.

Knowledge graphs change this by making data self-describing. A dataset built on semantic web standards carries its own meaning — not as external documentation, but embedded in the data itself. Ontologies and taxonomies describe what the data represents, and that description travels with it.

Consider a simple field labeled “water”. In data from an industrial plant it likely means water consumption. Sent to a government agency, it carries a different meaning entirely. Without shared semantics, both parties have to guess. With a knowledge model in place, the meaning is explicit, machine-readable, and unambiguous. This is the distinction between data exchange and knowledge exchange — the latter adds a layer of context that transforms raw records into something an automated system can actually reason over.
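The "water" example can be sketched in a few lines of plain Python. This is a minimal illustration, not RDF tooling: triples are modelled as tuples, and the namespace, property names, units, and the agency's "water quality index" reading are all assumptions made up for the example.

```python
# Hypothetical namespace and property names, for illustration only.
EX = "https://example.org/vocab/"

# Each fact is a (subject, predicate, object) triple, the basic shape of RDF.
# The definition of each property ships in the same dataset as the readings.
plant_data = [
    (EX + "reading/1", EX + "waterConsumption", 1200),
    (EX + "waterConsumption", EX + "label", "Water consumption"),
    (EX + "waterConsumption", EX + "unit", "cubic metres per month"),
]
agency_data = [
    (EX + "reading/7", EX + "waterQualityIndex", 82),
    (EX + "waterQualityIndex", EX + "label", "Water quality index"),
    (EX + "waterQualityIndex", EX + "unit", "score from 0 to 100"),
]


def describe(triples, subject):
    """Return (label, value, unit) for each property on `subject`,
    using only the descriptions carried inside the dataset itself."""
    facts = {}
    for s, p, o in triples:
        facts.setdefault(s, {})[p] = o
    described = []
    for prop, value in facts.get(subject, {}).items():
        meta = facts.get(prop, {})
        described.append((meta.get(EX + "label", prop), value, meta.get(EX + "unit")))
    return described


plant_view = describe(plant_data, EX + "reading/1")
agency_view = describe(agency_data, EX + "reading/7")
```

Neither reading is a bare "water" column any more: the recipient can look up what each value means and in which unit, without asking the sender.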

This matters beyond the theoretical. Real-world vocabularies such as Medical Subject Headings (MeSH), widely used across life sciences, and EuroVoc, the European Union’s multilingual thesaurus for annotating legislative documents, demonstrate how shared knowledge models make data from different sources instantly comparable and searchable in ways that keyword matching or embedding similarity simply cannot replicate. The knowledge graph does not replace these existing structures; it connects them, and in doing so unlocks value that has always been latent in the data but impossible to surface.

Why vectors alone are not enough

Vector embeddings have become the default retrieval mechanism in most RAG architectures, and they have genuine strengths — particularly for surfacing semantically related content from large unstructured corpora. But their limitations become significant in knowledge-intensive enterprise queries. When a user types a question into a chatbot, the entire context available to a pure vector retrieval system is that single question. It has no knowledge of the domain the user works in, the regulatory framework they operate under, the process they are following, or the specific entities their question refers to. It finds statistically similar text. That is not the same as finding the right answer.

In high-stakes domains such as financial services, life sciences, legal, and healthcare, this distinction is not academic. A compliance question requires the system to know which regulations apply, how they are structured, and how they relate to the specific entities in the query. That structure cannot be recovered from text that a vector pipeline has split into arbitrary chunks. It needs to be explicitly represented in a domain knowledge model that the retrieval system can traverse.

The knowledge graph acts as a context engine, dramatically narrowing the space of relevant content before the LLM gets involved. It grounds the LLM’s responses in structured, semantically rich data rather than probabilistic pattern-matching over disconnected text. The combination of a knowledge graph and an LLM is far more powerful than either alone: the graph handles precision and context, and the LLM handles natural language. Vector retrieval used in isolation delivers neither at the level enterprise use cases demand.

GraphRAG formalizes this combination into a retrieval architecture: instead of surfacing loosely related text chunks, it traverses the knowledge graph to assemble context that reflects actual relationships between entities, concepts, and documents. The result is answers that are not just more accurate but more explainable.
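The traversal idea can be sketched as follows. This is a toy illustration under stated assumptions, not a GraphRAG implementation: the entities and relations are invented, entity linking is a naive substring match, and a production system would query a real graph store.

```python
from collections import deque

# Toy graph: every entity and relation name here is hypothetical.
TRIPLES = [
    ("ProductA", "regulatedBy", "RegulationX"),
    ("RegulationX", "requires", "AnnualAudit"),
    ("RegulationX", "appliesIn", "EuropeanUnion"),
    ("ProductB", "regulatedBy", "RegulationY"),
    ("RegulationY", "requires", "QuarterlyReport"),
]


def find_entities(question, triples):
    """Naive entity linking: graph nodes whose name appears in the question."""
    nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
    return sorted(n for n in nodes if n.lower() in question.lower())


def gather_context(question, triples, max_hops=2):
    """Breadth-first traversal from the linked entities, collecting every
    fact reachable within `max_hops` edges as a short text statement."""
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))
    queue = deque((entity, 0) for entity in find_entities(question, triples))
    seen = {entity for entity, _ in queue}
    facts = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for p, o in outgoing.get(node, []):
            facts.append(f"{node} {p} {o}")
            if o not in seen:
                seen.add(o)
                queue.append((o, depth + 1))
    return facts


context = gather_context("What audits does ProductA need?", TRIPLES)
```

The collected facts would be passed to the LLM as context. Because each one is an explicit edge in the graph, the answer can be traced back to the relationships that produced it, and facts about unrelated entities such as ProductB never enter the prompt.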

Knowledge graphs and data governance

One of the less-discussed capabilities of knowledge graph infrastructure is its application to data governance and quality. Because RDF functions as a kind of universal mapping language — capable of representing almost any data format in a common structure — it enables cross-silo validation in a way that silo-by-silo governance cannot.

Using SHACL, part of the semantic web standards stack, organizations can define a set of constraints and run them simultaneously across data from multiple source systems. Inconsistencies and missing values that would never surface within a single system become visible when the data is viewed as a unified graph. It is the kind of data quality intervention that organizations have historically needed teams of analysts to approximate manually.
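A simplified stand-in shows the principle. The sketch below is plain Python rather than SHACL, and the two source systems, identifiers, and values are hypothetical. The single rule it checks corresponds roughly to a SHACL property shape with a minimum and maximum count of one.

```python
# Records from two hypothetical source systems, already mapped to shared identifiers.
crm = {
    "customer/42": {"country": "AT"},
    "customer/43": {"country": "FR"},
}
billing = {
    "customer/42": {"country": "DE"},
}


def merge(*sources):
    """Union all sources into one graph-like view: node -> property -> set of values."""
    graph = {}
    for source in sources:
        for node, props in source.items():
            for prop, value in props.items():
                graph.setdefault(node, {}).setdefault(prop, set()).add(value)
    return graph


def validate(graph, prop, min_count=1, max_count=1):
    """Check one cardinality constraint on every node in the merged view."""
    violations = []
    for node in sorted(graph):
        values = graph[node].get(prop, set())
        if len(values) < min_count:
            violations.append((node, prop, "missing value"))
        elif len(values) > max_count:
            violations.append((node, prop, f"conflicting values: {sorted(values)}"))
    return violations


per_system = [validate(merge(crm), "country"), validate(merge(billing), "country")]
across_systems = validate(merge(crm, billing), "country")
```

Each system passes the rule on its own. The conflict for `customer/42` only becomes visible once both sources are viewed as one graph.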

There is also a more targeted application for governance in high-stakes query scenarios. Rather than generating an answer probabilistically and then attempting to cite sources, it is possible to force the system to retrieve its response directly from the knowledge graph. This is achieved by having the LLM translate a natural language question into a SPARQL query, which executes against the graph and returns a result grounded entirely in verified, curated data.

This is a fundamentally different risk profile from a system that generates first and references second. For organizations operating under strict regulatory requirements, the difference is not marginal. It is the difference between a system that can be trusted and one that cannot.
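The retrieve-only pattern can be sketched as below. Two things are assumptions: the LLM translation step is replaced by a hard-coded placeholder function, and the SPARQL engine is replaced by a tiny matcher for a single triple pattern. The graph contents are invented.

```python
# Curated facts; names are hypothetical.
TRIPLES = {
    ("RegulationX", "appliesTo", "ProductA"),
    ("RegulationX", "requires", "AnnualAudit"),
}


def translate(question):
    """Placeholder for the LLM step that would emit a SPARQL query such as
    SELECT ?x WHERE { :RegulationX :requires ?x }.
    Here it returns one triple pattern, with '?x' as the variable, or None."""
    known = {"What does RegulationX require?": ("RegulationX", "requires", "?x")}
    return known.get(question)


def run(pattern, triples):
    """Match one triple pattern against the graph and return the bindings of '?x'."""
    results = []
    for triple in triples:
        if all(p == "?x" or p == t for p, t in zip(pattern, triple)):
            results.extend(t for p, t in zip(pattern, triple) if p == "?x")
    return sorted(results)


def answer(question, triples=TRIPLES):
    """Return only what the graph contains; never fall back to free generation."""
    pattern = translate(question)
    if pattern is None:
        return None  # the question could not be expressed as a query
    return run(pattern, triples) or None  # an empty result is reported as "no answer"


grounded = answer("What does RegulationX require?")
unknown = answer("What does RegulationZ require?")
```

The design choice is that `answer` has no generative fallback. When the graph holds no matching fact, the caller gets `None` rather than a plausible-sounding guess.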

Bridging knowledge graphs and traditional data management

For organizations with significant investments in master data management (MDM), traditional data integration, and relational analytics, the question is not whether to abandon those foundations but how to extend them. The world of rows and columns is not going away. Dashboards, operational reporting, and structured analytics will remain essential. But the knowledge graph sits above all of that as an orchestration and enrichment layer. It connects the dots between silos, adds domain context, and makes the full data landscape navigable in a way that no single system of record can achieve on its own.

The relationship between MDM and knowledge graphs is particularly worth examining. MDM provides authoritative definitions of shared entities — customer, product, asset — and manages the quality standards that govern them. A knowledge graph extends that foundation by describing the relationships between those entities and connecting them to the broader enterprise context.

For example, a knowledge graph captures not just that Joe Smith and JM Smith are the same person, but what that person’s household looks like, what other entities they are connected to, and what transactions, interactions, and relationships exist across the full data landscape. The two capabilities are not in competition. Used together, they get considerably closer to what a genuinely knowledge-driven organization needs: data that is not just accurate, but contextually rich enough to support discovery, not just reporting.
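The division of labour can be sketched like this. The record identifiers, the golden-record mapping, and the relationships are hypothetical; the mapping stands in for the output of an MDM match-and-merge process.

```python
# Output of an MDM match/merge step (assumed): source record -> golden record id.
GOLDEN = {
    "crm:joe-smith": "person/1",
    "billing:jm-smith": "person/1",
    "crm:mary-smith": "person/2",
}

# Relationships recorded against source records in different systems (assumed).
EDGES = [
    ("crm:joe-smith", "memberOf", "household/9"),
    ("crm:mary-smith", "memberOf", "household/9"),
    ("billing:jm-smith", "holds", "account/77"),
]


def canonical(node):
    """Resolve a source record to its mastered identity, if it has one."""
    return GOLDEN.get(node, node)


def neighbours(person, edges=EDGES):
    """All relationships of a mastered entity, whichever source record carried them."""
    return sorted((p, canonical(o)) for s, p, o in edges if canonical(s) == person)


def household_members(person, edges=EDGES):
    """Other mastered people who share a household with `person`."""
    households = {o for p, o in neighbours(person, edges) if p == "memberOf"}
    members = {canonical(s) for s, p, o in edges if p == "memberOf" and o in households}
    return sorted(members - {person})


joe_context = neighbours("person/1")
joe_household = household_members("person/1")
```

MDM answers "who is this?" through the `GOLDEN` mapping. The graph answers "what is this connected to?" by following edges from every source record that resolves to the same identity.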

A concrete example illustrates this point. A large pharmaceutical company integrated more than twenty heterogeneous data sources into a virtual semantic layer built on a knowledge graph, without changing any of the underlying systems. As a result, different stakeholders could ask questions in natural language and receive answers that drew on all of those sources simultaneously, including sources they had not previously known existed within their own organization. That kind of cross-silo discovery, at scale, without manual integration effort, is what shifts an organization from analytics-driven to knowledge-driven.

Getting started: use cases, not grand designs

A common concern when knowledge graphs are introduced to organizations is that building one sounds enormously complex and time-consuming. But creating a knowledge graph is not more sophisticated or more time-consuming than data and content management in general. It is a different exercise, and one where automation now does most of the heavy lifting.

The early days of the semantic web, where ontologists spent weeks hand-crafting conceptual models in specialized tools that only specialists could read, are gone. The tooling has matured, the automation has improved, and the process has become considerably more approachable.

The practical starting point is not to graph everything. It is to identify one business-critical use case where the impact of better data connectivity would be demonstrably high. Analytics, search, and generative AI are the three most common entry points, each placing slightly different demands on the knowledge graph but all benefiting from the same underlying semantic infrastructure.

From there, the methodology is consistent: build the domain model, ingest and transform the relevant datasets, link them, test them, and make them available to the target application. Domain experts validate the results and the system iterates. If the first use case works, the same methodology extends naturally to additional datasets and departments. What Graphwise calls the five-star journey, or five-star AI infrastructure, begins at zero and, for strategically committed organizations, can reach full maturity within months.

The emerging role of the knowledge steward

The conversation around knowledge graphs and AI typically focuses on what knowledge graphs do for AI — providing context, reducing hallucinations, improving retrieval precision. Less discussed is the inverse: what AI can do for knowledge graphs. The answer, increasingly, is a great deal.

LLM-driven modeling tools can now propose new elements for a knowledge model autonomously — new concepts, synonyms, relationships — with human subject matter experts reviewing and approving rather than authoring from scratch. The quality of these suggestions has reached a point where it materially accelerates the knowledge graph development cycle.
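The review-and-approve workflow can be sketched as a small data structure. This is an assumption about how such a workflow might be modelled, not a description of any particular tool. The class names are hypothetical, and only synonym suggestions are applied to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    kind: str      # "concept", "synonym", or "relationship"
    subject: str
    value: str
    status: str = "pending"


@dataclass
class KnowledgeModel:
    synonyms: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def propose(self, suggestion):
        """Machine-generated suggestions are queued, never applied directly."""
        self.pending.append(suggestion)

    def review(self, suggestion, approve):
        """A subject matter expert's decision is the only path into the model."""
        suggestion.status = "approved" if approve else "rejected"
        self.pending.remove(suggestion)
        if approve and suggestion.kind == "synonym":
            self.synonyms.setdefault(suggestion.subject, set()).add(suggestion.value)


model = KnowledgeModel()
good = Suggestion("synonym", "Myocardial infarction", "Heart attack")
bad = Suggestion("synonym", "Myocardial infarction", "Heartburn")
model.propose(good)
model.propose(bad)
model.review(good, approve=True)
model.review(bad, approve=False)
```

The expert's effort goes into two decisions rather than into authoring entries, while the model itself still contains only what a human has approved.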

This shift is changing the nature of the ontologist and taxonomist role. The nitty-gritty creation work (a new concept, a new synonym, a new relationship) is increasingly handled by the LLM. What remains irreducibly human is the judgment about what the knowledge graph should represent in order to create business value. That judgment covers scoping decisions, prioritization, and the domain expertise needed to evaluate whether a machine-generated suggestion is accurate and useful in context.

The role is evolving from knowledge engineer to knowledge steward — someone who understands both the business and the knowledge model well enough to guide the system toward the areas where enrichment will have the greatest impact. Far from making the role obsolete, AI is making it more strategic.

The context imperative

Context has always been important, but it’s becoming essential. It’s what separates a file from knowledge. It’s what makes a governance rule apply in one situation and not another. It’s what decides whether an AI system can be trusted or merely tolerated.

Organizations that invest in building genuine semantic infrastructure — domain knowledge models, enterprise knowledge graphs, standards-based interoperability — are not just improving their data management. They are building the contextual foundation that will determine how much value they can extract from AI over the next decade. They are creating something very valuable: a data landscape that understands itself.

Getting Started with Knowledge Graphs

  1. Identify a business-critical use case: Pinpoint one high-impact area where better data connectivity solves a real problem — whether that’s a generative AI scenario requiring accuracy and context, siloed search and discovery, or analytics that spans multiple source systems.
  2. Define and build the domain model: Work with domain experts to build the core ontologies and taxonomies that describe the meaning of the relevant data. LLM-driven modeling tools now materially accelerate this step, shifting the focus to human validation.
  3. Ingest, transform, and link relevant datasets: Pull data from the identified silos, map it to the common semantic structure, and link entities across datasets to create a unified view.
  4. Test and integrate: Validate the knowledge graph with domain experts, then integrate it into the target application.
  5. Establish the knowledge steward role: Begin the proactive shift from knowledge engineer to knowledge steward — someone who guides the system toward where semantic enrichment will have the greatest business impact.
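Steps 2 to 4 can be sketched end to end on toy data. Everything here is hypothetical: the two source systems, their field names, the shared property names, and the records. A real project would map into an ontology and a graph store rather than Python dictionaries.

```python
# Step 2: a tiny domain model, mapping each source's fields to shared properties.
MAPPINGS = {
    "lab_system": {"compound_id": "id", "cmpd_name": "name", "assay": "testedIn"},
    "trial_db": {"drug_code": "id", "trial": "studiedIn"},
}


def transform(source, record):
    """Step 3a: rename source-specific fields to the shared vocabulary,
    dropping fields the domain model does not cover."""
    mapping = MAPPINGS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def link(records):
    """Step 3b: merge records that share the same 'id' into one entity."""
    entities = {}
    for record in records:
        entities.setdefault(record["id"], {}).update(record)
    return entities


def check(entities, required=("id", "name")):
    """Step 4: a basic test, listing entities that lack a required property."""
    return sorted(e for e, props in entities.items()
                  if any(r not in props for r in required))


records = [
    transform("lab_system", {"compound_id": "C1", "cmpd_name": "Aspirin",
                             "assay": "A-17", "internal_flag": 1}),
    transform("trial_db", {"drug_code": "C1", "trial": "T-3"}),
    transform("trial_db", {"drug_code": "C2", "trial": "T-4"}),
]
entities = link(records)
incomplete = check(entities)
```

The list returned by `check` is the kind of finding domain experts would review before the graph is connected to the target application.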

Details

Category: Knowledge Graph

FAQ

An Enterprise Knowledge Graph (EKG) is a semantic data model that provides a unified, structured view of an organization's disparate data by interlinking descriptions of entities, such as business objects, events, and concepts, along with their complex relationships. It functions as a non-intrusive virtual layer that sits atop existing databases, using formal ontologies and taxonomies to semantically enrich and classify both structured and unstructured information without requiring the data to be moved from its original source. By assigning unique identifiers to data points and mapping them across a shared conceptual framework, an EKG breaks down internal silos to create a "single source of truth," enabling advanced analytics, improved decision-making, and more effective AI applications through a common machine-readable language.

The Semantic Web is an extension of the World Wide Web designed to make data machine-interpretable through standardized metadata, ontologies, global identifiers (URIs), and a common data model (RDF). It is often perceived as a public failure because its original vision of a decentralized "Web of Data" was hindered by high technical complexity, a lack of user-friendly tools for casual contributors, and the emergence of "walled gardens" where platforms preferred proprietary data silos over open interoperation. However, it has quietly succeeded in the enterprise sector under the guise of knowledge graphs and semantic layers, where its standards (such as RDF and SPARQL) provide a powerful framework for integrating heterogeneous data silos, ensuring data governance, and enabling sophisticated AI and analytics that require precise, context-rich information.

To build a knowledge graph without "boiling the ocean," start by identifying a single, high-value business use case and defining the specific competency questions it needs to answer, rather than attempting a comprehensive enterprise-wide implementation. Adopt an agile, iterative approach by beginning with a small pilot project that utilizes existing vocabularies and taxonomies as foundational building blocks. This incremental strategy allows you to demonstrate tangible value quickly, gain stakeholder buy-in, and refine your semantic model before gradually expanding the graph's scope to other departments or data domains. Focusing on a "minimum viable knowledge graph" ensures the project remains manageable, functional, and directly aligned with measurable business objectives.

Self-describing data refers to information that carries its own context, meaning, and structure within the dataset itself, rather than relying on external documentation or separate schema definitions. In the context of semantic web technologies and knowledge graphs, this is achieved by embedding machine-readable metadata, ontologies, and taxonomies directly alongside the data—typically using standards like RDF (Resource Description Framework). Because the meaning of every data element is explicitly defined and travels with the data, systems can automatically interpret, link, and integrate information from disparate sources without manual intervention or guesswork, ensuring that the data remains unambiguous and interoperable across different environments.