From Data Exchange to Knowledge Exchange: Why Context is the Missing Layer in Enterprise AI

Reading Time: 11 min

This post is adapted from the CDO Matters podcast, hosted by Malcolm Hawker, CDO at Profisee. In this episode, Malcolm spoke with Andreas Blumauer, Senior Vice President of Growth at Graphwise, about the origins of the semantic web, the rise of enterprise knowledge graphs, their relationship to AI, and what it actually takes to get started.

The story of the Semantic Web begins in 1998, when Tim Berners-Lee published a note on the W3C server introducing the Resource Description Framework (RDF). The ambition was significant: as the web grew into a crowded, heterogeneous space of formats and interfaces, there needed to be a parallel layer that machines could read in a standardized way. Not just how data was serialized, but how its meaning was described and shared across domains. The key idea was interoperability at scale. Publishers across different domains would agree not just on data formats, but on the knowledge models — the ontologies and taxonomies — that gave that data meaning.

It was a compelling vision, and for a time it gained real momentum. But by the mid-2010s the standards process had slowed, and Web 2.0 had taken the internet in a very different direction. Platforms captured data and controlled access through proprietary APIs, effectively siloing information behind point-to-point interfaces. The open, self-describing web Berners-Lee had imagined never fully materialized. He has since said publicly that he is disappointed by how the web developed — and that enforcing the semantic layer earlier, before the document web calcified into the platform web, might have changed everything.

But the story does not end there. While the public semantic web stalled, enterprises quietly began adopting its standards. RDF, SPARQL, SKOS, OWL — the full semantic web technology stack is now widely deployed inside organizations building enterprise knowledge graphs. What failed to transform the internet has found a very productive home in the enterprise. And the timing could not be better: Microsoft, Google, SAP, and others are now explicitly putting knowledge graphs at the center of their AI strategies, and Gartner has placed knowledge graphs at the Plateau of Productivity on its Hype Cycle. This is no longer a niche technology. It is a foundational one.

What knowledge graphs actually do differently

To understand why knowledge graphs matter, it helps to understand what is missing from conventional data exchange. When one organization sends another a file of records, it sends data — but not meaning. The receiving system may know it is looking at customer records, but the deeper context — what definitions were used, what rules govern the data, what relationships exist between entities — is absent. The recipient has to figure it out, usually manually, often inconsistently. This is the fundamental limitation of file-based and API-based data exchange: it moves data, but not knowledge.

Knowledge graphs change this by making data self-describing. A dataset built on semantic web standards carries its own meaning — not as external documentation, but embedded in the data itself. Ontologies and taxonomies describe what the data represents, and that description travels with it.

Consider a simple field labeled “water”. In data from an industrial plant it likely means water consumption. Sent to a government agency, it carries a different meaning entirely. Without shared semantics, both parties have to guess. With a knowledge model in place, the meaning is explicit, machine-readable, and unambiguous. This is the distinction between data exchange and knowledge exchange — the latter adds a layer of context that transforms raw records into something an automated system can actually reason over.

This matters beyond the theoretical. There are many real-world vocabularies like Medical Subject Headings (MeSH), widely used across life sciences, and EuroVoc, the European Union’s multilingual thesaurus for annotating legislative documents. Both of these demonstrate how shared knowledge models make data from different sources instantly comparable and searchable in ways that keyword matching or embedding similarity simply cannot replicate. The knowledge graph does not replace these existing structures — it connects them, and in doing so, unlocks value that has always been latent in the data but impossible to surface.

Why vectors alone are not enough

Vector embeddings have become the default retrieval mechanism in most RAG architectures, and they have genuine strengths — particularly for surfacing semantically related content from large unstructured corpora. But their limitations become significant in knowledge-intensive enterprise queries. When a user types a question into a chatbot, the entire context available to a pure vector retrieval system is that single question. It has no knowledge of the domain the user works in, the regulatory framework they operate under, the process they are following, or the specific entities their question refers to. It finds statistically similar text. That is not the same as finding the right answer.

In high-stakes domains — financial services, life sciences, legal, healthcare — this distinction is not academic. A compliance question requires the system to know which regulations apply, how they are structured, and how they relate to the specific entities in the query. That structure cannot be recovered from arbitrarily chunked text and vector similarity; it needs to be explicitly represented in a domain knowledge model that the retrieval system can traverse.

The knowledge graph acts as a context engine, dramatically narrowing the space of relevant content before the LLM gets involved, and grounding responses in structured, semantically rich data rather than probabilistic pattern-matching over disconnected text. The combination of a knowledge graph and an LLM is far more powerful than either alone: the graph handles precision and context, the LLM handles natural language. Vector retrieval used in isolation delivers neither at the level enterprise use cases demand.

GraphRAG formalizes this combination into a retrieval architecture: instead of surfacing loosely related text chunks, it traverses the knowledge graph to assemble context that reflects actual relationships between entities, concepts, and documents. The result is answers that are not just more accurate but more explainable.
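The graph-traversal step of GraphRAG can be sketched without any framework. In this toy example (all entities, relations, and chunk names are hypothetical), retrieval first expands the query entity's graph neighborhood, then admits only chunks connected to that neighborhood — before any LLM or vector scoring is involved:

```python
from collections import deque

# Toy knowledge graph: entity -> related entities (hypothetical data).
edges = {
    "MiFID II": ["transaction reporting", "investment firms"],
    "transaction reporting": ["RTS 22"],
    "investment firms": [],
    "RTS 22": [],
}

# Which document chunks mention which entities.
chunk_entities = {
    "chunk-A": {"MiFID II", "transaction reporting"},
    "chunk-B": {"RTS 22"},
    "chunk-C": {"GDPR"},  # similar wording, but the wrong regulation
}

def neighborhood(start, max_hops=2):
    """Collect all entities within max_hops of the query entity."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen

def graph_rag_candidates(query_entity):
    """Only chunks linked to the query entity's neighborhood reach the LLM."""
    context = neighborhood(query_entity)
    return sorted(c for c, ents in chunk_entities.items() if ents & context)

print(graph_rag_candidates("MiFID II"))  # ['chunk-A', 'chunk-B']
```

A purely vector-based retriever might rank chunk-C highly because its wording resembles the question; the graph filter excludes it because it is not connected to the entities the question is actually about.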

Knowledge graphs and data governance

One of the less-discussed capabilities of knowledge graph infrastructure is its application to data governance and quality. Because RDF functions as a kind of universal mapping language — capable of representing almost any data format in a common structure — it enables cross-silo validation in a way that silo-by-silo governance cannot.

Using SHACL, part of the semantic web standards stack, organizations can define a set of constraints and run them simultaneously across data from multiple source systems. Inconsistencies and missing values that would never surface within a single system become visible when the data is viewed as a unified graph. It is the kind of data quality intervention that organizations have historically needed teams of analysts to approximate manually.

There is also a more targeted application for governance in high-stakes query scenarios. Rather than generating an answer probabilistically and then attempting to cite sources, it is possible to force the system to retrieve its response directly from the knowledge graph. This is achieved by having the LLM translate a natural language question into a SPARQL query, which executes against the graph and returns a result grounded entirely in verified, curated data.

This is a fundamentally different risk profile from a system that generates first and references second. For organizations operating under strict regulatory requirements, the difference is not marginal. It is the difference between a system that can be trusted and one that cannot.

Bridging knowledge graphs and traditional data management

For organizations with significant investments in master data management (MDM), traditional data integration, and relational analytics, the question is not whether to abandon those foundations but how to extend them. The world of rows and columns is not going away. Dashboards, operational reporting, and structured analytics will remain essential. But the knowledge graph sits above all of that as an orchestration and enrichment layer. It connects the dots between silos, adds domain context, and makes the full data landscape navigable in a way that no single system of record can achieve on its own.

The relationship between MDM and knowledge graphs is particularly worth examining. MDM provides authoritative definitions of shared entities — customer, product, asset — and manages the quality standards that govern them. A knowledge graph extends that foundation by describing the relationships between those entities and connecting them to the broader enterprise context.

For example, not just that Joe Smith and JM Smith are the same person, but what that person’s household looks like, what other entities they are connected to, what transactions, interactions, and relationships exist across the full data landscape. The two capabilities are not in competition. Used together, they get considerably closer to what a genuinely knowledge-driven organization needs — data that is not just accurate, but contextually rich enough to support discovery, not just reporting.
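The division of labor can be sketched in plain Python (all identifiers and relationships are hypothetical): MDM supplies the golden record that resolves the duplicates, and the graph supplies the relationships radiating out from it.

```python
# MDM's contribution: duplicates resolved to one mastered golden record.
mdm_golden = {"joe-smith-001": {"aliases": ["Joe Smith", "JM Smith"]}}

# The knowledge graph's contribution: relationships around that record.
graph = {
    "joe-smith-001": [
        ("memberOf", "household-17"),
        ("holds", "account-9"),
    ],
    "household-17": [("includes", "ann-smith-002")],
}

def context(entity, depth=2):
    """Walk outward from a mastered entity, collecting surrounding facts."""
    if depth == 0 or entity not in graph:
        return []
    facts = []
    for rel, target in graph[entity]:
        facts.append((entity, rel, target))
        facts.extend(context(target, depth - 1))
    return facts

print(context("joe-smith-001"))
```

Neither half is useful alone: without the golden record the traversal starts from duplicates, and without the graph the golden record is an accurate but isolated row.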

A concrete example illustrates this point. A large pharmaceutical company integrated more than twenty heterogeneous data sources into a virtual semantic layer built on a knowledge graph, without changing any of the underlying systems. The result was that different stakeholders could ask questions in natural language and receive answers that drew on all of those sources simultaneously — including sources they had not previously known existed within their own organization. That kind of cross-silo discovery, at scale, without manual integration effort, is what shifts an organization from analytics-driven to knowledge-driven.

Getting started: use cases, not grand designs

A common concern when knowledge graphs are introduced to organizations is that building one sounds enormously complex and time-consuming. But creating a knowledge graph is not more sophisticated or more time-consuming than data and content management in general. It is a different exercise, and one where automation now does most of the heavy lifting.

The early days of the semantic web, where ontologists spent weeks hand-crafting conceptual models in specialized tools that only specialists could read, are gone. The tooling has matured, the automation has improved, and the process has become considerably more approachable.

The practical starting point is not to graph everything. It is to identify one business-critical use case where the impact of better data connectivity would be demonstrably high. Analytics, search, and generative AI are the three most common entry points, each placing slightly different demands on the knowledge graph but all benefiting from the same underlying semantic infrastructure.

From there, the methodology is consistent: build the domain model, ingest and transform the relevant datasets, link them, test them, and make them available to the target application. Domain experts validate the results, the system iterates, and — if the first use case works — the same methodology extends naturally to additional datasets and departments. What Graphwise calls the five-star journey, or five-star AI infrastructure, begins at zero and, for strategically committed organizations, can reach full maturity within months.

The emerging role of the knowledge steward

The conversation around knowledge graphs and AI typically focuses on what knowledge graphs do for AI — providing context, reducing hallucinations, improving retrieval precision. Less discussed is the inverse: what AI can do for knowledge graphs. The answer, increasingly, is a great deal.

LLM-driven modeling tools can now propose new elements for a knowledge model autonomously — new concepts, synonyms, relationships — with human subject matter experts reviewing and approving rather than authoring from scratch. The quality of these suggestions has reached a point where it materially accelerates the knowledge graph development cycle.

This shift is changing the nature of the ontologist and taxonomist role. The nitty-gritty creation work — new concept, new synonym, new relationship — is increasingly handled by the LLM. What remains irreducibly human is the judgment about what the knowledge graph should represent in order to create business value. Things like scoping decisions, prioritization, and the domain expertise needed to evaluate whether a machine-generated suggestion is accurate and useful in context.

The role is evolving from knowledge engineer to knowledge steward — someone who understands both the business and the knowledge model well enough to guide the system toward the areas where enrichment will have the greatest impact. Far from making the role obsolete, AI is making it more strategic.

The context imperative

Context has always been important, but it is becoming essential. It is what separates a file from knowledge. It is what makes a governance rule apply in one situation and not another. It is what decides whether an AI system can be trusted or merely tolerated.

Organizations that invest in building genuine semantic infrastructure — domain knowledge models, enterprise knowledge graphs, standards-based interoperability — are not just improving their data management. They are building the contextual foundation that will determine how much value they can extract from AI over the next decade. They are creating something very valuable: a data landscape that understands itself.

Getting Started with Knowledge Graphs

  1. Identify a business-critical use case: Pinpoint one high-impact area where better data connectivity solves a real problem — whether that’s a generative AI scenario requiring accuracy and context, siloed search and discovery, or analytics that spans multiple source systems.
  2. Define and build the domain model: Work with domain experts to build the core ontologies and taxonomies that describe the meaning of the relevant data. LLM-driven modeling tools now materially accelerate this step, shifting the focus to human validation.
  3. Ingest, transform, and link relevant datasets: Pull data from the identified silos, map it to the common semantic structure, and link entities across datasets to create a unified view.
  4. Test and integrate: Validate the knowledge graph with domain experts, then integrate it into the target application.
  5. Establish the knowledge steward role: Begin the proactive shift from knowledge engineer to knowledge steward — someone who guides the system toward where semantic enrichment will have the greatest business impact.
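The first four steps can be sketched as a skeleton iteration in Python. Every function here is a deliberately trivial stand-in (all names hypothetical); the structure, not the implementation, is the point:

```python
# Minimal runnable skeleton of the methodology; each function is a stand-in.

def define_domain_model(use_case):
    """Step 2: the core concepts the use case needs (hypothetical)."""
    return {"Customer", "Product"}

def ingest(source):
    """Step 3a: pull records from one silo."""
    return source["records"]

def map_and_link(records, model):
    """Step 3b: keep records the model covers, keyed by entity id."""
    return {r["id"]: r for r in records if r["type"] in model}

def run_iteration(use_case, sources):
    model = define_domain_model(use_case)
    graph = {}
    for source in sources:
        graph.update(map_and_link(ingest(source), model))
    # Step 4: expert validation would gate deployment here.
    return graph

sources = [
    {"records": [{"id": "c1", "type": "Customer"}]},
    {"records": [{"id": "p1", "type": "Product"}, {"id": "x1", "type": "Log"}]},
]
g = run_iteration("search", sources)
print(sorted(g))  # ['c1', 'p1']
```

Each pass through this loop is one iteration of the five-star journey: the next use case reuses the same pipeline with an extended model and additional sources.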

Want to learn more about the semantic layer and how it can help you build genuine semantic infrastructure?