Data Mesh is an operational framework and decentralized data architecture that shifts an organization’s focus from a central data team to domain-centric ownership of data. It is a blend of organizational structures, technology-agnostic principles, and implementation practices.
Its main goal is to transfer ownership to Data Product Owners within functional business domains, reshaping how data initiatives are managed to better serve data consumers.
Why do organizations need a data mesh?
Organizations adopt data mesh to address significant pain points in traditional centralized data architectures:
- Scaling challenges: Centralized infrastructure struggles to scale for diverse data types while ensuring data veracity. Data mesh promotes data autonomy, enabling greater data velocity through data governance and access aligned with business requirements
- Lack of domain-centric ownership: Data mesh distributes data ownership to align data understanding with business context, helping organizations leverage data as a strategic asset and fostering a data-driven mindset
- Coordination bottlenecks: Effective coordination and communication across cross-functional teams is another challenge, and failures become more likely as the gap between data and business widens. By granting teams organizational autonomy, data mesh removes central bottlenecks and speeds the delivery of value from data.
Figure 1 shows the overall idea of a data mesh with the major components.
Mesh Basics
According to Zhamak Dehghani, the innovator behind this paradigm, the fundamental principles of data mesh include:
- Data as a product: Business units take ownership and treat their data as a high-quality, consumable product in its own right
- Domain-driven ownership: Data ownership is decentralized to the functional business domains (e.g., Sales, Inventory, Logistics). This allows domain teams to discover, explore, create, and enrich new data sources based on specific use cases, all while maintaining centralized governance for security and privacy.
- Self-serve data platforms: Domain teams are provided with self-service platform capabilities to simplify the process of creating and consuming data products, reducing dependence on a central, specialized team.
- Federated computational governance: This principle creates an ecosystem where users derive value by aggregating and correlating independent data products, made possible because the data mesh rests on a bedrock of interoperability standards.
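As a concrete illustration of the federated principle, the sketch below correlates two independent data products that share a common identifier standard. All product names and fields (orders, tickets, customer_id) are hypothetical:

```python
# Sketch: deriving value by correlating two independent data products.
# The data and field names are illustrative, not from any real system.

# "Sales" domain product: orders keyed by customer_id
orders = [
    {"order_id": 1, "customer_id": "C-7", "amount": 120.0},
    {"order_id": 2, "customer_id": "C-9", "amount": 80.0},
]

# "Support" domain product: tickets keyed by the same customer_id standard
tickets = [{"ticket_id": "T-1", "customer_id": "C-7", "severity": "high"}]

# Because both products follow a shared identifier standard, a consumer
# can correlate them without any central integration team.
by_customer = {t["customer_id"]: t for t in tickets}
enriched = [
    {**o, "open_ticket": by_customer.get(o["customer_id"], {}).get("ticket_id")}
    for o in orders
]
```

The join works only because both domains agreed on the `customer_id` identifier; that agreement is exactly what interoperability standards provide.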
To better understand the principles of data mesh, and how best to enable it, let’s first discuss the key components: domain, data product, data contracts, and data sharing.
What is a domain?
The term “domain” refers to a logical grouping of organizational units that collectively serve a specific functional context. Each domain is represented by a node (like an Operational Data Store, a data warehouse, or a data lake) customized to meet the specific requirements of that domain. A domain can ingest operational data, create data products, and publish them.
Data mesh emerges when teams use data products from other domains and the domains communicate with others in a governed manner.
What is a data product?
A data product is the core asset on the mesh. It is a node that encapsulates code, data, metadata, and infrastructure. It is created and curated by the domain team and offered as a reliable, self-service source for sharing data across the organization.
Figure 2 shows the concept of a data product.
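The idea of a data product as a bundle of code, data, and metadata can be sketched in a few lines. This is a minimal illustration, not a real platform API; all names (`DataProduct`, `transform`, `read`) are assumptions for the example:

```python
from dataclasses import dataclass, field
from typing import Callable

# Sketch of a data product as a unit of code + data + metadata.
# Real platforms realize the infrastructure part differently.

@dataclass
class DataProduct:
    name: str
    owner_domain: str
    description: str                    # metadata for discovery
    schema: dict                        # column name -> type, for consumers
    transform: Callable[[list], list]   # the "code" the product ships with
    _raw: list = field(default_factory=list)

    def read(self) -> list:
        """Consumers see curated output, never the raw operational data."""
        return self.transform(self._raw)

orders = DataProduct(
    name="sales.orders",
    owner_domain="Sales",
    description="Curated order events, deduplicated and validated",
    schema={"order_id": "int", "amount": "float"},
    transform=lambda rows: [r for r in rows if r["amount"] > 0],
    _raw=[{"order_id": 1, "amount": 50.0}, {"order_id": 2, "amount": -1.0}],
)
curated = orders.read()   # only valid rows reach consumers
```

The point of the sketch is the encapsulation: the domain team owns the raw data and the curation code, while the rest of the organization interacts only with the product’s curated interface and metadata.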
What is a data contract?
A data contract is a set of specifications that define the data product’s interface compatibility, terms of service, and an SLA (Service Level Agreement). Its objective is to establish transparency for data use, dependencies, and required data quality.
However, implementing data contracts requires a cultural shift, and users need time to familiarize themselves with them and understand the importance of data ownership. Data contracts should also include schema information, semantics, and lineage.
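A data contract of this shape can be made machine-checkable. The sketch below encodes schema, semantics, SLA, and lineage in one illustrative specification and validates records against its schema section; all field names and SLA terms are hypothetical:

```python
# Sketch: a data contract as a machine-checkable specification.
# Every field name and SLA term here is illustrative.

contract = {
    "product": "sales.orders",
    "schema": {"order_id": "int", "amount": "float", "region": "str"},
    "semantics": {"amount": "gross order value in EUR"},
    "sla": {"freshness_hours": 24, "availability": "99.5%"},
    "lineage": ["crm.raw_orders"],
}

def validate(record: dict, contract: dict) -> bool:
    """Check a record against the contract's schema section."""
    types = {"int": int, "float": float, "str": str}
    schema = contract["schema"]
    # Exact field set, and every value has the declared type
    return set(record) == set(schema) and all(
        isinstance(record[col], types[t]) for col, t in schema.items()
    )

ok = validate({"order_id": 1, "amount": 9.5, "region": "EMEA"}, contract)
bad = validate({"order_id": "1", "amount": 9.5, "region": "EMEA"}, contract)
```

Checks like this can run at publish time, so a producer learns about a contract violation before any consumer does.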
What is data sharing?
Data sharing is a mechanism that allows domain teams to connect to and consume data products without duplication. Ideally, data is not copied: this reduces the growth of isolated data repositories and keeps ownership with the source domain. It requires a centralized data governance approach, often facilitated through metadata linking.
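One way to picture sharing through metadata linking: the registry below stores read accessors rather than copies, so consumers always resolve to the source domain’s data. The registry and its methods are illustrative, not a real product:

```python
# Sketch: sharing by metadata linking rather than copying.
# The registry maps a shared name to the owning domain's live view,
# so consumers always read from the source. Names are illustrative.

class ShareRegistry:
    def __init__(self):
        self._links = {}     # shared name -> zero-copy accessor

    def link(self, name, accessor):
        """The owning domain registers a read accessor, not a copy."""
        self._links[name] = accessor

    def read(self, name):
        return self._links[name]()   # resolves to source-domain data

# The source domain keeps the only physical copy:
_sales_orders = [{"order_id": 1, "amount": 50.0}]
registry = ShareRegistry()
registry.link("sales.orders", lambda: _sales_orders)

# A consuming domain reads through the link; changes at the source
# are visible immediately because nothing was duplicated.
_sales_orders.append({"order_id": 2, "amount": 75.0})
shared_view = registry.read("sales.orders")
```

Because only metadata moves, ownership, access control, and quality responsibilities stay with the producing domain.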
What is team structure?
Team and organizational structure are important aspects to weigh when adopting data mesh. It is typical to organize teams around selected domains rather than maintain a single centralized team.
Domain teams are responsible for all processes – data collection, transformations, cleaning, enrichment, and modeling. Within a domain, teams are arranged vertically and consist of roles required to deliver data such as DataOps engineers, Data Engineers, Data Scientists, Data Analysts, and Domain Experts.
Knowledge graphs and data mesh
The foundational principles of knowledge graphs, driven by semantics and context, position them as an ideal support system for enterprise data mesh and data fabric development. Knowledge graphs offer the means to ensure that data contracts are standardized, uniform, consistent, semantically correct, and aligned with datasets. They empower data-sharing platforms to connect data between users, systems, and applications consistently and unambiguously. This facilitates compliance with data contracts, guaranteeing that data types, schema, entities, and their inter-relationships across data products are semantically valid.
For domain-centric and enterprise data catalogs, leveraging a knowledge graph to store semantics with metadata is highly beneficial. Knowledge graphs help in automatic metadata extraction, generation and enforcement of data quality standards, and certifying data assets based on semantic rules and validation criteria.
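To make the semantic-validation idea concrete, the sketch below represents a tiny knowledge graph as subject-predicate-object triples and checks whether the fields a data contract claims are actually declared in the shared vocabulary. The vocabulary, prefixes, and contract fields are all illustrative:

```python
# Sketch: a minimal knowledge graph as a set of (subject, predicate, object)
# triples, used to semantically validate a data contract's fields.
# The vocabulary and prefixes are illustrative, not a real ontology.

graph = {
    ("sales:Order", "rdf:type", "mesh:Entity"),
    ("sales:amount", "rdf:type", "mesh:Attribute"),
    ("sales:amount", "rdfs:domain", "sales:Order"),
    ("sales:region", "rdf:type", "mesh:Attribute"),
    ("sales:region", "rdfs:domain", "sales:Order"),
}

def is_valid_attribute(term: str, entity: str) -> bool:
    """Valid if the graph declares the term and ties it to the entity."""
    return (term, "rdf:type", "mesh:Attribute") in graph and \
           (term, "rdfs:domain", entity) in graph

# Validate the fields a hypothetical contract claims for sales:Order:
contract_fields = ["sales:amount", "sales:region", "sales:discount"]
valid = {f: is_valid_attribute(f, "sales:Order") for f in contract_fields}
# sales:discount fails: it is not defined in the shared vocabulary
```

A real implementation would use an RDF store and SHACL-style shapes, but the mechanism is the same: contract terms are certified only when the graph can ground them semantically.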
By integrating knowledge graphs with data mesh, a semantic data mesh can emerge. This can provide data across different domains in the mesh with context and meaning. It fosters semantic data discoverability, interoperability, augmentation, and enrichment and provides explainability for AI and machine learning use cases.
Conclusion
Data mesh is most of all a shift in culture, processes, and people, and these facets take time to change in larger organizations. This concept might not be easy to embrace and implement as a data architecture. Some organizations choose to focus on specific aspects of data mesh or implement a simplified version of the architecture.
For organizations considering data mesh, this is not a yes-or-no decision. Instead, it should be an exercise in identifying the obstacles that hinder them from delivering business value in a timely and efficient manner. The ability to connect, share, and access data effectively across the enterprise is a likely answer, and one that a data mesh supported by knowledge graphs is set up to deliver.
Want to learn how to connect, share, and access data effectively across the enterprise?