Quick Overview Of Data Mesh - Why, What & How
Eldar Terzic - CEO
Data Mesh, Data, Data Analytics, Data Platform
January 6th, 2023
The Topic Overview
Domain-Driven Design (DDD) comes from software engineering, where the goal is to build a simple, business-oriented language rather than relying on technical names or concepts. The key concept in DDD is the "ubiquitous language": software is modeled to match a domain according to input from that domain's experts.
Under domain-driven design, the structure and language of software code (class names, class methods, class variables, etc.) should match the business domain. In DDD, we want our software solutions to be well designed for the business domain problem, and we want to apply the same mindset to our data: treat data as a product, not as a byproduct. By applying product thinking and domain-driven design to data, you can unlock significant value from it. Data needs to be owned by those who know it best; we call them domain experts.
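To make the "ubiquitous language" idea concrete, here is a minimal sketch in Python, assuming a hypothetical e-commerce domain. The class and method names are illustrative assumptions, chosen to show domain vocabulary in code rather than any real system:

```python
from dataclasses import dataclass

# Hypothetical e-commerce domain: names mirror the business vocabulary
# ("ubiquitous language") instead of generic technical terms.

@dataclass
class OrderLine:
    product_name: str
    quantity: int
    unit_price: float

class Order:
    def __init__(self, customer_id: str):
        self.customer_id = customer_id
        self.lines: list[OrderLine] = []

    # "add_line" and "total" are words a domain expert would use,
    # not generic names like "insert_record" or "compute_sum".
    def add_line(self, line: OrderLine) -> None:
        self.lines.append(line)

    def total(self) -> float:
        return sum(l.quantity * l.unit_price for l in self.lines)
```

A domain expert reading this code can follow it because it speaks their language; the same naming discipline is what data mesh asks us to apply to datasets.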
Why Data Mesh
In simple terms, a data mesh enables fast, secure, and compliant access to and management of data from multiple disparate sources. Using the data mesh architectural paradigm, organizations can access data from disparate sources in real time, track its usage, and easily monitor data compliance by shifting the responsibility for creating, maintaining, and monitoring data products to the domain teams. A data mesh also helps reduce the cost of data storage and provides federated control over data governance. Furthermore, it can facilitate improved analytics capabilities, allowing organizations to process data faster and generate more accurate insights.
To derive value from their data, companies must have two things:
1) Distributed data ownership
2) Distributed data architecture / technology
What Is Data Mesh
Much in the same way that software engineering teams shifted from monolithic applications to microservice architectures, the data mesh is, in many ways, the data platform version of microservices. It is a type of architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-serve design. It adapts Eric Evans's domain-driven design model, a flexible, scalable approach to software development that matches your code's structure and language to its corresponding business domain. As a result, we can view all the data we have, reduce the number of transformations, and make both source-oriented and consumer-oriented datasets available to all our data scientists across the organization.
Data Mesh is founded on four fundamental principles that any data mesh implementation must follow to achieve this objective:
1) Domain-oriented decentralized data ownership and architecture
2) Data as a product
3) Self-serve data infrastructure as a platform
4) Federated computational governance
Domain-oriented decentralized data ownership and architecture
The domain-oriented, decentralized data ownership and architecture principle mandates that domain teams be accountable for their data, as they are the ones in charge of it. Data must be classified by domain, just as a system's bounded contexts are matched by team boundaries. The domain-driven distributed architecture is based on moving ownership of analytical and operational data from the central data team to the domain teams.
Data as a product
The concept of data as a product allows you to apply a product thinking philosophy to analytical data. Under this principle, the data has consumers outside of the domain. The domain team is responsible for satisfying the needs of other domains by providing high-quality data. In a nutshell, domain data should be treated in the same way as any other public API. Additionally, there needs to be documentation and an easy way for users to discover and consume the data.
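As a sketch of what "treat domain data like a public API" can mean in practice, the following Python example models a data product with an owner, documentation, and a discoverable name. All field and class names here are illustrative assumptions, not part of any data mesh standard:

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a domain team publishes its dataset as a "data
# product" with the same affordances as a public API: an accountable
# owner, human-readable documentation, and a documented schema.

@dataclass
class DataProduct:
    name: str          # discoverable identifier, e.g. "sales.orders"
    owner: str         # accountable domain team
    description: str   # human-readable documentation
    schema: dict = field(default_factory=dict)  # column name -> type

    def describe(self) -> str:
        # A consumer in another domain can discover what this product
        # offers without contacting the owning team.
        cols = ", ".join(f"{c}: {t}" for c, t in self.schema.items())
        return f"{self.name} (owner: {self.owner}): {self.description} [{cols}]"
```

In a real implementation this metadata would typically live in a data catalog, which is what makes products discoverable across domains.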
Self-serve data infrastructure as a platform
The goal of the self-serve data infrastructure as a platform is to apply platform thinking to data infrastructure. A dedicated data platform team provides domain-agnostic functionality, tools, and systems to build, execute, and maintain interoperable data products for all domains. This platform enables domain teams to seamlessly consume and create data products.
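A minimal sketch of what such a self-serve interface might look like, assuming a hypothetical platform API (all names are illustrative; a real platform would provision storage, pipelines, and access controls behind this interface):

```python
# Hypothetical self-serve platform: the platform team exposes
# domain-agnostic building blocks, and domain teams register data
# products without knowing the underlying infrastructure.

class DataPlatform:
    def __init__(self):
        self._products = {}

    def register_product(self, name: str, source: str) -> dict:
        # A real implementation would provision pipelines, storage,
        # and permissions here; this sketch only records the request.
        product = {"name": name, "source": source, "status": "provisioned"}
        self._products[name] = product
        return product

    def list_products(self) -> list:
        # Discovery endpoint: any domain can see what products exist.
        return sorted(self._products)
```

The key design point is that the interface is the same for every domain: the platform stays domain-agnostic while the domains own what they publish.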
Federated computational governance
The federated computational governance principle achieves interoperability of all data products through standardization, which is promoted across the whole data mesh by the governance guild. Federated governance aims to create a data ecosystem that adheres to the rules of the organization and to industry regulations.
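The word "computational" suggests that governance rules are expressed as code and applied automatically to every data product. Here is a hedged sketch of that idea, assuming hypothetical policy names and metadata fields:

```python
# Hypothetical computational governance check: global policies are code
# applied uniformly to each data product's metadata. The policy names
# and metadata fields below are illustrative assumptions.

def check_policies(product: dict) -> list:
    """Return a list of the governance policies this product violates."""
    violations = []
    if not product.get("owner"):
        violations.append("every data product must have an owner")
    if not product.get("description"):
        violations.append("every data product must be documented")
    if product.get("contains_pii") and not product.get("pii_masked"):
        violations.append("PII fields must be masked before publication")
    return violations
```

Because the checks run as code, they can be enforced by the platform for all domains at once, rather than relying on each team to apply the rules manually.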
How to Implement Data Mesh
Based on a decentralized design pattern, a real-time data product platform is the optimal implementation of a data mesh architecture that enables domain teams to perform cross-domain data analysis on their own.
A data product platform creates and delivers data products of connected data from disparate sources to provide a real-time and holistic view of the business for operational and analytical workloads.
A real-time data product platform creates the semantic definitions of the various data products that are relevant to the business. Data ingestion methods are set up, as well as central governance policies to protect and secure the data in those products in accordance with regulations.
Additional platform nodes are deployed in alignment with the business domains. By offering local control of data services and pipelines, the domains have the ability to access and govern the data products for their respective data consumers.
In a data mesh architecture, we want to move away from the traditional mindset of an understaffed, overwhelmed central data team trying to collect and understand all types of data from different departments, applications, and domains on behalf of numerous consumers, such as analysts, data scientists, report writers, and executive teams. Instead, we shift responsibility for data closer to the business value stream. This enables faster data-driven decisions and reduces barriers to data-centric innovation.