Data Mesh is a decentralized approach to data architecture that has recently increased in popularity. Within the data mesh approach, data ownership is handed to the domain or business function that understands the structure, purpose, and value of the data best. With this ownership comes the responsibility of each team to make sure that the data is clean, free of errors, and accessible for others.
Centralized vs. decentralized data approaches
For the past 20 years or so, the prevailing notion around data has been that it is best handled using a centralized data ecosystem. Ever-increasing computing power and decreasing costs for the storage and processing of data incentivized companies to centralize their data collection.
Most organizations chose to store this centralized data in a data warehouse or data lake, where the pipelines and cleaning of the data are maintained by a specialized team of data engineers. This means that the centralized team is responsible for most (if not all) data-related functions, including the extraction, maintenance, cleaning, and loading of data into tools where business users can utilize it.
The rise of Data Mesh
Data Mesh offers a fundamentally different answer to the question of data management.
In essence, each business domain should treat its own data as a product that they make available to the rest of the organization. This creates a node-like system where ownership of the data is retained within specific teams, but the output is accessible by other parts of the organization in a mesh-like structure.
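The mesh structure described above can be illustrated with a minimal sketch. The names and fields here (`DataProduct`, `catalog`, the marketing example) are hypothetical, chosen only to show the idea of a domain team publishing its data as a discoverable product with an explicit contract; they are not part of any particular Data Mesh tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sketch: each domain team exposes its data as a "product"
# with an explicit, versioned contract that other teams can discover.
@dataclass
class DataProduct:
    domain: str    # owning business domain, e.g. "marketing"
    name: str      # product identifier, e.g. "campaign_spend"
    schema: dict   # column name -> type; the published contract
    version: str = "1.0.0"
    last_updated: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def describe(self) -> str:
        """Human-readable summary other teams can use for discovery."""
        cols = ", ".join(f"{c}: {t}" for c, t in self.schema.items())
        return f"{self.domain}/{self.name} v{self.version} ({cols})"

# The marketing domain publishes its product to a shared catalog,
# forming one node in the mesh; other domains publish theirs alongside it.
catalog: dict[str, DataProduct] = {}
product = DataProduct(
    domain="marketing",
    name="campaign_spend",
    schema={"campaign_id": "str", "date": "date", "spend_usd": "float"},
)
catalog[f"{product.domain}/{product.name}"] = product
```

Ownership of the product stays with the publishing team, while the catalog entry is what the rest of the organization sees and consumes.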
In this article, we will take a look at these different approaches to data architecture and what you need to keep in mind when making decisions about which approach to choose in your organization.
Data Mesh vs. centralized architecture
The rising interest in Data Mesh is partly explained by looking at the past, present, and future challenges of a centralized approach (data warehouses or data lakes). In hindsight, the appeal of central data storage seems obvious.
On its surface, it allowed organizations to better handle the increasing volume of data at scale, and it broke down information silos in companies by making data more readily available to users outside specific domains. However, at closer inspection, this solution comes with its own set of challenges.
The proliferation of data
In any company today, the potential availability of data is vast. In marketing alone, the median number of data sources for a B2C company in 2021 was 10, up from 6 just one year prior, and projected to increase to 12 during 2022. Based on this trajectory, it’s likely that number will jump into the twenties in just a few years.
With a fully centralized approach, this creates some obvious headaches for any central data team. In addition to the task of building and maintaining an ever-growing data pipeline, the raw data output from different platforms is rarely uniform or standardized. Without sufficient domain expertise, it can be very difficult to assess how the data needs to be cleaned and transformed to generate usable output.
The risk of stifling innovation
The increasing availability of data also creates intriguing opportunities to mine for patterns, correlations, or even causative relationships between points of data. This possibility for rapid innovation needs to be nurtured. But with a centralized approach, where one team is responsible for the entire company data stack, chances are that any required additions or changes to the data pipeline or transformation layer will be blocked by the limited availability of internal resources.
Beyond stifling innovation, this can become a source of organizational frustration, which carries significant long-term risks for company culture and levels of trust.
On the flip side, a Data Mesh infrastructure can offer teams more autonomy by enabling them to self-service their data needs. The image below, from Scott Brinker at Chief Martech, highlights the impact of a self-service approach compared to a centralized one, where in the latter a lot of time is wasted simply waiting for things to be done.
What are the benefits of Data Mesh?
As mentioned at the start of this article, interest in Data Mesh has grown significantly in response to the frustrations many organizations have experienced with centralized, big data architectures.
In a Data Mesh architecture, ownership of a data asset or domain is decentralized and given to the business units most familiar with its structure, purpose, and value. With this ownership comes the responsibility of each team to make sure that the data is clean and free of errors so that other stakeholders can make use of it.
Instead of a centralized team of data engineers and data scientists being responsible for the entire data stack, these specialists are distributed across the organization and work closely with the domain experts in their respective teams. This helps ensure that the output is usable and understandable by the rest of the organization. With proper implementation, a Data Mesh architecture can relieve many of the pains of a centralized approach by shortening waiting times, speeding up innovation cycles, increasing organizational trust, and improving output quality.
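The responsibility for clean, error-free data described above can be sketched as a simple validation step run by the owning team before publishing. This is a hypothetical illustration, not part of any specific Data Mesh framework; the field names and the `publish` helper are invented for the example.

```python
# Hypothetical sketch: the owning domain team checks records against its
# published contract before exposing them to the rest of the organization,
# so consumers of the data product can trust what they receive.
EXPECTED_FIELDS = {"campaign_id": str, "spend_usd": float}

def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is clean."""
    problems = []
    for field_name, field_type in EXPECTED_FIELDS.items():
        if field_name not in record:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(record[field_name], field_type):
            problems.append(f"wrong type for {field_name}")
    return problems

def publish(records: list[dict]) -> list[dict]:
    """Only clean records are published; dirty ones are held for review."""
    return [r for r in records if not validate_record(r)]

clean = publish([
    {"campaign_id": "c-01", "spend_usd": 120.50},  # valid, published
    {"campaign_id": "c-02", "spend_usd": "n/a"},   # wrong type, rejected
])
```

Because the check runs inside the domain team that understands the data, errors are caught by the people best placed to fix them rather than by a distant central team.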
Given these potential organizational benefits, it's easy to see why Data Mesh has become such a popular concept in recent years, and this trend is likely to continue for years to come.
What are the benefits of Data Mesh for a marketing organization?
The decision to select an overall data architecture is not one that should be taken lightly and will depend on a number of factors.
As we’ve seen in this article, marketing teams that are part of organizations with a centralized data architecture can suffer from long waiting times, which in turn slows the team's pace of innovation and frequency of testing. All the while, the number of data platforms and clouds that marketing teams work with (and want to collect data from) keeps expanding every year.
By moving data ownership closer to the marketing team, you can often increase the velocity of the team and remove several headaches that come along with a monolithic centralized data approach.