Overcoming Challenges in Implementing Data Mesh


Paweł Mitruś

We all know that silver bullets don’t really exist. Everything has both advantages and disadvantages, and the same is true for Data Mesh implementations.

Through our previous articles we have learned a lot, but it is high time we looked at some of the challenges.

Drawing on our data expertise, we have grouped the challenges that organizations can face into two categories, Organizational and Technical, plus a few that straddle both. Taking them into consideration can be crucial in the enterprise transformation process and helps achieve the expected outcomes in data management.

Organizational

Budget – Let’s start with the potential deal breaker. Will the domains have the money to pay for their infrastructure, build applications, and deliver and maintain data products? Even if the planned calculations look good, there are threats to look out for. Let’s assume the Platform Team has delivered a tool that effectively bridges the technical gap. As time passes, volumes grow and data products become more complex (as in any other IT project), and it may turn out that the domains must pay significantly more than they can afford.
The Amount of Work to Take On – Because your domain teams are also going through a serious transformation, the Platform Team must invest much more effort than it normally would. Even though the domains are internal clients, there are still standards to meet in how the platform is delivered. After onboarding the domains, any migration that causes breaking changes could be a nightmare. So, from the very first release, your Platform Team will have only about 10% of the flexibility it had when developing a traditional data platform.
Very Strong Central/Organizational Owner – We have said this many times before: Data Mesh implementation is mostly about organizational transformation rather than technology. It requires very strong organizational support, as it forces many changes that people may be afraid of. Without a strategy, an announced action plan, and increased awareness inside the organization of what we are trying to achieve, it can easily fail.
Convince Domains to Cooperate – As Data Mesh creates a lot of extra work for the domains (previously they mostly consumed reports), you need to convince them that it is worth the effort. Once they are on board, you still have to coordinate major releases with them. Imagine that you would like to improve the platform in a way that causes breaking changes, but one domain is in the middle of testing its new applications. That domain can hold you off for several months. It might be worth introducing a release calendar, a guide for cooperation between the platform and domain teams, a RACI matrix, etc.

Technical/Organizational

Master Data Management – This is a serious challenge you will face when designing your Data Mesh implementation, and there is no simple answer on how to do it right. Are you going to decentralize it and have the domains manage master data on their own, or will you create a central team that offers integration? The answer really depends on the industry-specific business domains (retail, banking, health, etc.) and on where your data comes from overall.
Technical Skills Shortage – When full ownership is delegated to the domains, they must be able to take it on. They can hire new people or get themselves trained, but it can still be too much for them to handle. Ignoring this matter will not cause any visible issues right away; usually, they start popping up later, when performance drops dramatically. No tool will solve this, no matter how many abstraction layers you implement to hide the technical aspects. Knowledge of how things work in data engineering is necessary and must not be underestimated.

Technical

Governance – The more decentralized your architecture, the more attention you need to pay to governance. When running a platform with a central team, a lot of knowledge can remain unwritten, kept within its members. When you decentralize, however, you must provide a consistent way of setting up the cloud infrastructure, standards for data products (to make them browsable and accessible to downstream applications in a well-known way), and much more.
Monitoring – Don’t forget that besides being supplied with appropriate tools to deliver data products, the domains also need to maintain them and monitor what is going on. Moreover, some may lack a deeper understanding of what exactly the technical metrics mean, how they affect the workload, and how to optimize it with their help. You will probably want to invest in something extra for the Platform Team to help it react to overutilization or inefficiency. Because in the end, even if this is the domain’s budget, it’s still your company, right?
Data Virtualization/Duplication – It’s the end of the “gather data into a single place” approach, but eventually you will want to combine data from different domains. There are a couple of ways to do this, but let’s consider the two most common ones:
- Data Virtualization – building a semantic data model outside the data sources, without physically transferring data to another database. Instead, the virtualization layer breaks down users’ queries, pushes the parts to the sources, and assembles the results back together.
- Data Duplication – if you don’t choose the above approach, you must handle data transfers from the original sources to downstream applications, which leads to data duplication. In some cases, this can increase your bills tremendously, not only because of storage fees, but also because of potential data transfer costs (e.g., in a Cloud environment, one needs to take egress costs into account).
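
To make the virtualization idea more tangible, here is a minimal Python sketch of how a single query could be split across two domains and reassembled. Everything in it is an assumption made for illustration: the two in-memory SQLite databases stand in for the domains' own stores, and the names `customers_db`, `orders_db`, and `revenue_by_customer` are hypothetical. Real virtualization engines are far more sophisticated; this only shows the "split, push down, assemble" pattern.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for one domain's store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two hypothetical domains, each owning its own data (sample rows are made up).
customers_db = make_source(
    "CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)",
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", "PL"), (2, "Bob", "DE"), (3, "Cy", "PL")],
)
orders_db = make_source(
    "CREATE TABLE orders (customer_id INTEGER, amount REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.0), (3, 2.5)],
)


def revenue_by_customer(country):
    """Answer one 'virtual' query by splitting it across both sources."""
    # Part 1: push the country filter down to the customers domain.
    customers = dict(
        customers_db.execute(
            "SELECT id, name FROM customers WHERE country = ?", (country,)
        ).fetchall()
    )
    if not customers:
        return {}
    # Part 2: push the aggregation down to the orders domain,
    # restricted to the customer ids returned by part 1.
    placeholders = ",".join("?" * len(customers))
    totals = orders_db.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        f"WHERE customer_id IN ({placeholders}) GROUP BY customer_id",
        list(customers),
    ).fetchall()
    # Part 3: assemble the partial results inside the virtual layer.
    return {customers[cid]: total for cid, total in totals}


result = revenue_by_customer("PL")
print(result)
```

Only the filtered customer ids and the aggregated totals cross domain boundaries here; the raw tables stay where they are. With duplication, by contrast, both tables would first be copied into a shared store, which is where the storage and egress costs mentioned above come from.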

As every organization is unique, each transformation can reveal something new, so the list of challenges could go on with many more obstacles. However, the challenges above are the ones we see most often, and knowing them upfront will help to mitigate the risk.

If you would like to learn more about Data Management Platforms, we invite you to read our articles on Data Fabric and Data Lake Architecture.

Meet Our Experts

Paweł Mitruś Cloud Solution Architect

I completed my studies at the Warsaw University of Technology, Faculty of Mathematics and Information, and gained my MS degree in Computer Science. I have been working with data processing & modelling for about 8 years. What I value most at work is architecture clarity, applying best practices, and efficient communication. In my free time, I like to develop my soft social skills. I believe they are the key factor in achieving any goal. I am also devoted to running in triathlons, and I specialize in the Ironman 70.3 distance.