In theory, the advantages of a modern Data Lake over a legacy data warehouse are quite obvious. In practice, they are not. Many businesses’ data warehouses have been proven to work with their existing analytics workflows. The promise of improving these workflows with a new strategic approach is met with strong skepticism and detailed questions.
Fortunately, we have answers. We have years of cutting-edge Data Lake experience, including global-scale data solutions based on Microsoft’s Azure cloud platform.
In this blog series, we have compiled our answers to some of the most common questions we have encountered along the way. We hope you will find them useful as you explore your own Data Lake opportunities.
“Will the Data Lake stay flexible while moving forward? With time, I’d expect the same problems with complexity that the traditional data warehouse approach brings”.
According to Talend, “A Data Lake is a pool of raw data, the purpose for which is not yet defined. A data warehouse is a repository for structured, filtered data that has already been processed for a specific purpose”.
Therefore, before building a data warehouse, you need to decide what its purpose will be. You need to identify all relevant business challenges and address them upfront. A new business challenge will require a rebuild. Such a rebuild is often costly and time-consuming, especially when regression and non-negative impact testing are involved. To avoid these scenarios, you would need to be able to accurately predict the future.
On the other hand, when building a Data Lake, you do not need a crystal ball. Structuring data on the way in is optional. You might choose to do so, for example, if the data will be widely used across the organization for a variety of purposes. When you store “all” your raw, unprocessed data in a Data Lake, your data scientists can quickly access and extract it to gather additional information that may form the basis of new solutions or solve future business challenges. For example, business units might use the central data container to create more targeted reporting layers known as data hubs.
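To make the idea of deferring structure concrete, here is a minimal Python sketch, not a production design. It assumes raw records are kept as untouched JSON lines and that each reporting view applies its own structure only when it reads the data. The sample records and the function names (`read_raw`, `revenue_by_country`, `pages_viewed_by_customer`) are hypothetical and chosen purely for illustration.

```python
import json

# Hypothetical raw records, stored exactly as they arrived.
# No schema is enforced when the data is written.
RAW_LAKE = [
    '{"type": "order", "customer": "c1", "amount": 120.0, "country": "DE"}',
    '{"type": "order", "customer": "c2", "amount": 80.0, "country": "DK"}',
    '{"type": "click", "customer": "c1", "page": "/pricing"}',
    '{"type": "order", "customer": "c1", "amount": 50.0, "country": "DE"}',
]


def read_raw(lake):
    """Parse raw lines one by one; structure is applied only at read time."""
    for line in lake:
        yield json.loads(line)


def revenue_by_country(lake):
    """One targeted reporting view, built for a sales question."""
    totals = {}
    for record in read_raw(lake):
        if record.get("type") == "order":
            country = record["country"]
            totals[country] = totals.get(country, 0.0) + record["amount"]
    return totals


def pages_viewed_by_customer(lake):
    """A second view, added later, over the same untouched raw data."""
    pages = {}
    for record in read_raw(lake):
        if record.get("type") == "click":
            pages.setdefault(record["customer"], []).append(record["page"])
    return pages


if __name__ == "__main__":
    print(revenue_by_country(RAW_LAKE))
    print(pages_viewed_by_customer(RAW_LAKE))
```

The second view was added without touching the stored records or the first view, which is the flexibility the paragraph above describes. In a warehouse-style design, the click records would typically have needed a place in the schema before they could be loaded at all.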
The Data Lake approach allows you to deconstruct a set of business challenges into smaller, better-defined issues. Working on them individually is easier and more efficient, especially if you decide to build reusable technical frameworks to address the initial technical challenges before moving on to the business-related aspects.