Large-Scale Data Lake

A first step towards user-friendly advanced analytics driving global-scale business value with unlimited types of data

Our client, a global FMCG leader, was facing many challenges getting useful insights from data that was already available. Running analytics on the data often required business users to access it via multiple systems. Doing so was time-consuming, with business users spending hours on simple data crunching instead of focusing on making day-to-day decisions. The client wanted to get more value from this data faster, and build a greater competitive advantage through innovative advanced analytics enabled by technology infrastructure.

Client Challenges

Inability to get useful insights quickly and use advanced analytics for both day-to-day and strategic decisions

Siloed data delivery and the performance limitations of analytical applications in the existing solution resulted in inadequate shadow BI that lacked a true high-level overview of the company’s business performance. Additionally, the client was often unable to fully utilize new or even existing data sources because they could not be connected quickly and efficiently to the established IT ecosystem for data aggregation.

From the business perspective, the main requirement was a solution that would allow users to make decisions quickly based on multiple harmonized data sources combined in one place and available via a unified interface or toolset. From a technical point of view, a scalable, cloud-based, and platform-agnostic solution was required. Another essential requirement was enabling key business users to upload and edit the master data via an easy-to-use interface.

Lingaro, having already established trust with the client, was chosen to advise on, design, and implement the global solution’s most technically challenging modules: its ETL framework and Master Data Management tool.

Lingaro solution

A Data Lake providing end users with new data analysis capabilities

The Lingaro team took part in the entire project delivery process, starting with design workshops to help determine which patterns and solutions would best match the client’s data to business needs.

As a solution, Lingaro proposed a Data Lake: a data processing platform providing business analysts, data scientists, and decision-makers with new, advanced data analysis capabilities. Deployed on the Microsoft Azure cloud, the tailored solution offers high-performance in-memory processing. HDInsight clusters and other cloud capabilities recommended by Lingaro achieve scalability, flexibility, and performance that are difficult to attain with on-premises solutions.

As an additional benefit of the project analysis, the client received – for the first time ever – detailed documentation offering a clear overview of the organization’s data processing logic, major data flows, and the business logic behind them.

This project involved the largest global-scale Azure deployment to date. Lingaro’s pioneering Azure specialists worked with Microsoft to consult on functionalities and suggest further improvements supporting large-scale data solution efficiency with Azure.

The Core Data Lake solution enabled:

  • A central repository for all the company’s data assets, including crucial ones regarding POS, shipment analytics, and store execution measurement.
  • Improved Master Data Management quality with tiers allowing local master data control with comprehensive global reporting.
  • A web interface that lets key business users upload and modify local reference data.
  • Reference data harmonization expanding data analysis capabilities across internally and externally produced data sets.
  • Accelerated creation of data marts and BI applications.
  • Scalability, flexibility and great performance with a technology-agnostic platform.
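The reference data harmonization listed above can be illustrated with a minimal sketch. It is not the client's implementation: the source system names, local codes, global keys, and the helpers `harmonize` and `units_by_brand` are all hypothetical, and plain Python stands in for the Spark-based processing used on the platform. The idea shown is that locally maintained mappings translate source-specific codes into one global key, so that records from different sources can be reported together.

```python
from collections import defaultdict

# Hypothetical global master data: global product key -> attributes.
GLOBAL_PRODUCTS = {
    "G-001": {"brand": "BrandA", "category": "Laundry"},
    "G-002": {"brand": "BrandB", "category": "Hair Care"},
}

# Hypothetical local reference data, as key business users might maintain it:
# (source system, local code) -> global product key.
LOCAL_TO_GLOBAL = {
    ("pos_pl", "4501"): "G-001",
    ("shipments", "SKU-A-1"): "G-001",
    ("pos_pl", "7720"): "G-002",
}


def harmonize(records):
    """Attach global keys and attributes; return (harmonized, unmapped) rows."""
    harmonized, unmapped = [], []
    for rec in records:
        key = LOCAL_TO_GLOBAL.get((rec["source"], rec["local_code"]))
        if key is None:
            # No mapping yet: keep the row aside for local data stewards.
            unmapped.append(rec)
            continue
        harmonized.append({**rec, "global_key": key, **GLOBAL_PRODUCTS[key]})
    return harmonized, unmapped


def units_by_brand(rows):
    """Sum units per brand across all harmonized sources."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["brand"]] += row["units"]
    return dict(totals)


# Illustrative records from two hypothetical sources.
RECORDS = [
    {"source": "pos_pl", "local_code": "4501", "units": 120},
    {"source": "shipments", "local_code": "SKU-A-1", "units": 300},
    {"source": "pos_pl", "local_code": "7720", "units": 45},
    {"source": "pos_pl", "local_code": "9999", "units": 10},
]

harmonized_rows, unmapped_rows = harmonize(RECORDS)
print(units_by_brand(harmonized_rows))
print(len(unmapped_rows), "record(s) awaiting a local mapping")
```

Keeping unmapped records in a separate list rather than dropping them reflects the tiered approach described above: local users can add the missing mapping while global reporting continues on the harmonized rows.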

High-level architecture of data lake

The Data Lake was built using the following Azure components: Blob Storage, HDInsight (Spark), Azure SQL Database, SQL Data Warehouse, Azure Data Factory v2, Azure Analysis Services, and Power BI.

Business benefits

Crucial data and KPIs from different sources, easily accessible in one view, support faster and more advanced analysis and trend investigation, better decisions, and quicker reactions to changing market trends

The Core Data Lake provides a unified data processing platform that enables fast creation of new BI applications and is the first step towards truly advanced analytics and global-scale data science. It enables the creation of tailored feeds of filtered and secured quality data for both regional data hubs and downstream applications.

Smooth data updates allow users to see the KPIs that are crucial for decision-making in one view, even if they come from different sources. For a business user, data reporting, extraction, and analysis are all possible with one tool. Depending on the BI tool chosen for the presentation layer, up-to-date reports can be accessed anywhere on devices such as PCs, smartphones, and tablets. This allows business users to make more accurate decisions faster and to react quickly when necessary.

The data ecosystem’s greater simplicity and the platform’s improved scalability also led to long-term cost savings in terms of infrastructure/hardware and their support. Over 100,000 data processing jobs were executed successfully on the platform in the first year of its operation.

Currently, the solution enables the smooth handling of more than 25 TB of data for 200+ users. In the future, the plan is for over 12,000 business users to answer their business questions and streamline their day-to-day activities in a simple, one-stop portal.


Here is the full story of how we can help you make better decisions based on business analytics every day.

Get professional advice from Lingaro on the BI solutions that best fit your business needs.