Data quality encompasses the strategy and business processes that verify data is relevant, documented, and properly processed. This includes examining the design and structure of the systems that manage data and test its quality. These are the direct aspects where technical expertise comes into play, but there are also subtle, indirect factors that many organizations tend to discount yet are equally vital to a project's success.
Here, we’ll explore these nuances — how data quality also means understanding the business context, recognizing the customer’s needs, and involving everyone in the project’s life cycle.
The approach to managing analytics projects affects data quality. Unlike projects that develop and construct IT systems, data and analytics projects don't create a new IT system. Because of this, decisions are often made to simplify and shorten the project cycle, on the assumption that it's not worth investing in something that won't serve its purpose for several years the way an IT system would.
Unfortunately, this is a misconception. While cutting corners (e.g., doing away with functional or technical specifications or certain types of tests) might appear to generate savings, it can degrade data quality. In turn, the predictive models built on this data would produce outputs that don't meet quality expectations.
Another activity that's often omitted to reduce costs is identifying stakeholders. This is no less important in data and analytics than it is in developing IT solutions. Why? Data is stored in different locations and different systems, and each of those has its owner. There are also processing steps completed by separate teams. All of them are stakeholders: people or teams who affect or are affected by the project. Omitting any of them from the project life cycle can cause significant issues, up to the project never being implemented. This is why it's vital to identify who the stakeholders are. It's best practice to create and maintain a list of stakeholders, plan communication with them, and involve them in the project where and when appropriate.
Another aspect that influences data quality is how roles and responsibilities are outlined in projects. I've observed that many large organizations employ what I consider a poor practice: combining roles in data and analytics projects.
Sometimes it's assumed that the project's team members are willing or expected to handle both analysis and development, or to combine other roles. This is neither practical nor effective:
Each role requires such broad skills that it's impossible to be a good developer and a good analyst at the same time. Even within a single role, specializations are already required because the domain areas are so vast.
Two heads are better than one. If analysts make a mistake, they will likely not notice it, particularly when programming a solution they designed themselves. It amounts to the same person designing, building, and testing, which is overwhelming. It's always good to have a fresh pair of eyes alongside the analyst's.
Data quality is also determined by the quality of the final datasets, which, in turn, depends on the quality of the source data. And this doesn't just involve quality: the project runs more efficiently and is less prone to errors in the long term if data quality improvements start at the source.
In short, fixing data at the source is the recommended approach, while fixing data only in the final datasets carries consequences that compound downstream.
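To make this concrete, here is a minimal sketch of a source-level quality check in Python with pandas. The dataset, column names, and rules are hypothetical and only illustrate the idea of catching bad records before they propagate into downstream datasets.

```python
import pandas as pd

# Hypothetical source extract; columns and values are illustrative only.
source = pd.DataFrame({
    "customer_id": [101, 102, None, 104],
    "order_amount": [250.0, -40.0, 90.0, 1200.0],
    "country": ["PL", "US", "US", ""],
})

def validate_source(df: pd.DataFrame) -> pd.DataFrame:
    """Flag records that violate basic source-level quality rules."""
    issues = pd.DataFrame(index=df.index)
    issues["missing_customer_id"] = df["customer_id"].isna()
    issues["negative_amount"] = df["order_amount"] < 0
    issues["blank_country"] = df["country"].str.strip() == ""
    return issues

issues = validate_source(source)
bad_rows = issues.any(axis=1)

# Reject (or route for correction) bad records before they reach
# downstream datasets, instead of patching the final tables later.
clean = source[~bad_rows]
rejected = source[bad_rows]
print(f"{bad_rows.sum()} of {len(source)} records need fixing at the source")
```

The same rules could just as well live in an ETL tool or a database constraint; the point is that the check happens where the data originates, not after it has fanned out into reports and models.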
This might seem less obvious. What does a business analyst have to do with data quality? There’s the project manager, a team of developers, and another team of QA specialists, but why is a business analyst needed? Here’s a summary of why they’re vital to data and analytics projects, and how they contribute to data quality:
Business analysts identify the customer’s needs. After all, quality also depends on what the analytics solution delivers to the customer.
Analytics projects don't only require technical expertise. They also need to align with overarching business needs, which business analysts understand. Even the most competent QA specialist would not be able to properly test data quality, or the final analytics solution, without checking whether it supports the business's goals.
Business analysts provide other perspectives on issues that might appear at different stages of data processing or development and communicate them to the customer.
Business analysts define testing scenarios and test cases within a functional specification, which can directly affect data quality (an example of such checks appears after these points). Incorporating well-defined testing scenarios at the early stages of the project helps the customer better understand the solution while helping the team recognize the customer's expectations about the solution's quality.
Change management can be difficult for the project's development team, and business analysts provide the acumen to manage the project's timeline, budget, and product quality. They're able to perform impact analysis and uncover all the dependencies and consequences of a requested change. Without these, even simple changes could destroy quality, which can be tedious to rebuild.
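As a sketch of how business-facing test scenarios can become executable data quality checks, here is a hypothetical pytest-style example in Python. The table, columns, and rules are assumptions for illustration, not a specification from any particular project.

```python
import pandas as pd

# Hypothetical final dataset produced by an analytics pipeline.
orders = pd.DataFrame({
    "order_id": [1, 2, 3],
    "customer_id": [101, 102, 104],
    "order_amount": [250.0, 90.0, 1200.0],
    "order_date": pd.to_datetime(["2024-01-03", "2024-01-05", "2024-02-11"]),
})

# Each test mirrors a scenario a business analyst might write, e.g.
# "every order has a unique identifier" or "order amounts are always positive".

def test_order_ids_are_unique():
    assert orders["order_id"].is_unique

def test_amounts_are_positive():
    assert (orders["order_amount"] > 0).all()

def test_no_future_order_dates():
    assert (orders["order_date"] <= pd.Timestamp.today()).all()
```

Run with pytest, such checks give the team and the customer a shared, measurable definition of "good enough" data long before the solution reaches production.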
Insights are only as effective as your data. Ensuring its quality, however, means more than just checking its accuracy or consistency. It's a combination of efforts grounded in the technologies and platforms that process data, the people who work on these systems, and the expertise to align them with the business's needs.
Incorrect, inconsistent, and incomplete data has a negative domino effect. When wrong values are used to perform calculations and derive subsequent values, the outputs and any new data built on them are also incorrect.
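A toy example with made-up numbers shows how a single wrong value cascades through derived figures:

```python
# Toy illustration of the domino effect; all figures are invented.
# A unit price was entered as 1.99 instead of the correct 19.90.
unit_price_wrong = 1.99      # incorrect source value
units_sold = 1_000

revenue = unit_price_wrong * units_sold   # 1,990 instead of 19,900
cost = 8_000
margin = revenue - cost                   # -6,010 instead of +11,900
margin_pct = margin / revenue             # about -302% instead of roughly 60%

# Every figure derived from the wrong input is also wrong, and each new
# dataset or report built on these values inherits the error.
print(revenue, margin, round(margin_pct * 100, 1))
```

One mistyped price turns a profitable product into an apparent loss-maker, and every downstream report repeats the error.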
Repairing data is an extremely difficult and expensive process; even when fixing issues in IT systems, it's usually the most difficult part. Fixing an algorithm is relatively easy, for instance, but reverting or correcting what the wrong algorithm produced is far more burdensome. Moreover, a fix implemented to correct errors in data might not resolve all of them. It could even introduce new ones, depending on the complexity of the relationships between different kinds of data.
Ensuring high-quality data can be challenging, especially when clear, measurable expectations aren't communicated between the team and the customer. Even intricately designed models with exceptional algorithms will yield unreliable outcomes and insights if they use poor-quality data. High-quality data helps reveal patterns, dependencies, and rules that occur across the organization's business areas and empowers more accurate and confident business decisions.
Lingaro Group works with Fortune 500 companies and global brands in adopting a multidimensional approach to standardizing quality, from preparing, profiling, cleansing, and mapping data to enriching it. Our experts work with decision-makers to advance their organization's maturity in analytics and ensure that the data they use is accessible, unified, trusted, and secure. Lingaro also provides end-to-end data management solutions and consulting that cover the entire data life cycle across different operating models, tiers, storage systems, and architectures.