Is data the oil of the 21st century? Like black gold, data is a source of energy and its transformation allows companies to move forward into the future. Beware, however, of the risk of an oil spill. While the need to use data is now recognized by the vast majority of companies, many are struggling to develop real-value applications. Among the reasons for these failures, the quality of the data used plays a prominent role. In this article, we give you the essential keys to improving the monitoring of data quality in business.
Different parameters to assess the quality of the data
At the start of any data processing system there is... data. Attempting to derive information from poor quality data is like constructing a building without a foundation. At best you will have some cracks, but the risk of the building collapsing is very real. In any case, this experience is to be avoided.
The quality of the data is measured in different ways and several criteria must be taken into account when assessing this parameter:
- Accuracy: Is the data correct?
- Completeness: Do I have all the relevant data?
- Accessibility: Is the data available when I need it?
- Precision: How precise is my data, and what is the margin of error (for example, for a thermal sensor)?
- Integrity: Is the data authentic and has it not been altered?
- Relevance: Is this data useful for getting the information I need?
- Reliability: Does the data set meet the defined quality criteria?
- Consistency: Is my data comparable (e.g. using the same metric systems)?
In practice, these criteria vary from company to company and standards may differ. What matters is therefore not to follow a particular recipe to the letter, but to ensure that data quality guidelines exist and are defined very clearly within the company.
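To make a couple of these criteria concrete, here is a minimal sketch of how completeness and precision might be measured. The records, the field names (`sensor_id`, `temperature_c`) and the helper functions are hypothetical and chosen purely for illustration; each company would define its own criteria and thresholds.

```python
from typing import Any

# Hypothetical sample records; field names are illustrative only.
readings = [
    {"sensor_id": "A1", "temperature_c": 21.4},
    {"sensor_id": "A2", "temperature_c": None},
    {"sensor_id": "A3", "temperature_c": 19.8},
    {"sensor_id": None, "temperature_c": 22.1},
]


def completeness(records: list[dict[str, Any]], field: str) -> float:
    """Share of records in which `field` is present and not None."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if record.get(field) is not None)
    return filled / len(records)


def within_margin(measured: float, reference: float, margin: float) -> bool:
    """Precision check: is the value within +/- margin of a reference value?"""
    return abs(measured - reference) <= margin


temperature_completeness = completeness(readings, "temperature_c")
sensor_id_completeness = completeness(readings, "sensor_id")
```

A score like this only becomes useful once the company has decided what level is acceptable for each field, which is exactly what clear data quality guidelines are for.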
The need for a proper strategy and governance
Monitoring data quality requires recognizing the importance of the data governance function. Too many companies continue to neglect data management and processing, considering them the sole responsibility of the IT department. You might as well ask your tax lawyer to defend you on a murder charge. Your chances of success would be slim...
To ensure the quality of your data, you need robust data governance. This means allocating the necessary resources within your organization. If possible, the appointment of a Chief Data Officer often pays off.
In addition, it is important to define a Data Quality Policy very clearly. While the over-abundance of guidelines of all kinds seriously undermines the effectiveness of an organization, some reference documents are essential. This is the case with an official document on data quality. In particular, it is important that data standards be known. The terms used must be clearly defined to be consistent among the different data sources. The forms of data presentation must also be known and harmonized. Errors in data processing models are too often the result of the data itself, not the model.
Buy-in from the various players within the company comes through the development of a data-driven culture. Stakeholders need to understand the added value of the information they collect and provide.
According to the consulting firm BCG, a good strategy to ensure data quality is based on different areas, including:
- Data structure: how data is classified and organized according to a shared terminology and appropriate units of measurement. Different departments of an organization often use the same term in different ways. Take the example of a wine producer: the finance department might measure sales by the number of bottles sold, while the marketing department counts them in cases sold. Such a lack of communication may seem absurd, but it is extremely common;
- The existence of guidelines and procedures that define actions, roles, budget allocations or any other data governance directive. As mentioned above, however, it is important to avoid burying the organization under paperwork and to limit codification to the essentials, based on what data quality means to the company. The document must clearly define how the quality of data is assessed and who is responsible.
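The wine producer example can be sketched in a few lines: both departments report sales, but the figures only become comparable once they are converted to a shared unit. The conversion factor of 12 bottles per case is an assumption made for illustration, as are the function name and the sample figures.

```python
# Assumption for illustration only: one case holds 12 bottles.
BOTTLES_PER_CASE = 12

# Conversion table to the agreed reference unit (bottles).
UNIT_TO_BOTTLES = {"bottle": 1, "case": BOTTLES_PER_CASE}


def to_bottles(quantity: int, unit: str) -> int:
    """Convert a sales figure to the agreed reference unit (bottles)."""
    if unit not in UNIT_TO_BOTTLES:
        raise ValueError(f"Unknown unit: {unit!r}")
    return quantity * UNIT_TO_BOTTLES[unit]


# Hypothetical figures reported by each department for the same period.
finance_sales = to_bottles(1200, "bottle")
marketing_sales = to_bottles(100, "case")
```

Rejecting unknown units, rather than guessing, keeps an undocumented unit from silently entering the reports.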
Basic rules to ensure data quality
Beyond the implementation of clearly established standards as explained above, it is important to automatically detect and eliminate data that does not meet predefined criteria. These constraints may relate to the format (for example, the data must contain between three and five variables) or to the content (for example, the value must be a text string, a positive number, etc.). Defining such validation rules provides a first sorting step that separates the wheat from the chaff at the very beginning of the data processing pipeline. It's a tedious but essential step. As the well-known saying among data specialists goes: garbage in, garbage out. Try to make a great vintage with bad grapes and you will have a hell of a headache.
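As a rough sketch of what such validation rules could look like, the example below checks one format rule (between three and five fields) and two content rules (a non-empty text value, a positive number), then separates valid records from rejected ones. The rule names, the record fields (`name`, `quantity`, `region`) and the sample data are hypothetical; this is not a description of any particular product.

```python
from typing import Any, Callable

Record = dict[str, Any]


def has_three_to_five_fields(record: Record) -> bool:
    """Format rule: the record must contain between three and five fields."""
    return 3 <= len(record) <= 5


def name_is_text(record: Record) -> bool:
    """Content rule: 'name' must be a non-empty string."""
    name = record.get("name")
    return isinstance(name, str) and name.strip() != ""


def quantity_is_positive(record: Record) -> bool:
    """Content rule: 'quantity' must be a positive number (booleans excluded)."""
    quantity = record.get("quantity")
    if isinstance(quantity, bool) or not isinstance(quantity, (int, float)):
        return False
    return quantity > 0


RULES: dict[str, Callable[[Record], bool]] = {
    "three_to_five_fields": has_three_to_five_fields,
    "name_is_text": name_is_text,
    "quantity_is_positive": quantity_is_positive,
}


def split_records(records: list[Record]):
    """Return (valid, rejected); rejected holds (record, failed rule names)."""
    valid, rejected = [], []
    for record in records:
        failures = [name for name, rule in RULES.items() if not rule(record)]
        if failures:
            rejected.append((record, failures))
        else:
            valid.append(record)
    return valid, rejected


# Hypothetical input records.
records = [
    {"name": "Merlot", "quantity": 12, "region": "Bordeaux"},
    {"name": "", "quantity": 5, "region": "Rhone"},
    {"name": "Syrah", "quantity": -3},
]
valid, rejected = split_records(records)
```

Keeping the names of the failed rules alongside each rejected record makes it easier to trace problems back to the data source instead of silently dropping rows.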
At Ryax, we offer a data processing automation platform that integrates certain parameters to assess the quality of your data. Our product complements your company's internal procedures to ensure optimal data quality and extract the full potential of your data. Contact us if you would like a demonstration of our software or to discuss its added value for your business.
The Ryax Team.