The Importance Of Data Quality: Is Your Data Good?
In the world of data, an old adage rings true: businesses should choose quality over quantity. In 2017, Gartner estimated that poor-quality data costs companies an average of $15 million per year. Bad data impedes an organization from operating effectively by warping its assessment of reality, leading to missed opportunities, poor decision making, and financial and legal ramifications. While individually verifying small amounts of data is possible, that approach does not scale when 2.5 quintillion bytes of data are created every day. Therefore, in an increasingly data-driven world, it is essential that organizations focus on ensuring the quality of their data.
But how do you know which data is good and which is bad? The true measure of data quality is how effectively the data represents the particular portion of reality that we want to analyze. Of course, that portion of reality varies with the type and use of the data in question, so some measures depend on the use case. Common examples are relevancy, or how relevant the data is to our goals, and how often the data is updated relative to our needs. However, the most important measures of data quality are those that identify how closely and reliably the data represents reality. Several metrics measure this proximity directly: accuracy (how well the data represents reality), timeliness (how consistently a provider can deliver the data), stability, and completeness.
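Two of the measures described above, completeness and how recently the data was updated, are simple enough to compute directly. The following is a minimal sketch, not a standard definition: the function names, the `updated_at` field, the sample records, and the seven-day window are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def completeness(records, fields):
    """Fraction of expected field values that are present (not None or empty)."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(
        1 for r in records for f in fields if r.get(f) not in (None, "")
    )
    return filled / total


def freshness(records, now, max_age, ts_field="updated_at"):
    """Fraction of records whose timestamp falls within max_age of now."""
    if not records:
        return 0.0
    fresh = sum(
        1
        for r in records
        if r.get(ts_field) is not None and now - r[ts_field] <= max_age
    )
    return fresh / len(records)


# Hypothetical sample data: one missing price, one missing timestamp,
# and one record that has not been updated for over a month.
now = datetime(2024, 1, 10)
records = [
    {"id": 1, "price": 9.5, "updated_at": datetime(2024, 1, 9)},
    {"id": 2, "price": None, "updated_at": datetime(2024, 1, 8)},
    {"id": 3, "price": 7.0, "updated_at": datetime(2023, 12, 1)},
    {"id": 4, "price": 3.2, "updated_at": None},
]

completeness_score = completeness(records, ["id", "price", "updated_at"])
freshness_score = freshness(records, now, timedelta(days=7))
print(f"completeness={completeness_score:.2f} freshness={freshness_score:.2f}")
```

Both scores are plain ratios between 0 and 1, which makes them easy to track over time. Accuracy is harder to score this way, since it needs a trusted reference to compare against.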
These measures reflect how data is collected, organized, and distributed. This all becomes particularly important when purchasing data from external data providers. While there are many tools for data quality maintenance, such as anomaly detection systems, it is extremely difficult to gauge the quality of data prior to acquisition. This is because the interests of the two parties conflict: sellers want to hide the valuable insights of their data, while buyers want to be able to determine the quality of the data before purchase. At Spectre, we recognize the dynamics that are at play when it comes to inter-company data negotiations. Data sellers need to expose the quality of their data to attract and inform buyers without revealing the valuable insights of the data.
Recently, advancements in machine learning and data infrastructure have led to the development of Data Quality Insights: metrics that quantify the quality of a data product over time without revealing proprietary data insights. Because the underlying data is never exposed, sellers can work on improving their scores while buyers can understand how well the data fits their use case. These advancements allow businesses to make informed decisions prior to purchasing data, while also helping data providers secure their products and attract new customers. By making data quality a priority, organizations can improve their overall operations and bottom line.
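To make the idea of quality metrics that do not reveal the data more concrete, here is a rough sketch of one possible approach: reduce each delivery to a few aggregate numbers and publish only those. This is an illustration under stated assumptions, not a description of how Data Quality Insights are actually computed. The function names, the monthly deliveries, and the stability formula (one minus the coefficient of variation of row counts) are all hypothetical.

```python
from statistics import mean, pstdev


def batch_summary(batch, fields):
    """Reduce one delivery to aggregate numbers; no individual values are kept."""
    total = len(batch) * len(fields)
    filled = sum(1 for row in batch for f in fields if row.get(f) is not None)
    return {
        "rows": len(batch),
        "completeness": filled / total if total else 0.0,
    }


def stability(row_counts):
    """1 minus the coefficient of variation of row counts, floored at 0."""
    avg = mean(row_counts)
    if avg == 0:
        return 0.0
    return max(0.0, 1.0 - pstdev(row_counts) / avg)


# Hypothetical monthly deliveries from a data seller.
deliveries = {
    "2024-01": [{"sku": "a", "units": 3}, {"sku": "b", "units": 5},
                {"sku": "c", "units": 2}],
    "2024-02": [{"sku": "a", "units": 4}, {"sku": "b", "units": None},
                {"sku": "c", "units": 1}, {"sku": "d", "units": 6}],
    "2024-03": [{"sku": "a", "units": 2}, {"sku": "b", "units": 7},
                {"sku": "c", "units": 3}, {"sku": "d", "units": 5},
                {"sku": "e", "units": 1}],
}

# Only this report would be shared with a prospective buyer.
report = {
    period: batch_summary(batch, ["sku", "units"])
    for period, batch in deliveries.items()
}
stability_score = stability([s["rows"] for s in report.values()])

for period, summary in report.items():
    print(period, summary)
print(f"stability={stability_score:.3f}")
```

The report contains row counts and ratios but none of the `sku` or `units` values, so a buyer can see how complete and how steady the deliveries are without seeing the product itself. Note that aggregates can still leak information on very small datasets, so a real system would need more care than this sketch shows.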