High-quality data are a precondition for analyzing and using statistical data and for guaranteeing the value of the data. Dr. oec. Svetlana Jesiļevska and Dr. oec. Daina Šķiltere developed a comprehensive methodology for the entire data quality treatment – the Data Quality Scale (DQS). The methodology consists of data quality dimensions and their definitions, indicators for assessing the data quality dimensions, and expert evaluations. The Data Quality Scale is readily extensible and adaptable, as it makes it possible to evaluate data quality at various levels of detail: at the level of individual indicators, at the level of dimensions, and as an overall data quality score. The DQS makes it possible to identify specific shortcomings in the quality of statistical data and to develop proposals for improving it. The research results enrich the theory of statistical data quality and lay a solid foundation for future work by establishing an assessment approach and studying evaluation algorithms.
The authors of the Data Quality Scale have wide experience in data quality assessment and a profound background in data quality, having dealt with data quality issues since 2012. Data quality assessment and improvement are topical issues nowadays; the authors identified numerous sources of quality problems in statistical data, such as data sources, regularity, timeliness of data, updating of data, time series, data frequency, data costs, etc. Sometimes the required data do not exist, and data from different sources are not always comparable (Šķiltere & Jesiļevska, 2014). It is therefore of vital importance that a comprehensive approach to assessing the quality of statistical data be available. The problem here lies in selecting appropriate criteria for evaluating the goodness of statistical data; it is therefore related not just to the research paradigm and intention, but also to the beliefs held by both researchers and research participants (Šķiltere & Jesiļevska, 2014). Based on existing theory, the authors developed a system of quality dimensions to determine the quality of statistical data. This systematic approach consists of the following data quality dimensions: data completeness, representativity, objectivity, quality of methodology, coherence, accessibility, accuracy of estimates, actuality, interpretability, statistical disclosure control, optimal use of resources, utility, and informativeness. Some of the proposed data quality dimensions have received little attention previously. The authors conducted an expert survey to identify the most essential data quality dimensions. The set of dimensions was tested with experts in four different data usage contexts: data for scientific research, data for decision-making, data for analyzing the progress of the research object during the reporting period, and data for research object modeling and forecasting (Jesiļevska, 2017).
The authors found that one of the most problematic data quality dimensions is data accuracy. In the scientific literature, many methods have been proposed for identifying outliers in empirical distributions, such as Dixon's Test, Grubbs' Test, Hampel's Test, the Quartile Method, the Nalimov Test, Walsh's Test, the Discordance Outlier Test, etc. In 2010, Šķiltere D. and Danusēvičs M. developed a method for assessing the total errors of truly non-linear trend models. However, no method was available in the scientific literature for identifying outliers by analyzing changes in an indicator under the influence of one or several factors. In 2015, Jesiļevska S. developed the Iterative method for reducing the impact of outlying data points. The Iterative method won the 3rd Prize in the 2015 international IAOS Prize for Young Statisticians competition and was published in the Statistical Journal of the IAOS: Journal of the International Association for Official Statistics in 2016 (Jesiļevska, 2016).
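To illustrate one of the standard techniques listed above (this is not the authors' Iterative method): the Quartile Method flags as outliers the points falling outside Tukey's fences, [Q1 − 1.5·IQR, Q3 + 1.5·IQR]. A minimal Python sketch follows; the 1.5 multiplier and the linear-interpolation quantile scheme are common conventions chosen here for illustration:

```python
def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    xs = sorted(values)
    n = len(xs)

    def quantile(p):
        # simple linear-interpolation quantile over the sorted sample
        idx = p * (n - 1)
        lo = int(idx)
        hi = min(lo + 1, n - 1)
        return xs[lo] + (xs[hi] - xs[lo]) * (idx - lo)

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]

print(iqr_outliers([10, 12, 11, 13, 12, 95]))  # [95]
```

Applied to the small sample above, only the value 95 falls outside the fences; the method makes no distributional assumption, which is why it is often used as a first screening step before more formal tests such as Grubbs'.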
Based on the previously developed integrated approach to data quality assessment, which consists of 13 data quality dimensions and assessment indicators for each dimension, in this paper the authors present a comprehensive methodology for the entire data quality treatment – the Data Quality Scale (DQS).
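The aggregation idea behind such a scale – indicator scores rolled up into dimension scores and then, via weights (for example derived from expert evaluations), into an overall quality score – can be sketched as follows. The dimension names, the simple averaging, and the weighted mean used here are illustrative assumptions, not the DQS formulas themselves:

```python
def dimension_score(indicator_scores):
    """Average the indicator scores within one dimension (assumed scheme)."""
    return sum(indicator_scores) / len(indicator_scores)

def overall_quality(dimensions, weights):
    """Weighted mean of dimension scores; weights could come from an expert survey."""
    total_w = sum(weights[name] for name in dimensions)
    weighted = sum(dimension_score(scores) * weights[name]
                   for name, scores in dimensions.items())
    return weighted / total_w

# hypothetical indicator scores on a 0..1 scale for two of the 13 dimensions
dims = {"completeness": [0.9, 0.8], "accuracy": [0.7, 0.6, 0.8]}
w = {"completeness": 2.0, "accuracy": 3.0}  # illustrative expert weights
print(round(overall_quality(dims, w), 2))   # 0.76
```

The three-level structure (indicator, dimension, overall) mirrors the levels of detail at which the DQS is described as operating, so shortcomings can be localized to a specific indicator rather than only observed in the aggregate score.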
During the research, an expert survey of highly qualified specialists responsible for the collection, processing and analysis of statistical information was carried out. Nineteen experts from national statistical offices participated in the survey, representing the following countries: Belgium, Armenia, Cyprus, Finland, Iceland, Czech Republic, Malta, Bulgaria, Romania, Slovak Republic, Ukraine, Lithuania, Belarus, Azerbaijan and Latvia.
Statisticians can use the Data Quality Scale and its methodology to understand statistical data quality assessment and the trade-offs between the quality dimensions within it. We are convinced that the Data Quality Scale will help statisticians to identify shortcomings in the data, to improve data quality significantly, and to improve decision-making based on statistical data.
Having at one's disposal a methodology for evaluating not only individual data quality dimensions but also the entire statistical data quality makes it possible to apply the Data Quality Scale to data from different industries, to assess data quality over time in order to track progress, and to detect systematic failures in data collection, processing, validation, etc.
To solve data quality problems effectively, both data users and data producers must have sufficient knowledge about solving the data quality problems relevant to their process areas. At a minimum, statisticians must know what kind of data to collect, how to collect them (methodological issues), and why; data users must know what data to use, how to use them (what kind of analysis), and why (the intended purpose). In sum, these two main actors both have roles in the data production process and should cooperate closely to improve statistical data quality. Involving both statisticians and data users in identifying and resolving possible drawbacks of the data opens new avenues for future research and practice.