18 Data Quality Issue Attributes You Should Be Logging
Mar 12, 2019
If you’ve already started or are planning to start a data governance program to support your data quality improvement goals, you need a structured way of tracking your data quality issues and their status. There are different ways of doing this, of course: with the help of dedicated data quality tools, incident management and ticketing systems, knowledge-sharing or intranet platforms such as SharePoint, or even a simple Excel file.
How you track your data quality issues will depend on your organization’s tools and the overall scope of its data quality initiatives. I’ve tracked them many different ways over the course of my career: as data incidents, as part of a status reporting model, through project-based issue tracking, and through a dedicated data quality issues management log. Recently I was having a conversation on the topic at a data quality event, and I was asked: "What is your ultimate guide to a data quality issues log?"
Well, regardless of the tool being used to create this log, here are the data quality issues log attributes I typically include (split into three categories):
1. Issue Details
ID: A unique ID is always essential when putting together any inventory. It lets technical staff, business analysts, and business users reference a data quality issue quickly and unambiguously.
Name of issue / Title: Even if this is obvious, recording a short title for your data quality issue is important because it’s usually what business users will reference for a quick summary of the issue.
Detailed description: Any details to offer further context and insight into where the issue was found, what system(s), processes, reports, etc. are known to be affected before an in-depth analysis is done.
Status: Use this field to track how many data quality issues have been identified and submitted, are in progress, or resolved. I recommend using the following options: backlog (initial status of an issue), assigned (when resources have been identified and assigned), in progress, testing, closed/resolved, and on hold.
Date raised / Date added: This date field helps you keep track of when data issues are submitted, which can help you identify how long an issue remains unresolved.
Target resolution date: Use this date field to track when the issue needs to be resolved based on any dependencies it might have (ex: another technical project, business process redesign, report deployment, etc.). This date can be a good indicator of the risk status.
Importance: This drop-down field will help you prioritize the issue log items and sometimes determine the target resolution date value. I use the following categories: critical, high, medium, low (though it’s up to you to decide how to define these designations).
Impact: This describes the extent of the issue. How many records are affected? What business areas, processes, information, reports, decisions and so on are impacted by this data quality issue?
Category: This element is dependent on your data governance and data quality models, but try to find a meaningful way to categorize your issues either by data governance areas or data quality measures, or both. Note that one entry can belong to multiple categories. For example, a data quality issue can be categorized under timeliness, accuracy, and no standards. You can always add new subcategories as you go along.
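The issue-detail fields above map naturally onto a simple record structure. Here is a minimal sketch in Python, assuming a dataclass representation with enums for the status and importance values suggested above (the field and class names are illustrative, not prescribed by any particular tool):

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class Status(Enum):
    BACKLOG = "backlog"          # initial status of an issue
    ASSIGNED = "assigned"        # resources identified and assigned
    IN_PROGRESS = "in progress"
    TESTING = "testing"
    CLOSED = "closed/resolved"
    ON_HOLD = "on hold"


class Importance(Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass
class DataQualityIssue:
    issue_id: str                # unique ID
    title: str                   # short name business users will reference
    description: str             # context: systems, processes, reports affected
    status: Status = Status.BACKLOG
    date_raised: date = field(default_factory=date.today)
    target_resolution_date: Optional[date] = None
    importance: Importance = Importance.MEDIUM
    impact: str = ""             # records affected, business areas impacted
    categories: List[str] = field(default_factory=list)  # one entry can have several


# Example entry: an issue categorized under more than one data quality measure
issue = DataQualityIssue(
    issue_id="DQ-0042",
    title="Duplicate customer records in CRM",
    description="Same customer appears under multiple IDs after the migration.",
    importance=Importance.HIGH,
    categories=["accuracy", "no standards"],
)
```

Even if your log lives in Excel or SharePoint rather than code, pinning down the fields and their allowed values this explicitly is what keeps the log consistent.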
2. Resources & Ownership
Business unit: This is not a mandatory field, but I recommend it because it helps you understand what resources are going into resolving data quality issues owned by a particular business unit. Of course, issues may also be owned at the organization level.
Business owner: Who has the authority to sign off on issue resolution? Who dictates the business rules to which data quality standards and requirements must conform?
Business analyst: Ideally, this type of resource exists in your organization, as analyst skills are crucial in helping identify business needs, understand technical limitations, and figure out the root cause of a data quality issue.
Technical resource: Person(s) tasked with implementing the technical solution (ex: modifying the metadata, updating the user interface, implementing controls, creating audits, etc.), performing data profiling, cleansing the data, and so on.
Testing lead: Even though the technical resources should always have someone to help them test their work, this field is meant to track the testing resource from the business side, since business users typically understand the data’s semantics best.
3. Final Resolution
Root cause: Determining the root cause will not just help you identify the fix, but also prevent it from happening again. Detail the underlying cause of the issue, which could include lack of ownership, lack of clearly defined standards, insufficient auditing, lack of data validation, technical limitations, lack of training, incorrect or nonexistent definitions, incorrect or ambiguous business process, and more.
Resolution details: What had to be done in order to fix and prevent this data quality issue from happening again? Did you have to create new data, perform a data override, fix a software bug, update user documentation and processes, change a business term definition? Use this as a reference point when similar log entries are being added. Also note whether a given resolution is intended to be permanent or temporary. In the latter case, be sure to outline the process that will be used to find a final resolution.
Completed date: Not all submissions get to be completed by the desired target date. Hopefully they get resolved earlier, but some get resolved later. Calculating the difference between the target date and completed date will yield some interesting measures and might help your case to acquire further resources.
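The target-versus-completed comparison mentioned above is easy to automate. A minimal sketch, assuming dates are stored as `datetime.date` values (the function name is my own):

```python
from datetime import date


def resolution_slippage(target: date, completed: date) -> int:
    """Days an issue was resolved late (positive) or early (negative)."""
    return (completed - target).days


# An issue targeted for March 1 but closed March 8 slipped by 7 days
late = resolution_slippage(date(2019, 3, 1), date(2019, 3, 8))

# One closed two days ahead of its target yields a negative value
early = resolution_slippage(date(2019, 3, 10), date(2019, 3, 8))
```

Averaging this value across closed issues gives you one of those "interesting measures" that can support a request for further resources.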
Status report notes: This stands as its own area, but as you progress with your analysis, resolution, and testing, keep a weekly record of what has been done so that the business owner can refer back to it.
A very important aspect to all the fields listed above is consistency. Ensure you’re tracking the dates in the same format, that you have guidelines on when to fill out the status report notes (ideally after each change, summarized by week), that you have a standard on the titles (ex: only capitalize the first word), that you always name your resources in the same way (with first and last name, for example), etc.
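These consistency guidelines can even be enforced automatically. Here is a small sketch that checks a log entry against the conventions suggested above; the exact rules (ISO dates, first-word capitalization, "First Last" resource names) and field names are illustrative assumptions, so adapt them to your own standards:

```python
import re
from datetime import datetime


def check_entry(entry: dict) -> list:
    """Return a list of consistency problems found in one log entry."""
    problems = []

    # Dates must use one agreed format (ISO 8601 assumed here).
    try:
        datetime.strptime(entry["date_raised"], "%Y-%m-%d")
    except ValueError:
        problems.append("date_raised is not in YYYY-MM-DD format")

    # Titles: start with a capital letter (per the first-word convention).
    if entry["title"] and not entry["title"][0].isupper():
        problems.append("title should start with a capital letter")

    # Resources named consistently as "First Last" (simplified pattern;
    # real names with hyphens or accents would need a broader rule).
    if not re.fullmatch(r"[A-Z][a-z]+ [A-Z][a-z]+", entry["business_owner"]):
        problems.append("business_owner should be 'First Last'")

    return problems


ok_entry = {
    "date_raised": "2019-03-12",
    "title": "Duplicate customer records in CRM",
    "business_owner": "Jane Doe",
}
bad_entry = {
    "date_raised": "03/12/2019",
    "title": "duplicate customer records",
    "business_owner": "jane doe",
}
```

A check like this run on every new or edited entry keeps the log trustworthy long after the initial enthusiasm has worn off.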
As I mentioned before, there are many ways of tracking data quality issues, but these fundamental attributes should be logged to ensure the efficiency and accountability of your data quality management initiatives. What else do you think should be included?
Article originally published on LightsOnData.com.