Data Dictionary: the what, why and how
Technological advances have resulted in the collection of large amounts of data, and the availability of data continues to skyrocket. To give you some perspective, more data was reportedly collected in the past two years (2019 and 2020) than in all of human history before that.
This post isn’t about big data – we know there are more than enough articles about big data out there. Here, we’ll focus on how evaluators can (and should) clarify details about the data being used for evaluation. In other words, how and why to build an evaluation-specific data dictionary.
What is a Data Dictionary?
Definitions of “data dictionary” vary, but it is generally understood to be a common language for quantitative data. Data dictionaries provide a precise vocabulary for specific data elements, help to standardize a dataset, and ensure that the relevance and quality of data elements are the same for all users. Data dictionaries describe the meaning and purpose of data elements within the context of a project and provide guidance on interpretation.
Why Use a Data Dictionary?
It is ideal to have a data dictionary whenever you have quantitative data that will be used and shared by multiple people or groups. Without precise definitions, it is very easy to arrive at different results while using the same dataset. Confusion can be avoided by documenting data definitions and parameters and sharing them with all stakeholders.
Although creating a data dictionary is time-consuming, having precise documentation that can be used by all stakeholders promotes efficiency. Look at the following example for an online health education program:
The program team defines the “number of program participants per week” as the total number of participants that completed the online module per week.
The IT team defines the “number of program participants per week” as the total number of participants that accessed the online module per week.
As evaluators, if we didn’t examine what “number of program participants per week” meant to each team, we might draw incorrect conclusions and risk making unreliable or even dangerous recommendations.
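To make the gap concrete, here is a minimal Python sketch of the two definitions above applied to the same records. The records, the field names `accessed` and `completed`, and the function `count_participants` are all made up for illustration; they do not come from any real program.

```python
# Hypothetical one-week log for an online module: one record per participant.
# Field names and values are illustrative assumptions only.
records = [
    {"participant_id": 1, "accessed": True, "completed": True},
    {"participant_id": 2, "accessed": True, "completed": False},
    {"participant_id": 3, "accessed": True, "completed": True},
    {"participant_id": 4, "accessed": True, "completed": False},
    {"participant_id": 5, "accessed": False, "completed": False},
]


def count_participants(rows, definition):
    """Count participants for one week under a named definition."""
    if definition == "completed":  # program team's definition
        return sum(1 for row in rows if row["completed"])
    if definition == "accessed":  # IT team's definition
        return sum(1 for row in rows if row["accessed"])
    raise ValueError(f"Unknown definition: {definition}")


program_team_count = count_participants(records, "completed")
it_team_count = count_participants(records, "accessed")

print("Program team (completed):", program_team_count)  # 2
print("IT team (accessed):", it_team_count)  # 4
```

With this sample, the IT team would report twice as many weekly participants as the program team, even though both are reading the same dataset.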
A data dictionary that is prepared collaboratively between the evaluator and stakeholders can prevent confusion and promote alignment. In short, data dictionaries can:
Provide consistency in the collection and use of data across multiple users;
Make data analysis easier;
Promote usability of data; and
Increase confidence in the data, results, and decisions.
How to Prepare a Data Dictionary
Before embarking on the task of creating a data dictionary, ask the program team if there’s an existing data dictionary for the dataset. It is a common practice to share a dataset with the data dictionary if there is one. However, the project team/client might not think to share the data dictionary with you. If there is a common, vetted, and documented data dictionary, it may not be necessary to create a new one.
Most data management systems, including MS Access and SPSS, have a built-in active data dictionary that can be used to generate documentation as needed. Below is an image of a simple SPSS codebook output.
Alternatively, if your dataset is in MS Excel, you can use MS Excel or Word for documentation. Creating and managing a data dictionary is an iterative process; the definitions for the data dictionary categories and the relationships between them need to be revised regularly. Data dictionaries in program evaluation often contain the following:
A list of data objects: names, metrics (measurement units) and definitions;
Inclusion and exclusion criteria: specify cases to be included or excluded;
Data Source(s): specify the source of data;
Data Update: state how frequently the data is updated and available (e.g., weekly, monthly, annually);
Limitations: specify any considerations that would impact the use of the indicator (can comment on reliability and validity of the data and include any other detail);
Missing data: state if there are any missing values and how they were handled;
Technical notes: provide technical details which help interpret the data presented; and
Approval and sign-off: a data dictionary should be created collaboratively, with approval from all those who will use the dictionary. After revisions and edits, it should be signed off by the team leads to finalize the document.
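If you would rather keep the dictionary in a structured form than in a Word or Excel table, the categories above can be sketched as a small Python record. This is one possible layout, not a standard: the class `DictionaryEntry`, its field names, the helper `unfilled_fields`, and the sample values are all assumptions made for illustration.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class DictionaryEntry:
    """One data element in an evaluation data dictionary (hypothetical layout)."""

    name: str
    definition: str
    unit: str
    inclusion_criteria: str
    exclusion_criteria: str
    source: str
    update_frequency: str
    limitations: str = ""
    missing_data: str = ""
    technical_notes: str = ""
    approved_by: list = field(default_factory=list)

    def is_signed_off(self):
        # Assumption: one recorded approval is enough to count as signed off.
        return len(self.approved_by) > 0


def unfilled_fields(entry):
    """Return the names of fields that are still empty, in declaration order."""
    return [key for key, value in asdict(entry).items() if not value]


# Sample entry based on the program team's definition from the earlier example.
entry = DictionaryEntry(
    name="participants_per_week",
    definition="Total number of participants who completed the online module per week",
    unit="participants per week",
    inclusion_criteria="Participants with a recorded module completion",
    exclusion_criteria="Participants who accessed but did not complete the module",
    source="Online module platform export (hypothetical)",
    update_frequency="weekly",
    limitations="Counts completions only; does not capture partial engagement",
)

gaps_before = unfilled_fields(entry)
print("Still to document:", gaps_before)

entry.approved_by.append("Program team lead")
print("Signed off:", entry.is_signed_off())
```

Listing the empty fields before sign-off gives the team a simple checklist of what still needs to be documented for each data element.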
In summary, a data dictionary is a great evaluation tool for projects with quantitative data. A data dictionary is time-consuming to prepare; however, it can promote efficiency and accuracy in the long run. Try building one for your next evaluation project.