II.1.06: Train Data Entry and Analysis Staff

“Data entry” is the process of converting raw information into a format that stores the data uniformly and makes it accessible for analysis. It’s a step that is often taken for granted. The truth is that it takes time and skill, and can make or break the quality of your evaluation. Errors in this step can drastically reduce the content or accuracy of the data you will be analyzing; concerns about data quality can undermine the credibility of your evaluation altogether.

The person doing the data entry does not necessarily have to understand the entire evaluation in order to do a good job. But they do need to be attentive to detail, follow procedures carefully, be willing to check their own work, and know when to ask questions. Time is valuable and money is often limited, so the cost of having to re-do data entry can quickly become prohibitive. It is critical that the task be done well the first time, that patterns of errors be detected and corrected quickly, and that the end result be checked for accuracy.

The task of entering data may look different in different evaluations. For a qualitative evaluation that relies on recorded interviews, data entry could involve transcribing the interviews and recording data such as date, participants, the actual questions (if it’s an unstructured interview protocol), and the verbatim answers of the person being interviewed. This text might be entered into a word processing or spreadsheet program, or some other software used for qualitative analysis (e.g., NVivo). If the measure is a written survey, data entry would typically take the form of entering responses into a spreadsheet, including any demographic or participant information that was requested. The responses to the survey items themselves might be multiple-choice or scaled responses, or could include written responses to open-ended questions.

Instructions and training for data entry personnel should be sure to cover the following, to whatever extent is appropriate for the evaluation in question:

Who is permitted to do the data entry (it may be problematic if there are “many hands” at work on this task)
How to handle the raw original data (to prevent loss of records or contamination)
How to ensure and protect the anonymity and confidentiality of responses, including what they may not discuss with others
How to handle non-responses (if using a spreadsheet, check to see what it requires for “null” answers – some use blank cells, some use an assigned “99” or other code, and so on)
How to handle invalid responses (if it’s a scaled response with integers from 1 to 5, for example, and the respondent wrote in “2 ½”, should that be left blank or should it be entered as written?)
What the acceptable abbreviations or acronyms are, so that information is entered consistently
What to do about bad handwriting
What to look out for that would indicate possible problems with the administration of the measure (Are there many incomplete responses? Do there appear to be patterns of difference in how it was completed at different sites?)
How to exercise judgment (if that is allowed at all), or perhaps more usefully how to refrain from inserting an interpretation or judgment
How to check for accuracy

To the extent possible, if entering into a spreadsheet or formatted software program, set up the program or database to use pull-down lists or to limit the accepted cell entries in order to reduce operator error.
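
Where the data entry tool cannot restrict entries itself, the same rules can be applied as a check on the keyed-in values. The following is a minimal sketch in Python, not a prescribed method. It assumes a hypothetical codebook in which a scaled item accepts the integers 1 to 5 and non-responses are entered as 99; the names MISSING_CODE, VALID_SCALE, and check_scaled_entry are illustrative only. Anything else, including a blank cell or a write-in such as “2 ½”, is flagged for a person to review rather than silently changed.

```python
# Hypothetical codebook values, assumed for illustration only.
MISSING_CODE = 99              # code the protocol assigns to a non-response
VALID_SCALE = {1, 2, 3, 4, 5}  # allowed answers for a 1-to-5 scaled item


def check_scaled_entry(raw):
    """Classify one keyed-in value for a scaled item.

    Returns a (status, value) pair:
      ("ok", n)         a valid scale answer
      ("missing", None) the agreed non-response code
      ("flag", text)    anything else, kept as typed for human review
    """
    text = str(raw).strip()
    try:
        value = int(text)
    except ValueError:
        # Not a whole number (blank, "2 ½", "n/a", ...): do not guess.
        return ("flag", text)
    if value == MISSING_CODE:
        return ("missing", None)
    if value in VALID_SCALE:
        return ("ok", value)
    return ("flag", text)


if __name__ == "__main__":
    for entry in ["3", "99", "2 ½", "", "7"]:
        print(repr(entry), check_scaled_entry(entry))
```

Flagging rather than correcting keeps judgment calls out of the data entry step; whoever owns the evaluation protocol decides how flagged entries are resolved.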

Depending on the nature of the data entry task, it may be helpful to have data entry staff do some practice data entry – essentially, pilot-testing the data management and data entry procedures. Accuracy in this task is a worthy investment! Someone other than the person entering the data should also do “spot-checks,” both early in the data entry process (so that patterns of mistakes can be caught) and again later (to make sure the dataset is reliable). One way to do this is to pull a random sample of the raw data and check the entered data against it.
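
The random-sample spot-check can be sketched in Python as follows. This is an illustration under stated assumptions, not a required procedure: it assumes every record has an ID, that the checker has re-read the sampled original forms into raw_records, and that entered_records holds what the data entry staff keyed in. The function name spot_check and both data structures are hypothetical.

```python
import random


def spot_check(raw_records, entered_records, sample_size, seed=None):
    """Compare a random sample of entered records against the raw originals.

    Both arguments are dicts of the form {record_id: {field: value}}.
    Returns (sample_ids, mismatches), where each mismatch is a tuple
    (record_id, field, raw_value, entered_value).
    """
    rng = random.Random(seed)  # a seed makes the sample repeatable
    ids = sorted(raw_records)
    sample_ids = rng.sample(ids, min(sample_size, len(ids)))

    mismatches = []
    for record_id in sample_ids:
        raw = raw_records[record_id]
        entered = entered_records.get(record_id, {})
        for field, raw_value in raw.items():
            entered_value = entered.get(field)
            if entered_value != raw_value:
                mismatches.append((record_id, field, raw_value, entered_value))
    return sample_ids, mismatches


if __name__ == "__main__":
    # Made-up example values, for demonstration only.
    raw = {1: {"q1": 3, "q2": 99}, 2: {"q1": 5, "q2": 4}}
    entered = {1: {"q1": 3, "q2": 99}, 2: {"q1": 4, "q2": 4}}
    sampled, problems = spot_check(raw, entered, sample_size=2, seed=0)
    print("Checked records:", sorted(sampled))
    print("Mismatches:", problems)
```

Running the same check early and again later, as described above, shows whether a pattern of mistakes found in the first sample has actually stopped.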

Guiding Documents