II.2.04 Create Codebook and Code or Categorize the Data

Evaluation Implementation – 2.04 Create Codebook and Code or Categorize the Data

Coding data means converting information-rich raw data into a shorter form that can be readily aggregated for analysis. Coding is used when an evaluation involves large quantities of numerical data in a spreadsheet, and it may also be used to sort chunks of narrative data into categories. It is useful for aggregating responses across units of analysis to compute an average effect of the program on participants. (Note that where qualitative data is gathered instead to understand a complete program situation or an individual change process, the analysis may call for non-coding approaches.)

Assuming your evaluation question calls for information about an average effect, an important step is the creation of a codebook or classification scheme. For data with pre-set response options, the codebook describes the data, indicates where and how it can be accessed, and shows how the original raw data corresponds to the information entered in, for example, a spreadsheet. At a minimum, the codebook should include the following items for each variable:

variable name
variable description
variable format (number, date, text)
instrument/method of collection
date collected
respondent or group
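
As a minimal sketch, the six items above could be held in one record per variable. The class name, field names, and the sample values (instrument, date, group) are illustrative assumptions, not part of any prescribed codebook format.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class CodebookEntry:
    """Minimum documentation for one variable (field names are hypothetical)."""
    variable_name: str
    description: str
    variable_format: str   # "number", "date", or "text"
    instrument: str        # instrument/method of collection
    date_collected: str    # kept as text here, e.g. "2023-05-01"
    respondent_group: str


def missing_fields(entry: CodebookEntry) -> List[str]:
    """Return the names of any items that were left blank."""
    return [name for name, value in asdict(entry).items() if not value.strip()]


# Hypothetical entry; every value below is made up for illustration.
entry = CodebookEntry(
    variable_name="Age",
    description="Age in years at last birthday by self-report",
    variable_format="number",
    instrument="pre-test survey",
    date_collected="2023-05-01",
    respondent_group="program participants",
)
print(missing_fields(entry))
```

A check like `missing_fields` is one way to confirm that no variable enters the codebook with an item left undocumented.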

The codebook is an indispensable tool for the analysis team. It should provide comprehensive documentation that enables others who might subsequently want to analyze the data to do so without any additional information.

Quantitized data coding involves assigning an established number to each particular response. For example, if you ask respondents for their gender, you might code the responses like this: “. = missing, 1 = male, 2 = female.” An example of a codebook is provided below:

Variable name	Variable label or description	Level of measurement	Value labels	Location
PersonID	Participant constructed, unique identifier	n/a	n/a	col. 1
ProgSite	Name of site where participant engaged in program	categorical	. = missing; 1 = campus; 2 = Extension office; 3 = high school	col. 2
Age	Age in years at last birthday by self-report	continuous	. = missing; value = age in years	col. 3
PreItem1	Response for pre-test question 1	ordinal	1 = strongly disagree; 2 = disagree; 3 = agree; 4 = strongly agree	col. 6
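
The value labels in the example codebook can also be stored in a form that a script can apply. The sketch below is one possible approach, assuming raw data arrives as text with "." for missing values. The function name `decode` and the sample row are hypothetical.

```python
MISSING = "."

# Value labels copied from the example codebook; variables without
# value labels (PersonID, Age) are simply not listed.
VALUE_LABELS = {
    "ProgSite": {"1": "campus", "2": "Extension office", "3": "high school"},
    "PreItem1": {
        "1": "strongly disagree",
        "2": "disagree",
        "3": "agree",
        "4": "strongly agree",
    },
}


def decode(variable, raw):
    """Translate a raw code into its label.

    Returns None for a missing value, returns the raw value unchanged for
    variables with no value labels, and raises if a code is undocumented.
    """
    raw = raw.strip()
    if raw == MISSING:
        return None
    labels = VALUE_LABELS.get(variable)
    if labels is None:
        return raw
    if raw not in labels:
        raise ValueError(f"{variable}: code {raw!r} is not in the codebook")
    return labels[raw]


# Hypothetical spreadsheet row, made up for illustration.
row = {"PersonID": "AB12", "ProgSite": "2", "Age": ".", "PreItem1": "4"}
decoded = {name: decode(name, value) for name, value in row.items()}
print(decoded)
```

Raising an error on an undocumented code, rather than passing it through, surfaces data-entry mistakes before the data is aggregated.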

Qualitative data can be coded thematically by identifying themes (such as beliefs, attitudes, experiences, or actions) contained in specific text passages or segments. These themes provide a more manageable form in which qualitative data can be aggregated and compared across people, groups, or settings. Where response options are semi-structured or unstructured, it is especially important to track and document the classification scheme used to determine how different chunks of data are assigned to different thematic codes. This includes, for each code, a decision rule about what is “in” and what is “out” of that category. If chunks of narrative data are also dimensionalized (for example, by frequency or by intensity of emphasis), carefully document in a coding table the team’s criteria for what is and is not labeled “frequent” or “intense.” Where more than one coder is working with the data, coders should compare their coding decisions, then develop and apply a unified coding scheme before the data is ready for further analysis.
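
One simple way for two coders to compare their decisions is to list the segments they coded differently and compute the share on which they agree. The sketch below assumes each coder assigned exactly one thematic code per segment. The segment IDs, codes, and function name are hypothetical, and simple percent agreement is used here only as an illustration, not as a prescribed measure.

```python
def compare_coders(codes_a, codes_b):
    """Return (share of segments coded identically, segments to reconcile)."""
    if not codes_a or codes_a.keys() != codes_b.keys():
        raise ValueError("Both coders must code the same non-empty set of segments")
    disagreements = sorted(seg for seg in codes_a if codes_a[seg] != codes_b[seg])
    agreement = 1 - len(disagreements) / len(codes_a)
    return agreement, disagreements


# Hypothetical coding decisions for four text segments.
coder_a = {"seg1": "belief", "seg2": "attitude", "seg3": "experience", "seg4": "action"}
coder_b = {"seg1": "belief", "seg2": "experience", "seg3": "experience", "seg4": "action"}

agreement, to_reconcile = compare_coders(coder_a, coder_b)
print(agreement, to_reconcile)
```

The segments returned in `to_reconcile` are the ones the team would discuss when refining decision rules for the unified coding scheme.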