II.2.04 Create Codebook and Code or Categorize the Data

Coding converts information-rich raw data into a shorter form that can be readily aggregated for analysis. It is used when an evaluation involves large quantities of numerical data in a spreadsheet, and it may also be used to sort chunks of narrative data into categories. Coding is useful for aggregating responses across units of analysis to compute an average effect of the program on participants. (Where qualitative data is instead gathered to understand a complete program situation or an individual change process, the analysis may call for non-coding approaches.)

Assuming your evaluation question calls for information about an average effect, an important step is to create a codebook or classification scheme. For data with pre-set response options, the codebook describes the data, indicates where and how it can be accessed, and shows how the original raw data corresponds to the information entered in, for example, a spreadsheet. At a minimum, the codebook should include the following items for each variable:

  • variable name
  • variable description
  • variable format (number, date, text)
  • instrument/method of collection
  • date collected
  • respondent or group
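One way to keep these minimum items consistent across variables is to record each codebook entry in a small structure and check it for gaps. The following is only a sketch under stated assumptions: the class name `CodebookEntry`, the helper `missing_items`, and the example values are hypothetical and were invented for illustration.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class CodebookEntry:
    """One codebook record; the fields mirror the minimum items listed above."""
    variable_name: Optional[str] = None
    description: Optional[str] = None
    variable_format: Optional[str] = None  # e.g. "number", "date", "text"
    instrument: Optional[str] = None       # instrument/method of collection
    date_collected: Optional[str] = None
    respondent: Optional[str] = None       # respondent or group


def missing_items(entry: CodebookEntry) -> List[str]:
    """Return the names of the minimum items that are still empty."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


# Hypothetical example values, for illustration only.
age_entry = CodebookEntry(
    variable_name="Age",
    description="Age in years at last birthday by self-report",
    variable_format="number",
    instrument="pre-test survey",
    respondent="program participant",
)

# The collection date was left out, so it is reported as missing.
print(missing_items(age_entry))
```

A check like this could be run over every entry before the codebook is shared with the analysis team.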

The codebook is an indispensable tool for the analysis team. It should provide documentation comprehensive enough that others who later want to analyze the data can do so without any additional information. Coding quantitized data involves assigning an established number to each particular response. For example, if you ask respondents for their gender, you might code the responses like this: “. = missing, 1 = male, 2 = female.” An example of a codebook is provided below:

Variable name | Variable label or description | Level of measurement | Value labels | Location
PersonID | Participant constructed, unique identifier | n/a | n/a | col. 1
ProgSite | Name of site where participant engaged in program | categorical | . = missing; 1 = campus; 2 = Extension office; 3 = high school | col. 2
Age | Age in years at last birthday by self-report | continuous | . = missing; value = age in years | col. 3
PreItem1 | Response for pre-test question 1 | ordinal | 1 = strongly disagree; 2 = disagree; 3 = agree; 4 = strongly agree | col. 6
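The value labels in the example codebook can be applied to raw entries programmatically. The sketch below assumes the raw values arrive as strings with "." marking a missing response, as in the example. The names `VALUE_LABELS`, `decode`, and `mean_age`, and the sample ages, are hypothetical and not part of the original guidance.

```python
from typing import Dict, List, Optional

# Value labels copied from the example codebook.
VALUE_LABELS: Dict[str, Dict[str, str]] = {
    "ProgSite": {"1": "campus", "2": "Extension office", "3": "high school"},
    "PreItem1": {
        "1": "strongly disagree",
        "2": "disagree",
        "3": "agree",
        "4": "strongly agree",
    },
}
MISSING = "."  # the codebook's marker for a missing response


def decode(variable: str, raw: str) -> Optional[str]:
    """Translate a raw code into its label; missing values become None."""
    raw = raw.strip()
    if raw == MISSING:
        return None
    labels = VALUE_LABELS[variable]
    if raw not in labels:
        raise ValueError(f"{raw!r} is not a documented code for {variable}")
    return labels[raw]


def mean_age(raw_ages: List[str]) -> Optional[float]:
    """Average the Age column, skipping entries coded as missing."""
    ages = [int(a) for a in raw_ages if a.strip() != MISSING]
    return sum(ages) / len(ages) if ages else None


print(decode("ProgSite", "2"))            # Extension office
print(decode("PreItem1", "."))            # None
print(mean_age(["34", ".", "41", "27"]))  # 34.0 (made-up sample ages)
```

Raising an error on undocumented codes is a design choice in this sketch: it surfaces data-entry problems early instead of silently aggregating them.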

Qualitative data can be coded thematically by identifying the themes (such as beliefs, attitudes, experiences, or actions) contained in specific text passages or segments. These themes provide a more manageable form in which qualitative data can be aggregated and compared across people, groups, or settings. Where response options are semi-structured or unstructured, it is especially important to track and document the classification scheme that determines how different chunks of data are assigned to different thematic codes. For each code, this includes a decision rule about what is “in” and what is “out” of that category. If chunks of narrative data are also dimensionalized (for example, by frequency or by intensity of emphasis), carefully document in a coding table the team’s criteria for what is and is not labeled “frequent” or “intense.” Where more than one coder is working with the data, the coders should compare their coding decisions, then develop and apply a unified coding scheme before the data is ready for further analysis.
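When two coders have each assigned thematic codes to the same text segments, a simple first step in comparing their decisions is to list the segments where they differ. The sketch below is only an illustration: the function `compare_coders`, the segment identifiers, and the theme codes are hypothetical, and simple percent agreement is used here as one possible summary, not as a method prescribed by this guidance.

```python
from typing import Dict, List, Tuple


def compare_coders(
    coder_a: Dict[str, str], coder_b: Dict[str, str]
) -> Tuple[float, List[str]]:
    """Return the share of matching codes and the segments that differ.

    Only segments coded by both coders are compared.
    """
    shared = sorted(set(coder_a) & set(coder_b))
    if not shared:
        raise ValueError("the coders have no segments in common")
    disagreements = [seg for seg in shared if coder_a[seg] != coder_b[seg]]
    agreement = 1 - len(disagreements) / len(shared)
    return agreement, disagreements


# Hypothetical coding decisions, invented for illustration.
coder_a = {"seg1": "belief", "seg2": "attitude", "seg3": "action", "seg4": "experience"}
coder_b = {"seg1": "belief", "seg2": "attitude", "seg3": "experience", "seg4": "experience"}

agreement, disagreements = compare_coders(coder_a, coder_b)
print(agreement)      # 0.75
print(disagreements)  # ['seg3']
```

The segments flagged here are the ones the coders would discuss when refining the decision rules for each code.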
