I.3.06 Evaluation Design

An evaluation design shows how the evaluation is structured with respect to measurement, administration of the program, sampling, and any comparison groups that are included. It provides a schematic that can be used to guide the choice of data analysis. Simplified general research designs are described below, but the design you select will depend on the preferences of the Evaluation Champion and the working group. Once again, we refer you to the literature for more in-depth information on design, including http://www.socialresearchmethods.net/kb/design.php.

Relationship between Designs and Claims

The kinds of claims that you can make based on the results of the evaluation vary depending upon the kind of design you choose to use. For example, if you want to be able to state that participation in the program is related to a change in some outcome, you need to use a design that assesses change. Not all designs are created equal. Some designs are better than others at addressing the kind of claim we want to make. When considering which kind of design to use, it is important to think about what kind of claim you want to make and select a design that can provide evidence for that claim. It is also important to consider the feasibility of the design as well as whether or not it is appropriate given the lifecycle phase of the program. It is possible that after reviewing different design options, the working group may decide to revise the evaluation questions.

In addition to considering the kinds of claims you want to make, it is also important to take note of the language used in the evaluation question. For example, if the evaluation question asks whether participation in the program causes outcome X, this implies that a design capable of assessing causality is needed. The strongest design for assessing a cause/effect relationship is a Randomized Controlled Trial (RCT; a pre-post-test with random assignment to groups). This type of design is considered a Phase 3 (Comparison and Control) Evaluation Lifecycle design and is most appropriate for a Phase 3 (Stability) Program Lifecycle program. On the other hand, when you are implementing a new program for the first time, an RCT would not be appropriate, and you might be advised to choose something like a post-only case study design. The evaluation questions may need to be revised to correspond with the program’s lifecycle phase.

Criteria to Consider when Selecting a Design

There are several criteria that should be considered when selecting a design: (1) Time order, (2) Covariation, (3) Rules out other possible causes, and (4) Shows change.

In order to demonstrate time order, we need to use a design that clearly demonstrates that the “cause” or the program happened before the “effect” or the outcome that we are interested in assessing.

Covariation means that changes in the “cause” or the program are related to changes in the “effect” or the outcome of interest. In order to demonstrate covariation, we need a design that shows that when the program occurs the outcome of interest occurs and that when the program does not occur the outcome of interest does not occur. Typically, this is demonstrated by using a design that includes at least two groups. One group receives the program (and hopefully exhibits the outcome of interest) and one group does not receive the program (and hopefully does not exhibit the outcome of interest).

In order to rule out other possible causes, we need a design that demonstrates that the program (the presumed “cause”) is the only reasonable explanation for the “effect” or outcome of interest. This is typically an extremely difficult criterion to meet. Any number of factors other than the program could “cause” the outcome of interest.

In order to demonstrate that change occurred, a design that includes a “before and after” or pre- and post-test is needed.

The strength of the claims we can make depends on how well the design addresses these criteria. The most important thing to consider is alignment. In other words, does the design we select allow us to make the desired claims? The chart below provides examples of some of the more commonly used designs and the associated claims that can typically be made.

Aligning Claims with Designs
Associated Claim	Design (X = program, O = observation, R = random assignment; " / " separates groups)	Time Order?	Covariation?	Rules out other possible causes?	Shows change?	Program lifecycle phase it may be appropriate for
After program, these participants show desired levels of outcome Z in this setting and context.	X O (post-only)	Yes	No	No	No	IB
According to these participants, in this setting and context, the program is associated with a change on outcome Z.	X O_post/O_pre (retrospective “post- then pre-”)	No	No	No	Yes	IIA
Participation in the program is associated with a change in outcome Z in this setting and context, with these participants.	O X O (simple pre-post)	Yes	No	No	Yes	IIB
The program is effective in this setting and context, with these participants.	O X O / O O (pre-post with comparison group)	Yes	Yes	Somewhat	Yes	IIIA
The program is effective in this setting and context, with these participants. It may also be effective in other settings and contexts, with other participants.	R O X O / R O O (pre-/post- with random assignment)	Yes	Yes	Mostly	Yes	IIIB

For more information on the criteria described above and designs see:

http://www.socialresearchmethods.net/kb/desdes.php
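
For working groups that keep planning notes in code, the chart can also be held as a small lookup table. The Python sketch below is only an illustration: the names `Design`, `DESIGNS`, and `designs_meeting` are hypothetical, the rows are transcribed from the chart, and treating "Somewhat" and "Mostly" as meeting a criterion is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    name: str
    notation: tuple  # one string per group (one line per group in design notation)
    time_order: str
    covariation: str
    rules_out_other_causes: str
    shows_change: str
    lifecycle_phase: str


# Rows transcribed from the "Aligning Claims with Designs" chart.
DESIGNS = (
    Design("post-only", ("X O",), "Yes", "No", "No", "No", "IB"),
    Design("retrospective post-then-pre", ("X O_post/O_pre",),
           "No", "No", "No", "Yes", "IIA"),
    Design("simple pre-post", ("O X O",), "Yes", "No", "No", "Yes", "IIB"),
    Design("pre-post with comparison group", ("O X O", "O   O"),
           "Yes", "Yes", "Somewhat", "Yes", "IIIA"),
    Design("pre-post with random assignment", ("R O X O", "R O   O"),
           "Yes", "Yes", "Mostly", "Yes", "IIIB"),
)


def designs_meeting(*criteria):
    """Return names of designs whose chart entry is not "No" for every named criterion.

    Assumption for this sketch: "Somewhat" and "Mostly" count as meeting a criterion.
    """
    return [d.name for d in DESIGNS
            if all(getattr(d, c) != "No" for c in criteria)]


if __name__ == "__main__":
    # Which designs show both covariation and change?
    print(designs_meeting("covariation", "shows_change"))
```

A lookup like this only restates the chart. Feasibility and lifecycle fit still need to be judged by the working group.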

Design Notation

We often describe a design using a concise notation that enables us to summarize a complex design structure efficiently. If two or more of the same kind of elements function the same way in a design (e.g., all measures are given to all participants at the same time) then a single symbol may be used to represent the entire set; if they function differently (e.g., some measures are pre-post and some are post-only) then you can use subscripts to differentiate them.

Observations or Measures are symbolized by an ‘O’. Distinguish among specific measures with subscripts, as in O1, O2, and so on.
The Activity or Program is symbolized with an ‘X’. As with observations, use subscripts to distinguish different activities or program variations.
Groups are given their own line in the design structure. Samples are divided into groups that do or do not participate in the activity. If the design notation has three lines, there are three functionally distinct groups in the design. Group type – such as “random” (R), or “non-equivalent” (N) – is designated by a letter at the beginning of each line (i.e., group).
Time moves from left to right.

For example:

O X O
Represents a pre-test before and a post-test after the activity.

and

N O X O
N O   O
Represents a pre-post group with a non-equivalent comparison group that didn’t participate in the activity.
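
Because the notation is so regular, it can be read mechanically. The Python sketch below is a minimal illustration, assuming one group per line, whitespace-separated symbols, and an optional leading R or N for group type. The function names `parse_group` and `summarize_design` are hypothetical and not part of any evaluation toolkit.

```python
GROUP_TYPES = {"R": "random", "N": "non-equivalent"}


def parse_group(line):
    """Parse one line of design notation (one group) into a small dict."""
    symbols = line.split()
    group_type = None
    # A leading R or N designates the group type; it is not a time point.
    if symbols and symbols[0] in GROUP_TYPES:
        group_type = GROUP_TYPES[symbols[0]]
        symbols = symbols[1:]
    # Time moves left to right, so observations before the first X are pre-tests.
    x_positions = [i for i, s in enumerate(symbols) if s.startswith("X")]
    pre = post = 0
    if x_positions:
        first_x = x_positions[0]
        pre = sum(s.startswith("O") for s in symbols[:first_x])
        post = sum(s.startswith("O") for s in symbols[first_x + 1:])
    return {
        "type": group_type,
        "symbols": symbols,
        "receives_program": bool(x_positions),
        "pre": pre,    # only meaningful for groups that receive the program
        "post": post,  # only meaningful for groups that receive the program
    }


def summarize_design(notation):
    """Summarize a multi-line design: groups, comparison group, pre/post tests."""
    groups = [parse_group(l) for l in notation.splitlines() if l.strip()]
    treated = [g for g in groups if g["receives_program"]]
    return {
        "groups": len(groups),
        "comparison_group": any(not g["receives_program"] for g in groups),
        "random_assignment": bool(groups) and all(g["type"] == "random" for g in groups),
        "pre_test": any(g["pre"] > 0 for g in treated),
        "post_test": any(g["post"] > 0 for g in treated),
    }


if __name__ == "__main__":
    print(summarize_design("O X O"))
    print(summarize_design("N O X O\nN O   O"))
```

Subscripted symbols such as O1 or X2 are matched by their first letter, so they are counted like plain O and X.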

Notice that the design notation tells something about how the participants are organized or grouped in an evaluation (this relates to sampling) and it shows how measures are sequenced or organized (this relates to measurement). The structure of a design will also usually circumscribe what can be done in analyzing the data collected. Design is therefore a central topic in evaluation planning.

As always, it is important to keep the evaluation questions in mind when thinking through the various aspects of evaluation planning. If this is not done, there is the danger of developing a nice evaluation design that doesn’t actually help to answer the focal questions.

Much like the measures section, there are a few key questions to consider once the design has been outlined:

Is there a clear connection between the evaluation questions, chosen measures and the resulting design?
Is the design appropriate given the claims that you would like to be able to make?
Is this design appropriate for this program’s lifecycle?
Is this design feasible given the program resources and organizational capacity?
Is this design feasible given the duration and setting of the program? For example, a short 30-minute activity does not lend itself to an elaborate pre-post measure.

It’s important to link design issues to the lifecycle of the program. As you learned in the Lifecycle Analysis step, we believe that the ultimate goal is for the evaluation lifecycle to be aligned with the program lifecycle. Different evaluation designs are more or less appropriate depending on the program lifecycle phase.

When writing the design plan for the evaluation, consider each evaluation question and describe in detail the design type(s) (e.g., post-only, pre-post, pre-post with comparison group, etc.). Make sure the design(s) address each of the evaluation questions, are appropriate given the lifecycle stage of the program, and are appropriate for generating evidence for the desired claims.

Q&A

Q: Where can I learn more about possible design strategies?

To learn more, go to the Introduction to Design section of the Research Methods Knowledge Base:
http://www.socialresearchmethods.net/kb/desintro.htm

Elsewhere on the same website there is much more in-depth information about experimental and quasi-experimental designs.

Q: What determines which evaluation design is the most appropriate for my evaluation? What role do lifecycles play?

Evaluation design is driven by the evaluation question. Evaluation questions, in turn, are strongly influenced by program lifecycle. The wording of the evaluation questions should provide clear guidance on what kind of design will be needed. In particular, pay close attention to what kind of relationships are referenced in the evaluation question. For example, if the evaluation question refers to the extent to which program participation is associated with some kind of change in participants (in their knowledge, awareness, behavior, etc.) then you would need some way to contrast the before and after condition. This could be achieved with a retrospective pre-post, or an actual pre-post design. However, if the evaluation question refers to the extent to which the program has caused a change in participants then a stronger design is needed, in order to control for the impact of non-program factors.

In general, earlier stage programs might choose designs focused on implementation processes or post-only designs, while later stage programs might choose designs involving pre and post measurement with a comparison group. Also, it’s important to consider which option is most likely to give you credible, accurate, and useful data (and, eventually, findings) in a way that is feasible, keeps stakeholders in mind, builds on prior evaluations, and prepares you for subsequent evaluations.
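
As a rough illustration of how question wording points toward a design, the Python sketch below checks an evaluation question for the relationship words discussed above. It is a deliberately crude heuristic with a hypothetical function name (`suggest_design`). The keyword list and the post-only default are assumptions for this example, and the result is a starting point for discussion, not a substitute for the working group's judgment.

```python
def suggest_design(question):
    """Suggest a starting design from the relationship named in the question.

    Assumed keywords: "cause" signals a causal claim; "associated" or
    "change" signals a claim about change. Simple substring matching is
    used, so words like "because" would also match "cause".
    """
    q = question.lower()
    if "cause" in q:
        # Causal wording needs a design that controls for non-program factors.
        return "pre-post with comparison group, ideally with random assignment"
    if "associated" in q or "change" in q:
        # Change wording needs a way to contrast before and after.
        return "retrospective pre-post or simple pre-post"
    # Assumed default for questions that name neither relationship.
    return "post-only"


if __name__ == "__main__":
    print(suggest_design("Is participation associated with a change in knowledge?"))
    print(suggest_design("Has the program caused a change in participants?"))
```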

Q: What is the role of outputs in an evaluation?

Outputs play an important role in evaluation. Because outputs are tangible artifacts of activities, connections between activities and outputs can be fertile ground for evaluation questions related to program implementation. For example, in the case of the model airplane program, one might ask “Did participation in the airplane model building workshop lead to the production of finished model airplanes by our participants?” This is essentially a question about the connection between an activity and an output. Answering this question would provide foundational evidence, typically for an early lifecycle program, about whether a program activity is working the way it is intended to.
