I.3.05 Sampling Plan


After the evaluation questions have been identified, the working group needs to describe the source of the evaluation data. Sampling is the process of selecting units (e.g., a subset of people, things, documents, events, organizations, or groups) from a population (the entire set of people, things, events, documents, organizations, or groups) of interest so that, by studying the sample, we may fairly generalize our results back to the population from which the units were chosen. The Evaluation Champion should be familiar with at least the general ideas behind sampling, including external validity issues and the distinction between nonprobability and probability sampling. There are many resources available on sampling; these topics are covered only briefly here, so this discussion should be supplemented with external resources such as http://www.socialresearchmethods.net/kb/sampling.php. Sampling will also be affected by measurement and evaluation design, so once again these steps should occur as a parallel, dynamic, and interactive process rather than in sequential fashion.

Key Concepts in Sampling

Unit of Analysis: In evaluation, sometimes we focus on individuals and sometimes we focus on groups. The level on which the evaluation is focused (e.g., individuals, families, classrooms, schools, etc.) is called the unit of analysis and will depend on the focus of the evaluation question. It’s essential that the selection of the unit of analysis is done consciously because the unit of analysis selected for data collection must be the same as what we use to draw conclusions. For example, imagine that an evaluator collects data from individual adolescents on the amount of time they spend engaged in the after-school program and on their risk-taking behaviors. The evaluator analyzes the relationship between after-school program involvement and risk-taking to see if adolescents who are more engaged in the after-school program have lower levels of risk-taking. The data describes the individuals, the conclusions drawn are about the individuals, and the individuals are the unit of analysis.

In some evaluations, groups are the unit of analysis but data are collected from individuals. In other words, the unit of analysis may not be the same as the unit of observation (the cases about which measures are actually obtained in a sample). For example, imagine that an evaluator hypothesizes that classes that use the Jolly Phonics reading program will have higher reading proficiency scores than classes that do not use the program. Reading proficiency is measured by giving each of the individual students a test. However, the individual student test scores are averaged together to create a classroom average score for each classroom. Then, classrooms that use Jolly Phonics are compared to those that do not use Jolly Phonics. It is the differences in the classroom average scores that are used to explain variation in reading proficiency between classes. In this example, the unit of observation and the unit of analysis are not the same. The unit of observation was the individual students and the unit of analysis was the classrooms. In the previous example (after-school program participation and risk-taking) the unit of observation and the unit of analysis were the same (individuals).
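The aggregation step in the Jolly Phonics example can be sketched in a few lines of Python. The classroom names, student scores, and program assignments below are invented purely for illustration:

```python
# Hypothetical illustration: individual students are the unit of
# observation; classrooms are the unit of analysis. All data invented.
scores = {
    "class_a": [78, 85, 92, 70],  # classroom using Jolly Phonics
    "class_b": [65, 72, 68, 75],  # comparison classroom
}
uses_program = {"class_a": True, "class_b": False}

# Aggregate student scores (unit of observation) into one average
# per classroom (unit of analysis).
class_means = {c: sum(s) / len(s) for c, s in scores.items()}

for classroom, mean in class_means.items():
    label = "Jolly Phonics" if uses_program[classroom] else "comparison"
    print(f"{classroom} ({label}): mean reading score = {mean:.2f}")
```

The conclusions drawn from `class_means` describe classrooms, not students, which is why the unit of analysis must be chosen consciously before analysis.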

Generalizability: You will want to begin by identifying who or what you want to be able to say something about. Imagine that we are interested in evaluating “no tolerance” drug policies in high schools in the United States. Ultimately, we want to be able to say something about all high schools in the United States so our population of interest is all US high schools. Next, we need to identify our sampling frame. This is a list of all elements in the population. In our example, the sampling frame would be a list of all high schools in the United States. The sample (the subset of the population) is drawn from the sampling frame. Evaluators generalize from samples to populations if the sample is representative of the population. Depending on the sampling technique used, we can be more or less confident in the representativeness of the sample (see sampling strategies section below). In some circumstances it may be feasible to avoid the issue of generalizability by conducting a census (studying the entire population of interest). For example, if you are only interested in generalizing the results of your evaluation to the actual participants in your program, you could conduct a census by including all program participants in your sample.

Sampling Strategies: Remember that our primary objective when selecting a sample is to make it representative of the population to which we are interested in generalizing our results. If we want to be able to say something about all participants in the program, but we cannot actually study all program participants, we want to make sure that the sample is representative of all program participants. There are two major sampling strategies: probability and nonprobability sampling. In general, probability sampling allows us to be most confident that our sample is representative of the population. Probability methods rely on random selection, so the probability of being selected for the sample is known. Nonprobability methods do not rely on random selection, and the probability of being selected for the sample is unknown. A few common probability and nonprobability sampling approaches are reviewed here, but the reader is encouraged to explore outside sources for additional information.

Probability Sampling Strategies: Simple Random Sampling is a technique that gives every element in the sampling frame the same probability of being selected for the sample. For example, to draw a simple random sample from the population of program participants, an evaluator might assign every participant a number and select some subset of participants using a random number generator (e.g., a random number function in Excel). Many populations are made up of clusters within hierarchies. For example, the individuals who make up the population of 3rd graders are clustered within schools. Cluster Random Sampling makes use of these clusters to aid in sampling: first the evaluator randomly selects clusters, and then, from within the selected clusters, randomly selects the sample. Note that, to be truly representative of the population, cluster random sampling requires that selection be random at each stage. A nonprobability approach to cluster sampling can be used (see the section on Hierarchies below), but the results are not as broadly generalizable.
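Both techniques can be sketched with the Python standard library's `random` module. The frame of 100 numbered participants, the ten schools, and the sample sizes below are all hypothetical:

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 100 program participants, numbered 1-100.
frame = list(range(1, 101))

# Simple random sampling: every element in the frame has the same
# probability of being selected.
simple_sample = random.sample(frame, k=10)

# Cluster random sampling: ten hypothetical schools of 30 students each.
clusters = {f"school_{i}": [f"s{i}_{j}" for j in range(1, 31)]
            for i in range(1, 11)}

# Stage 1: randomly select clusters (schools).
chosen_schools = random.sample(sorted(clusters), k=3)

# Stage 2: randomly select students within each chosen cluster.
cluster_sample = [
    student
    for school in chosen_schools
    for student in random.sample(clusters[school], k=5)
]
```

Because selection is random at both stages, the cluster sample retains a known probability of selection for every student, which is what distinguishes it from the nonprobability hierarchies discussed below.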

Nonprobability Sampling Strategies: Convenience Sampling is a technique whereby the sample is selected based on convenience and ease of access rather than on representativeness. Convenience sampling is appropriate for early-lifecycle evaluations where the goal is not to achieve generalizability beyond the participants included in the evaluation. Purposive Sampling is a technique in which the sample is selected deliberately (though not randomly) because the participants have some very specific characteristic of interest. This approach makes the most sense when the evaluator has a great deal of knowledge about the population of interest. Purposive sampling does not produce a sample that represents some larger population, but it can be exactly what is needed for earlier-lifecycle evaluations where the interest is less in generalizability than in getting initial evidence about how the program performs with a specific group.

Determining the Sample

The working group should consider: “Who will participate in the evaluation?” Guide the working group to focus exclusively on who or what will answer the evaluation question(s) and can be measured. Do not fall into the trap of broadly describing the population served by the program. Focus specifically on the population and sample that are relevant for the evaluation question(s). For instance, imagine that there is a program for mothers of premature babies. The evaluation question is “Does the program improve the height and weight of the babies?” The primary measure for this evaluation question is the height and weight of the babies at the end of the program. The sample description should therefore focus on the babies, since they (not the mothers) are the focus of this evaluation question.

The program description section should include a rough estimate of the number of participants expected in the coming year. The sample section should describe whether some or all of the participants will be included in the evaluation (e.g., the percentage of participants who will be “sampled”). This will allow readers to determine to what degree the results are generalizable to those who were involved in the program. For instance, if the program expects to have 1,000 participants, yet staff plan to sample only 20 of them (2%), they might have a difficult time generalizing the results to all participants.

As with other aspects of the evaluation plan, sampling changes over the life course of a program. Programs in the Initiation lifecycle phase will probably select their sample based on availability and convenience in order to generate rapid feedback. More mature programs that are trying to make stronger assertions based on their evaluation will have to address external validity and generalizability more formally (and may therefore need to use a probability sampling technique).

Hierarchies: One issue to keep in mind when sampling is that there may be hierarchies, or multiple levels at which different types of sampling take place, even within a single evaluation. For instance, if you are conducting an educational program, you might sample school districts, schools within districts, grades within schools, classrooms within grades, and students within classrooms. At each level you might use a different approach. For instance, your choice of school districts may be predetermined and entirely opportunistic: you will work with whichever district is close and willing to participate. However, you might be able to select schools or classes within those districts in a systematic way. In this example, you could only generalize the results of the evaluation to the selected school district. As programs mature, sampling plans also tend to become more structured and complex.
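A mixed, multi-level plan of this kind can be sketched as follows, assuming a hypothetical hierarchy of districts, schools, and classrooms, with convenience selection at the top level and simple random selection below it:

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical hierarchy: districts -> schools -> classrooms.
districts = {
    "nearby_district": ["school_a", "school_b", "school_c", "school_d"],
    "far_district": ["school_e", "school_f"],
}
classrooms = {s: [f"{s}_room_{i}" for i in range(1, 5)]
              for schools in districts.values() for s in schools}

# Level 1 (convenience): the one district that is close and willing.
district = "nearby_district"

# Level 2 (simple random): randomly select schools within that district.
chosen_schools = random.sample(districts[district], k=2)

# Level 3 (simple random): randomly select classrooms within each school.
sampled = {s: random.sample(classrooms[s], k=2) for s in chosen_schools}
```

Because the top level is a convenience choice, results generalize only to the selected district, no matter how rigorous the random selection is at the lower levels.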

When writing the sampling plan for the evaluation, consider each evaluation question and describe in detail the population of interest, who will participate, approximately how big the sample will be, how the sample will be recruited, whether there are multiple levels and/or types of sampling strategies employed, and how participants are selected at each level (i.e., sampling strategy). And, keep in mind that decisions made here affect and are affected by decisions made in other parts of the evaluation plan.
