II.2.05: Explore and Summarize Data

Evaluation Implementation: 2.05 Explore and Summarize Data

In preparation for analysis, it is helpful to get an overview of the data at a higher level of detail than the reviewing and cleaning step offers. With quantitative data, this can be achieved by calculating descriptive statistics to assess whether the planned statistical analyses are appropriate (checking sample size, data quality, and distributions). With qualitative data, this entails taking a first pass at summarizing any patterns that are emerging through the coding exercises.

With quantitative data: Descriptive statistics present quantitative descriptions in a manageable form. A research study may include many measures, or may measure a large number of people on any one measure. Descriptive statistics help us describe large amounts of data concisely, because each statistic reduces many data points to a simpler summary. For instance, consider the Grade Point Average (GPA), a single number that describes the general performance of a student across a potentially wide range of course experiences. Every time you describe a large set of observations with a single indicator, you run the risk of distorting the original data or losing important detail. The GPA doesn’t tell you whether the student was in difficult courses or easy ones, or whether the courses were in their major field or in other disciplines. Even given these limitations, descriptive statistics provide a powerful summary that may enable comparisons across people or other units.
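The GPA point can be illustrated with a small sketch. The grade points below are invented for illustration, and the GPA is simplified to an unweighted mean (real GPAs are usually weighted by credit hours). Two hypothetical students end up with the same single-number summary even though their underlying grades look quite different.

```python
from statistics import mean

# Hypothetical grade points on a 4.0 scale; invented data, not from the source.
student_a = [3.0, 3.0, 3.0, 3.0]  # consistent performance
student_b = [4.0, 4.0, 2.0, 2.0]  # mixed performance

# Simplifying assumption: GPA as an unweighted mean of grade points.
gpa_a = mean(student_a)
gpa_b = mean(student_b)

# The single indicator is identical for both students...
print("GPA:", gpa_a, gpa_b)

# ...but the lowest and highest grades show the detail that the summary hides.
print("Student A low/high:", min(student_a), max(student_a))
print("Student B low/high:", min(student_b), max(student_b))
```

The summary is still useful for comparison, but looking at more than one statistic helps recover some of the lost detail.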

The most common type of descriptive statistics involves univariate analysis, which refers to the examination across cases of one variable at a time. There are three major characteristics of a single variable that we tend to look at:

the distribution (a summary of the frequency of individual values or ranges of values for a variable, often presented with a histogram or bar chart)
the central tendency (an estimate of the “center” of a distribution of values, using the mean, median, and mode)
the dispersion (the spread of the values around the central tendency, using the range or, when more accurate information is needed, the standard deviation)

In most situations, we would describe all three of these characteristics for each of the variables in our study. For more detailed information about these univariate analyses, see the Research Methods Knowledge Base, specifically, http://www.socialresearchmethods.net/kb/statdesc.php.
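As a minimal sketch of describing the distribution, central tendency, and dispersion of one variable, the following uses only the Python standard library. The variable name `scores` and its values are hypothetical, and the sample standard deviation is one reasonable choice of dispersion measure among several.

```python
from collections import Counter
from statistics import mean, median, multimode, stdev

# Hypothetical responses on a 1-5 scale for a single variable; invented data.
scores = [2, 3, 3, 4, 4, 4, 5, 5]

# Distribution: how often each value occurs, shown as a rough text histogram.
distribution = Counter(scores)
for value in sorted(distribution):
    print(value, "#" * distribution[value])

# Central tendency: mean, median, and mode(s).
center = {
    "mean": mean(scores),
    "median": median(scores),
    "mode": multimode(scores),  # returns a list, since ties are possible
}

# Dispersion: the range, and the sample standard deviation.
spread = {
    "range": max(scores) - min(scores),
    "stdev": stdev(scores),
}

print(center)
print(spread)
```

Repeating this for each variable in a study gives a quick check on sample size, unusual values, and the shape of each distribution before any planned analysis is run.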

With qualitative data: Revisit the raw data, reading through any open-ended question responses or interview texts. Where the analysis calls for thematic coding, check how well the identified themes represent the important aspects of the data (relative to your evaluation questions and evaluation purpose). Look over the categories and subcategories of themes (if applicable), and begin to note any patterns that emerge from the relationships between categories and themes. Where the analysis calls for assigning dimensions in order to summarize or compare processes, as in a case study or multiple case study, check how well each set of data corresponds to the identified dimensions. There are various approaches to qualitative data analysis, but this iterative checking and summarizing during the pre-analytic steps will prove helpful whichever approach is used going forward, even where themes are mostly selected before coding begins.
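Where responses have already been coded, a simple tally can support this first pass. The sketch below is only one possible aid and not a substitute for rereading the data: the response IDs and theme labels are hypothetical, and it assumes each response has been assigned a set of theme codes. It counts how often each theme appears, which themes appear together, and which responses received no code at all.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded responses: response ID -> set of theme codes assigned.
coded_responses = {
    "r1": {"access", "cost"},
    "r2": {"cost"},
    "r3": {"access", "staff"},
    "r4": {"access", "cost"},
    "r5": set(),  # no theme fit this response
}

# How many responses mention each theme.
theme_counts = Counter(
    theme for themes in coded_responses.values() for theme in themes
)

# How often two themes are assigned to the same response.
pair_counts = Counter(
    pair
    for themes in coded_responses.values()
    for pair in combinations(sorted(themes), 2)
)

# Responses with no code may signal that the themes miss something important.
uncoded = [rid for rid, themes in coded_responses.items() if not themes]

print(theme_counts.most_common())
print(pair_counts.most_common())
print("Uncoded responses:", uncoded)
```

Themes that rarely appear, pairs that frequently co-occur, and responses left uncoded are all prompts to return to the raw text and reconsider how well the themes fit.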

The process of summarizing the data can provide the foundation for a great deal of insight in its own right. It is also a prerequisite for any further analysis you may wish to do.

Guiding Documents