Data processing

This is a critical stage in the evaluation, in which the data is put into a format that will ease its analysis. Processing of data is inextricably bound up with its collection: the collection methods often start the processing and frequently reduce some of the detail in the data, for instance through the categories chosen for a closed question on a form, or the way notes of an interview are written up. The mechanics of processing data vary significantly between quantitative and qualitative methods.

Processing quantitative data

Processes for handling quantitative data are designed to make it easier to count. This can be undertaken manually, but that is time consuming and prone to error, so unless the analysis required is very simple it is advisable to use a computer. Most probation service staff now have easy access to a machine, and most machines have the facility to set up a simple spreadsheet or database to process the data. Some services have the facility to scan forms into a database, which can reduce data entry time, effort and error considerably; however, it does require that the form has been designed for the system from the outset and set up so that it will be recognised on entry. If such a facility is available, it is worthwhile spending extra time at this preparation stage to save time later.
If using a computer program for the analysis of data, it is important to be aware of the possibilities and limitations of the program before starting to collect data, so that they can be allowed for in data collection. For instance, some programs can use dates in calculations and could use date of birth and date of commencement to calculate age at commencement. Other programs cannot do this and will require the age to be entered directly. Some programs will be able to create age groups from the actual age of a person, whereas others will not. If this is the case, a decision must be made about which is more important, or whether to input both the actual age and the age group. Wherever possible, collect and process data in full detail: this gives greater flexibility for the analysis and greater reliability in calculations.
As computer programs become more sophisticated, the traditional distinctions between spreadsheets and databases are breaking down, but do check the capabilities of the precise program and version that you will be using for analysis before starting data collection. Also check the ability of different programs to accept and generate data in formats that other programs can read, as this can enable the user to do different things with different programs should this be necessary. There are also programs written specifically for the analysis of data collected for social surveys. The most frequently used is SPSS (Statistical Package for the Social Sciences), which gives great flexibility in working with data after entry, as well as data analysis and presentation facilities.
Whatever program is used it will be necessary to set up a data file which will receive the data from the evaluation, and the data will be stored in the same basic matrix format, as illustrated in the Table below.

- Each row represents one case in the study, such as a person or a report, and is referred to as a record.
- Each column represents an item of data collected about that case, such as name or age or date of first attendance, and is referred to as a variable.
- The complete matrix is known as a data set.
This data set demonstrates a range of ways in which data can be entered.

Text: the first column gives the actual name of the person as text, and is usually included for identification rather than analysis.
Date: the second column has date of birth entered in figures to enable subsequent calculation of age.
Alphabetic code: sex is entered as an alphabetic code where M = male and F = female. Check that the program you are using can perform the required calculations with alpha codes. SPSS, for example, will produce frequency distributions of data with alpha codes, but does not allow many other statistical procedures. In these circumstances it is better to use a numeric code, such as 1 = male and 2 = female.
Calculated field: age has not been entered, but has been set up as a formula that calculates age once the date of birth has been entered.
Numeric code: offence type is denoted by a numeric code, for instance 1 = violence, 13 = theft and 8 = fraud. –1 is the code for not known, and demonstrates the importance of having a code to cover all circumstances. The coding frame for such items should be determined before data entry begins, and is frequently incorporated into the data collection instrument to ease the cross-checking process.
Number: The number of previous convictions is entered as the actual number. It is important to decide in advance whether to allow any number to be entered, or whether to set a maximum. For instance, it is common to set a maximum of 99 for previous convictions as more than this is relatively rare.
The way in which data is entered into the data set imposes limits on the way in which it can be analysed, as will be described in more detail in the next section. It is important to be aware of these constraints when deciding how data will be input. For instance, calculations can be performed on data entered as numbers, so in the example above it is possible to calculate the average (mean) age of the individuals in the group because the data is real numbers. In the data set above the mean age is 24.75 years. If age had been collected in groups, such as under 17, 17–20, 21–25, 26 and over, it would not be possible to calculate the mean age.
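To make the matrix format concrete, the sketch below sets up a small data set in Python using pandas (a tool this handbook does not otherwise assume; any spreadsheet or database would serve equally well). The names and dates of birth are invented, chosen only so that the calculated ages come out at 20, 17, 25 and 37 as in the example above, and the offence codes follow the illustrative coding frame described earlier, so treat all the specific values as hypothetical.

```python
import pandas as pd

# Each row is a record (one case); each column is a variable.
data = pd.DataFrame({
    "name":     ["Smith", "Jones", "Brown", "Green"],                         # text, for identification only
    "dob":      ["1975-03-01", "1978-04-10", "1970-02-10", "1958-05-20"],     # date, entered in figures
    "sex":      ["M", "M", "F", "M"],                                          # alphabetic code: M = male, F = female
    "offence":  [13, 1, 8, -1],                                                # numeric code: 1 = violence, 13 = theft, 8 = fraud, -1 = not known
    "previous": [2, 0, 5, 12],                                                 # number: actual count of previous convictions
})

# Calculated field: age at an (illustrative) date of commencement, derived from
# date of birth. The division by 365 is an approximation, good enough here.
commencement = pd.Timestamp("1995-06-01")
data["dob"] = pd.to_datetime(data["dob"])
data["age"] = (commencement - data["dob"]).dt.days // 365

print(data)
print("Mean age:", data["age"].mean())   # real numbers, so the mean (24.75) can be calculated
```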

Data checking

Simple mistakes and slips of the fingers can easily occur when inputting data, so it is important to identify and rectify as many of these as possible before analysis begins. One simple mechanism is to use the facilities of many programs to check whether data is outside an acceptable range on entry. For instance, the ‘sex’ field can be set to default to upper case if lower case letters are entered, and can also be set to alert the inputter if a letter ‘N’ is entered by mistake. More sophisticated programs can be set up to perform cross-checks on data items, either as data is being entered or in a special run of the program after entry. These routines can check for obvious errors, such as age at commencement of a probation order being 15. The inputter would then need to check the original document to find out whether the age or the type of order had been entered wrongly.
Visual checks of summaries of data are also important to identify apparently anomalous cases. Most programs have a ‘find’ facility that can be used to locate the specific record causing the problem, for reference back to the original paperwork and rectification. Where data is clearly incorrect but the real data cannot be obtained that variable should be coded as ‘not known’ for that case.
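The sketch below illustrates the kind of range checks and cross-checks just described, continuing the illustrative Python/pandas data set above. The field names, the invented errors and the acceptable ranges (such as a minimum age of 16 at commencement of a probation order) are assumptions for the purpose of the example, not rules taken from this handbook.

```python
import pandas as pd

# Illustrative records, including two deliberate errors to be caught.
data = pd.DataFrame({
    "case_id":   [1, 2, 3, 4],
    "sex":       ["M", "f", "N", "M"],     # a lower-case entry and an invalid 'N'
    "age_start": [20, 17, 15, 37],         # 15 is below the assumed minimum age
    "offence":   [13, 1, 8, -1],           # -1 = not known
})

# Range check: force the sex code to upper case, then flag anything outside the coding frame.
data["sex"] = data["sex"].str.upper()
bad_sex = data[~data["sex"].isin(["M", "F"])]

# Cross-check: flag ages outside an acceptable range (assumed here to be 16-100).
bad_age = data[(data["age_start"] < 16) | (data["age_start"] > 100)]

# List the suspect records so the inputter can check them against the original paperwork.
print("Records with an invalid sex code:\n", bad_sex)
print("Records with an out-of-range age:\n", bad_age)
```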
Once the data set has been checked and is ready for analysis, it is a wise precaution to make a copy on disk in case disaster strikes during the process of analysis.

Processing qualitative data

Qualitative data is more frequently processed by hand, but again there are computer programs available that can assist the process. It is particularly helpful to have all the materials word-processed, be they transcripts of interviews, notes collected during observation, or thoughts from reflecting on the process. The processing and analysis of qualitative data are not so separate as with quantitative analysis. Robson (1993, p. 377) offers some ‘basic rules for dealing with qualitative data’ which highlight this.
1. Analysis of some form should start as soon as data is collected. Do not allow data to accumulate without preliminary analysis.
2. Make sure you keep tabs on what you have collected (literally – get it indexed).
3. Generate themes, categories, codes, etc. as you go along. Start by including rather than excluding; you can combine and modify as you go on.
4. Dealing with the data should not be a routine or mechanical task: think, reflect! Use analytical notes (memos) to help to get from the data to a conceptual level.
5. Use some form of filing system to sort your data. Be prepared to re-sort. Play with the data.
6. There is no one ‘right’ way of analysing this kind of data – which places even more emphasis on your being systematic, organised and persevering.
7. You are seeking to take apart your data in various ways and then trying to put them together again to form some consolidated picture. Your main tool is comparison.

Interpretation of results

Whether qualitative or quantitative analysis is undertaken, the results do not speak for themselves. They require interpretation and placing in context. This context is provided by the original clarification of the objectives of the evaluation, the initial identification of what was already known about the topic and, importantly, the theoretical underpinnings of the practice being evaluated.
Quantitative analysis

Quantitative analysis rarely uses raw numbers, but is more concerned with statistics such as averages and percentages, and with the statistical comparison of groups and sub-groups. It is first necessary to distinguish between different types of data, or ‘levels of measurement’.

Levels of measurement

A basic requirement for the application of appropriate statistics is an understanding of different levels of measurement, which will enable the correct identification of variable type and thus the correct choice of statistic. There are four levels of measurement: nominal, ordinal, interval and ratio. These levels represent a hierarchy, with nominal data being the lowest level of measurement and ratio data being the highest. The higher the level of measurement the more the numbers used have real meaning and the more statistical procedures are available for use.

Nominal, sometimes called categorical, variables are split into simple descriptive categories. Such categories can have numbers allocated as codes, but the number has no numerical meaning. A simple example is sex, where the two categories are male and female. If the two categories were coded 1 and 2, those numbers would merely identify the category. The range of statistical procedures for such data is very limited.

Ordinal variables have categories of data where there is a relationship or potential order between the categories, such that some categories are higher or lower than others. A simple example here is in a pack of playing cards, where the king has a higher value than the queen, which in turn has a higher value than the jack. The ace is interesting in that sometimes it has a higher value than these three cards, and sometimes a lower one. This is a useful reminder that such ordering is frequently related to the context within which the data occurs and is being analysed. A relevant example from probation practice is the classification of offence types. We sometimes place them in an order where some offences are more serious than others, for instance violence offences are often classed as more serious than theft offences, but always remembering that the ordering is not that simple, and that some theft offences can be more serious than some violence offences. The answers to questions on attitude scales are usually ordinal, with a range of five categories from ‘strongly disagree’ to ‘strongly agree’. These categories are given numbers, though the detail of the numbers does not matter: they can be coded 1, 2, 3, 4, 5 or 5, 4, 3, 2, 1 or 0, 1, 2, 3, 4 or even –2, –1, 0, 1, 2. The important feature is the order rather than the specific numbers, which demonstrates that numerical analysis of such data is limited, though a few more statistical procedures are available than for nominal data.

Interval and ratio variables have real numbers. In both types the ‘intervals’ between the numbers are equal, such that the difference between the score 10 and the score 11 is the same as the difference between the score 69 and 70. Ratio scales have an additional property in that ratios between the numbers are meaningful. This can best be understood by considering temperature, which is interval measurement, and age, which is ratio measurement. In temperature, whether Centigrade or Fahrenheit, the difference between 20 degrees and 25 degrees is the same as the difference between 50 and 55 degrees, but it does not make sense to say that a temperature of 50 degrees is twice as hot as a temperature of 25 degrees. With age, on the other hand, it would make sense to say that someone aged 50 was twice as old as someone aged 25. A simple means of identifying the difference is to consider whether a negative value would make sense. Interval measurement can have a negative value (sub-zero temperatures are common), whereas ratio measurement cannot (a negative age is not possible). For statistical purposes, the distinction between interval and ratio measurement is irrelevant.

The identification of appropriate levels of measurement is not without controversy, particularly with respect to the use of statistics on scores obtained from evaluation instruments. An interesting example here is IQ scores, which are frequently treated as interval measurement, but arguably can only be ordinal. We cannot be sure that the difference between a score of 70 and 75 is equivalent to the difference between a score of 140 and 145. However, most researchers treat psychometric and other multiple-item scales as interval (for a discussion, see Bryman and Cramer 1990 Chapter 4).
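As a quick reference, the Python fragment below gathers into a small dictionary the mapping between levels of measurement and the statistics attached to them in this and later parts of the section (central tendency, tests of difference and correlation). It is a summary aid only, not an exhaustive list of permissible procedures.

```python
# Statistics associated with each level of measurement in this section.
# Measures for a lower level can also be applied to higher levels, but not vice versa.
levels_of_measurement = {
    "nominal":        {"central_tendency": "mode",
                       "test_of_difference": "chi-square",
                       "correlation": None},
    "ordinal":        {"central_tendency": "median",
                       "test_of_difference": "Wilcoxon / Mann Whitney",
                       "correlation": "Spearman's rank order"},
    "interval/ratio": {"central_tendency": "mean",
                       "test_of_difference": "t-test",
                       "correlation": "Pearson's product moment"},
}

for level, procedures in levels_of_measurement.items():
    print(level, procedures)
```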
Qualitative analysis

The aims of analysis of qualitative data are the same as those for the analysis of quantitative data: to make sense of the data collected and to produce robust results that can be substantiated as a valid representation of the real world and not the idiosyncratic ‘subjective’ perspective of the evaluator. A systematic approach to analysis is particularly important in qualitative studies. As with quantitative analysis, flaws in design cannot be redressed by analysis no matter how good or comprehensive it is. If a quantitative picture were required, open interviews would not be the best method to achieve it, and attempts at quantification at the analysis stage could be futile.

The analysis of qualitative data is time consuming and techniques are not so clear-cut as those for quantitative data.

Computer programs are available to analyse qualitative data, but they do require considerable input to establish appropriate coding frames for the analysis, and the task should not be underestimated. As outlined in the previous section, data collection and analysis are not such separate processes in qualitative work, and in fact it is important to undertake some limited analysis and reflection during the collection of qualitative data to inform the detail of the data that is being collected.

Although some analysis will have been undertaken during data collection there will remain a substantial task of analysis at the end of that process. Qualitative analysis is an iterative process, where data is worked, reworked and refined as the emerging picture becomes clearer. There are two broad approaches to the task.
The theory driven approach – this method starts with the theoretical framework which underpinned the design of the evaluation, and assesses the extent to which the ‘theory’ was found in practice.
The descriptive framework – where a theoretical framework does not really exist the data is used to construct one. A frequently used approach is ‘issues analysis’, where the issues that drove the design of the study, or that emerge during data collection and analysis are used to focus the selection and organisation of material.

Within these basic approaches, there are a range of techniques that can be employed to assist the analytic process, the choice of which will depend on the evaluation design and the question being addressed.
Time series analysis (not to be confused with the statistical technique of the same name) looks at patterning of events over time, with a particular focus on changes in pattern. For instance, a single case study design would use this approach to assess the impact of a particular programme intervention with an offender, in order to investigate whether the pattern of offending after treatment had changed in the desired direction.
Chronology is a useful approach for analysing the life history of an individual or institution.
Triangulation can be used in multi-method approaches, where themes emerging from quantitative data can be examined in more detail in the qualitative data.
Key events can be used as the means of organising data, for instance the nature of follow up for a missed appointment. The choice of key events may be guided by the theory driving the evaluation, or emerge from the data collected.

There are a range of texts dedicated to qualitative methodology and analysis. Recommended reading in this area is presented at the end of the handbook.

Descriptive statistics

An initial step in most quantitative analysis is to produce a frequency distribution for each of the variables in the data set so as to obtain an overview of the data and check for obvious anomalies. A frequency distribution is a simple listing of all the possible entries for a particular variable, showing how many cases fall into each category. For instance, the frequency distribution of ‘Sex’ in the data set in the last section would be as shown in the Table below.
The calculation of percentages is an important dimension in frequency distributions, particularly where numbers become large. In the example above, with very small numbers, knowing that 3 is 75% of 4 is not particularly helpful, but it would help to know that 65 is 76% of 86. Percentages also become very important when wanting to compare groups of cases.
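A frequency distribution with percentages is a one-line operation in most packages. The pandas sketch below uses the four-case sex variable from the earlier illustrative data set (three males, one female) purely as a demonstration.

```python
import pandas as pd

sex = pd.Series(["M", "M", "F", "M"], name="sex")

# Frequency distribution: counts and percentages for each category.
counts = sex.value_counts()
percentages = sex.value_counts(normalize=True) * 100

frequency_table = pd.DataFrame({"count": counts, "percent": percentages.round(1)})
print(frequency_table)
# M: 3 cases (75.0%); F: 1 case (25.0%)
```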

Frequency distributions are very useful for highlighting variables where substantial numbers of cases have data missing. Care should be taken with statistical analysis involving such variables, even the presentation of simple percentages. Where you can be confident that the missing data is randomly spread, it is reasonable to calculate statistics based on the data that is available, which has the effect of apportioning missing cases according to the distribution of the known data. Where there is the possibility that missing data is not random it is important to consider the likely bias within the data and its possible implications for further analysis.
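The sketch below illustrates the difference between percentages based on all cases and percentages based only on cases where the data is known, assuming offence type is coded with −1 standing for ‘not known’ as in the earlier coding frame. The counts themselves are invented.

```python
import pandas as pd

# Illustrative offence codes for ten cases: 1 = violence, 13 = theft, 8 = fraud, -1 = not known.
offence = pd.Series([13, 1, 8, -1, 13, 13, -1, 1, 13, 8])

# Percentages over all cases, including the 'not known' category.
print(offence.value_counts(normalize=True) * 100)

# 'Valid' percentages based only on known cases, which has the effect of
# apportioning missing cases according to the distribution of the known data.
known = offence[offence != -1]
print(known.value_counts(normalize=True) * 100)
```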

Measures of central tendency (‘averages’)

Where variables have a large number of categories, a simple frequency distribution is not very illuminating, and a different descriptive statistic is required. Often the measure of central tendency is chosen – this is essentially an indicator of typicality.

Where the data is an interval/ratio measurement, such as age, the most common measure is the mean (commonly known as the average). This is simply calculated by adding together all the values in a particular variable and dividing by the number of cases included. Using the previous data set, the calculation for the mean age would be (20 + 17 + 25 + 37) ÷ 4 = 99 ÷ 4 = 24.75, which would generally be rounded up to 25. This sort of calculation can be misleading, however, when dealing with small numbers of cases. It can also be misleading when using a large number of cases if a small number of these cases have extreme values.

The median is an alternative indicator of ‘typicality’ that can be calculated for ordinal as well as interval/ratio data. The median is the value in the centre of the distribution. All values are placed in numerical order and the number of cases counted from one end until the midpoint, or median, of the distribution is found. For instance, with a sample of 37 cases the median would be the value at which the 19th case occurred. The answer would be the same whether calculated from the highest or the lowest value. This figure is not skewed by extreme values. An example will help. The ages from the previous dataset, placed in order are 17, 20, 25, 37. The midway point in this distribution falls between 20 and 25, which would give a median value of 22.5 years.

Another commonly used measure of ‘typicality’ is the mode. This is the only measure that can be used for coded categorised data (known as nominal data), and is quite simply the value which occurs most frequently. For the frequency distribution of sex shown above, the mode would be male, but where data is dichotomous (i.e. there are just two categories – male and female) it is not usually referred to as the mode. It is more helpful with three or more categories, such as offence.
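The three measures of central tendency can be checked quickly with Python’s built-in statistics module; the sketch below reproduces the figures quoted above for the four-case data set (mean 24.75, median 22.5, mode ‘male’).

```python
import statistics

ages = [17, 20, 25, 37]                    # interval/ratio data
sex = ["male", "male", "female", "male"]   # nominal data

print(statistics.mean(ages))    # 24.75, usually rounded to 25
print(statistics.median(ages))  # 22.5, the midpoint between 20 and 25
print(statistics.mode(sex))     # 'male', the most frequent category
```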

Measures of dispersion (‘spread’)

Other important summary statistical measures, particularly useful with data such as age, give an indication of the ‘spread’ of the data. This indicates how much variability there is within the data, and thereby whether the mean is a good summary statistic for it. Simple measures of spread are the range and the interquartile range. The range is quite simply the difference between the highest and the lowest figures in the distribution. In the age example above, the minimum is 17 and the maximum is 37, therefore the range is 20. As with the mean, the range can easily be skewed by a rogue extreme value, and the interquartile range can be a good substitute here. A quartile is calculated in the same way as the median, but quartiles are the points below which a quarter and three-quarters of the cases fall. Thus the interquartile range covers the middle half of all the cases in the sample. It would not make sense to calculate the interquartile range on the small data set of four cases. In a set of 20 cases with the following values for age, the ‘x’s denote the first and third quartile points, and the ‘m’ denotes the median point.

17 17 18 20 21 x 21 24 25 27 27 m 28 30 33 34 37 x 41 45 52 57 63

- the first quartile point is 21
- the median is 27.5
- the third quartile is 39
- thus the range is 63 - 17 = 46
- the interquartile range is 39 - 21 = 18
- the mean is 31.85, which can be rounded to 32

The summary figures show a wide range to the figures (46), but greater concentration of the middle 50% of scores in the interquartile range (18). There is more space between the median and the third quartile (27.5 and 39) than between the first quartile and the median (21 and 27.5), showing greater concentration at the lower end of the scale and more spread at the top. As would be expected, this greater spread at the higher values has resulted in a mean of 32 (31.85 rounded), which is substantially more than the median value of 27.5. In this distribution of values the median is a better summary measure of the data than the mean.
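The quartiles above are obtained by the ‘median of each half’ approach the text describes; the Python sketch below reproduces the quoted figures (quartiles 21 and 39, median 27.5, range 46, interquartile range 18, mean 31.85) for the 20-case age data. The same module also provides statistics.stdev and statistics.variance for the more sophisticated measures of spread mentioned below.

```python
import statistics

ages = sorted([17, 17, 18, 20, 21, 21, 24, 25, 27, 27,
               28, 30, 33, 34, 37, 41, 45, 52, 57, 63])

# Split the ordered values into a lower and an upper half, then take the
# median of each half as the first and third quartile points.
n = len(ages)
lower_half, upper_half = ages[:n // 2], ages[n // 2:]

first_quartile = statistics.median(lower_half)   # 21
third_quartile = statistics.median(upper_half)   # 39

print("range:", max(ages) - min(ages))                           # 63 - 17 = 46
print("interquartile range:", third_quartile - first_quartile)   # 39 - 21 = 18
print("median:", statistics.median(ages))                        # 27.5
print("mean:", statistics.mean(ages))                            # 31.85
```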

This very simple example illustrates that statistics require selection and interpretation in the same way as more qualitative data, a feature that is often overlooked.

When data is interval or ratio there are more sophisticated measures of spread, such as the standard deviation and the variance. Interested readers should consult a statistical text such as Clegg (1990).
Note: measures can be applied to a higher level of data (e.g. the mode to ordinal data), but not to a lower level (e.g. the median to nominal data).

The graphical presentation of data is another valuable way of analysing data. The visual representation of data in the form of pie charts, bar charts, or box plots can often highlight features of the data that are masked in tabular analysis.
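As a minimal illustration of graphical presentation, the sketch below uses matplotlib (a choice of tool not assumed by this handbook) to draw a box plot of the 20-case age data and a bar chart of an invented offence frequency distribution.

```python
import matplotlib.pyplot as plt

ages = [17, 17, 18, 20, 21, 21, 24, 25, 27, 27,
        28, 30, 33, 34, 37, 41, 45, 52, 57, 63]
offence_counts = {"violence": 2, "theft": 4, "fraud": 2, "not known": 2}  # invented counts

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

ax1.boxplot(ages)                      # box plot: median, quartiles and extremes at a glance
ax1.set_title("Age at commencement")

ax2.bar(list(offence_counts.keys()), list(offence_counts.values()))  # bar chart of a nominal variable
ax2.set_title("Offence type")

plt.tight_layout()
plt.show()
```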

Statistical testing

A key component of quantitative data analysis is comparison, whether of ‘before’ and ‘after’ measures of the same group or measures of the same thing in different groups. The simple tabular presentation of the data from two groups is called a cross tabulation or contingency table, such as that presented below. This is a very simple 2 × 2 cross tabulation, signifying two rows and two columns.
We can see from the table above that the number of offenders from the experimental group who were reconvicted (18) is slightly less than the number in the control group who were reconvicted (21), but as there were fewer people in total in the control group it is difficult to assess the extent of the difference, if any. The inclusion of percentages reconvicted and not within each group makes it easier to see the difference between them.
We now see that the reconviction rate for the experimental group was 40%, much lower than the 64% reconviction for the control group. It is easy to latch on to such descriptive differences between groups and pronounce them as important, but how sure can we be that this is a result which is likely to be repeated? Statistical tests enable us to make an assessment of the strength of any differences found, and how much confidence we can have in their reality.

Essentially, statistical tests calculate the probability of the obtained results occurring by chance. If the result has a high probability of occurring by chance we say there is no statistically significant difference between the groups. If the result has a low probability of occurring by chance we say that there is a statistically significant difference between them. The probability is calculated in percentage terms, but is usually represented as a ‘p’ value, which is the percentage expressed as a proportion of 1, so, for instance, 10% = 0.10. The accepted threshold for statistical significance in social research is 5%, or p = 0.05. Any result less than this is accepted as ‘real’, and may be written as the precise value or as p < 0.05. Where the probability is very small this is frequently reported as p < 0.01 or p < 0.001. The statistical tests take account of the number of cases included in the analysis, and require bigger differences between the groups to obtain statistically significant results for small samples. Table 3.10 gives an example. Here it is assumed that two groups are being compared, each of the size shown under ‘sample size’. It is also assumed that 50% of the control group were reconvicted, but a smaller proportion of the experimental group, as shown in each row. The table shows, using a chi-square test (described below), whether the difference in reconviction rates is statistically significant. In the example below, with a sample size of 40 and a percentage difference of 25%, the chi-square test produced a result of p = 0.02, i.e. a 2% probability of a chance result, and therefore statistically significant.
The Table above can also be used in reverse to show the size of sample that will be needed to demonstrate an effect. If a large effect is expected smaller samples will be sufficient. Where the effect is unknown it is better to err towards a larger sample.
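The worked figure above (two groups of 40, reconviction rates of 50% and 25%, p = 0.02) can be reproduced with scipy, as sketched below. Note that correction=False is set because scipy otherwise applies Yates’ continuity correction to 2 × 2 tables, which the handbook’s figure does not appear to use; this is an assumption about how the original table was calculated.

```python
from scipy.stats import chi2_contingency

# 2 x 2 contingency table: rows = groups, columns = reconvicted / not reconvicted.
# Control group of 40 with 50% reconvicted; experimental group of 40 with 25% reconvicted.
table = [[20, 20],
         [10, 30]]

chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")   # roughly chi-square = 5.33, p = 0.021
```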

It is important to remember that a result that is not statistically significant does not mean that the programme has no effect. If the test indicates no significant effect, it may be showing that the effect is unproven rather than absent: the result has a high probability of having occurred by chance because the numbers involved are low, and larger numbers are needed to be sure.

Tests of significant difference between two variables

A range of statistical tests have been developed to demonstrate differences. Different tests are appropriate depending on the level of data measurement and the types of comparison being made. Technical skill is needed to know which test to use. Some guidance is given in programs such as SPSS, but it is best to speak to someone who has statistical knowledge and experience. Readable texts on this area are recommended at the end of this handbook. The most relevant tests for evaluation in the probation service are briefly described below. The utility disk included with this handbook can be used to calculate most of these.
Chi-square: This is the most frequently used test, carried out on cross-tabulated data (as in Table 3.9). The test works by comparing the values observed in the table with those which would have been expected if the results had occurred by chance. If the differences are big enough, taking account of the size of the sample, the result will be statistically significant. Generally a robust and reliable test, it is best not used on tables containing fewer than 20 cases or where expected cell counts fall below 5. (Programs such as SPSS will indicate whether this is a problem and make necessary adjustments.) It is a simple test to calculate by hand, and straightforward guidance can be found in Clegg (1990). Although it can be calculated for tables with a large number of cells this is not advisable as results can be difficult to interpret. This is the main statistical test used when comparing groups using nominal data.
Wilcoxon: This test is primarily used to assess the difference between before-and-after ranking scores obtained for the same individuals (sometimes referred to as matched samples). It is also known as the signed rank test, because it works by assessing whether the second score is greater (+) or less than (–) the first score and the magnitude of the difference. The differences are ranked by magnitude, ignoring the sign. The sum of the positive ranks and the negative ranks are calculated and used with the sample size to assess the probability level. The data being compared must be at least ordinal.
Mann Whitney: This test is used to compare two unmatched groups on an ordinal variable, for example the scores obtained by males and females completing a programme. It works by combining the data from the two groups, ranking the scores and then separating the ranks back into their respective groups. The test then statistically compares the range of rankings in the two groups to produce a ‘U’ statistic, which is assessed with the sample sizes to determine the probability of the result having occurred by chance.
t-test: This is the equivalent test to Wilcoxon and Mann Whitney when analysing interval/ratio data, and is the one used to assess score changes in most of the reports in Chapters 4 and 5. There are two versions of the test, depending on whether the groups being compared are matched or unmatched. It assesses the strength of the difference between the mean values of the two groups, and takes account of the variation within the data. This is quite a complicated test to calculate by hand and is best done with the aid of a computer.
Note: measures can be applied to a higher level of data (e.g. Wilcoxon to interval data), but not to a lower level (e.g. the t-test to ordinal data).
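The sketch below shows how each of these tests is invoked in scipy on small, invented score data: ‘before’ and ‘after’ scores for the same eight people (matched), and scores for two separate groups (unmatched). The numbers are purely illustrative and the choice of scipy is an assumption; the handbook’s own utility disk or SPSS would do the same job.

```python
from scipy.stats import wilcoxon, mannwhitneyu, ttest_rel, ttest_ind

# Matched data: before and after scores for the same eight people (invented).
before = [24, 30, 19, 28, 33, 26, 22, 31]
after  = [20, 27, 21, 25, 28, 24, 18, 26]

# Unmatched data: scores for two separate groups (invented).
group_a = [24, 30, 19, 28, 33, 26, 22, 31]
group_b = [18, 25, 21, 20, 27, 17, 23, 19]

# Ordinal (or better) data, matched samples: Wilcoxon signed rank test.
print(wilcoxon(before, after))

# Ordinal (or better) data, unmatched groups: Mann Whitney U test.
print(mannwhitneyu(group_a, group_b))

# Interval/ratio data: t-test, matched (ttest_rel) or unmatched (ttest_ind) versions.
print(ttest_rel(before, after))
print(ttest_ind(group_a, group_b))
```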

Tests of statistical relationship: correlation

Another useful statistical technique is correlation. This is used not to establish whether two variables are significantly different, but to explore the strength of their similarity. For example, it could be used to examine whether Crime-Pics scores for a set of people are similar to their scores on another attitude scale. The variables should be ordinal or interval/ratio. Correlation assesses whether there is a linear relationship between the variables. The correlation is described as being either positive, where the values of both variables increase together, or negative, where the value of one variable increases as the value of the other decreases. The strength of a correlation can be calculated by using one of two tests, Spearman’s rank order correlation or Pearson’s product moment correlation. Spearman is used where the data is ordinal, and Pearson where the data is interval or ratio.

The result of both tests is a correlation coefficient ‘r’, which can have a value anywhere between +1, which is perfect positive correlation and –1, which is perfect negative correlation, with 0 in the middle representing absolutely no correlation at all. It is exceedingly unlikely that any of these three values would occur in reality. A more realistic value is something like r = –0.126, which is a slight negative correlation, or r = 0.419, which is a moderate positive correlation. Judgement is necessary to assess whether a particular correlation coefficient is an important result in the context of the analysis being undertaken. Both tests additionally provide the statistical probability for the correlation coefficient, which is interpreted in exactly the same way as for testing differences between groups, i.e. values of p=0.05 or smaller are statistically significant. The need for two values with correlation coefficients can make their interpretation confusing. It is quite possible, for instance, to obtain a result that indicates a strong correlation that is not statistically significant owing to small sample size.
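The scipy sketch below shows how the two correlation tests are run, using invented scores on two attitude scales for ten people; both functions return the coefficient r and its probability, which are interpreted as described above.

```python
from scipy.stats import spearmanr, pearsonr

# Invented scores on two attitude scales for ten people.
scale_one = [12, 15, 9, 20, 18, 11, 16, 14, 22, 19]
scale_two = [30, 34, 25, 41, 36, 28, 33, 31, 45, 38]

# Spearman's rank order correlation: for ordinal data.
r_s, p_s = spearmanr(scale_one, scale_two)
print(f"Spearman r = {r_s:.3f}, p = {p_s:.3f}")

# Pearson's product moment correlation: for interval/ratio data.
r_p, p_p = pearsonr(scale_one, scale_two)
print(f"Pearson r = {r_p:.3f}, p = {p_p:.3f}")
```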