Sampling concerns the choice of cases to include in the evaluation. The aim is for the cases to be as representative as possible of the wider group from which they are drawn (the population). This provides generalisability, i.e. the ability to generalise the results to that wider population. The relevant factors are the size of the sample and the means by which it is chosen.
Programme evaluation, particularly with a quasi-experimental design, provides limited opportunities for sampling because the numbers going through programmes are relatively small. It should be remembered, however, that even where all the programme participants are included in an evaluation, they still technically constitute a sample rather than a population: in effect an accidental sample, because they just happen to be in the right place at the right time! Sampling procedures may well be necessary, however, to obtain a suitable comparison group. Here, the first decision is whether to obtain a ‘matched’ group, where each member of the comparison group is matched to a member of the programme group on key criteria such as age, sex, offence and OGRS score. This is frequently impractical, and so an alternative comparison group is chosen instead. This is usually a sample of people who meet key criteria, for example those who were suitable for the programme but were not given an appropriate sentence by the court.
The size of the sample matters most in quantitative work, and there are statistical techniques for working out the best sample size. These techniques are based on the principles of sampling theory, which says that any sample drawn from a population will produce results that approximate to those of the actual population but will not be exactly the same: most samples will be close to the population being studied, while a few will be very different. The theory also shows that the larger the sample, the more likely it is to produce results that are very close to those of the population. (Common sense suggests that a sample of five will not be as trustworthy as a sample of 1,000.) Probability theory can be used to estimate the number of cases that would be needed to produce reliable results for any given study.
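The point about sample size can be illustrated with a small simulation. This is a sketch only: the population rate of 40%, the number of repeated samples and the function name `sample_proportions` are all assumptions made for illustration, not figures from any real programme.

```python
import random
import statistics

def sample_proportions(true_rate, sample_size, n_samples, seed=0):
    """Draw n_samples random samples and return the proportion observed in each."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    proportions = []
    for _ in range(n_samples):
        # Count how many cases in this sample show the outcome of interest.
        hits = sum(1 for _ in range(sample_size) if rng.random() < true_rate)
        proportions.append(hits / sample_size)
    return proportions

# Hypothetical population in which 40% of cases have the outcome of interest.
TRUE_RATE = 0.40

for size in (5, 1000):
    props = sample_proportions(TRUE_RATE, size, n_samples=500)
    print(f"n={size:>4}: lowest={min(props):.2f}, highest={max(props):.2f}, "
          f"spread (sd)={statistics.pstdev(props):.3f}")
```

Samples of five should range widely around the true rate, while samples of 1,000 should cluster tightly around it.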
For most purposes in social evaluation it is not possible to do these calculations, and it is sufficient to aim for as large a sample as possible. Figure 8.1 in the Guide to Effective Practice illustrates some of the problems of drawing conclusions from small samples, and presents the following table showing the ‘sampling error’ associated with different sample sizes and its impact on the confidence that can be placed in results from them. This table is worth repeating here.
We can see that the biggest reductions in sampling error come from relatively small increases in sample size at the lower end: 280 extra cases reduce the sampling error from 22% to 6%, whereas a further 900 cases produce only a small improvement, from 6% to 3%.
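The diminishing return can be sketched with the standard approximation for the margin of error of a proportion. The assumptions here are mine rather than the Guide's: a simple random sample, a 95% confidence level, a proportion of 50% (the worst case), and sample sizes of 20, 300 and 1,200, chosen only because they are consistent with the increases of 280 and 900 cases quoted above.

```python
import math

def sampling_error(sample_size, proportion=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Assumed sample sizes (not taken from the original table).
for n in (20, 300, 1200):
    print(f"n={n:>5}: sampling error of about +/-{sampling_error(n) * 100:.0f}%")
```

Under these assumptions the formula gives roughly 22%, 6% and 3%. Because the error shrinks with the square root of the sample size, halving it requires roughly four times as many cases.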
A generally accepted rule of thumb is to aim for a sample of at least 100 cases, although increasing the sample beyond that may not be cost effective. The minimum sample size generally accepted in quantitative work is 30–50. Statistical tests take account of sample size, but it is harder to show a statistically significant difference with small samples. (See Table 3.10, section 3.5.1.)

It is best to assess the required sample size in terms of the sort of analysis to be undertaken with the data. Statistical analyses with many sub-categories require larger samples so that reasonable numbers occur in each sub-category; for example, a two-way analysis of race and sex will give six sub-categories if using categories of male/female and Asian/black/white. We know that some of these sub-groups are likely to be very small, e.g. black females. If they made up 2% of the population being studied, a sample of 50 would be expected to contain just one such case, and might well contain none. Stratified sampling would ensure adequate numbers in minority groups, but is more complex to organise and may not be feasible with naturally occurring groups. The more variability anticipated on key factors within a population, the larger the sample needed.
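The sub-group arithmetic can be checked directly. The sketch below assumes a simple random sample from a large population, so that each case independently has a 2% chance of falling in the sub-group; the function names are illustrative.

```python
def expected_count(sample_size, subgroup_share):
    """Average number of sub-group members expected in a random sample."""
    return sample_size * subgroup_share

def chance_of_none(sample_size, subgroup_share):
    """Probability that a random sample contains no sub-group members at all."""
    return (1 - subgroup_share) ** sample_size

n, share = 50, 0.02  # sample of 50; sub-group is 2% of the population
print(f"Expected sub-group members: {expected_count(n, share):.1f}")
print(f"Chance of none at all: {chance_of_none(n, share):.0%}")
```

With these figures the expected count is one, but there is roughly a one-in-three chance (about 36%) that the sample contains no one from the sub-group.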
It is also important to allow for non-response when planning samples. For example, ‘before and after’ studies usually lose some of the ‘after’ sample, and so the number included in the ‘before’ sample should be increased to improve the likelihood of obtaining sufficient numbers in the ‘after’ group. A similar allowance should be made for attrition when matching offender details on the offender index/PNC for reconviction.
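One simple way to build in this allowance is to divide the number of cases needed at the end by the proportion expected to remain. The sketch below is illustrative only: the target of 100 cases and the 30% loss rate are assumed figures, and the function name is hypothetical.

```python
import math

def inflated_sample_size(target_completed, expected_loss_rate):
    """Starting sample needed so that about target_completed cases remain after losses."""
    if not 0 <= expected_loss_rate < 1:
        raise ValueError("expected_loss_rate must be at least 0 and below 1")
    # Round up: it is safer to recruit one case too many than one too few.
    return math.ceil(target_completed / (1 - expected_loss_rate))

# Assumed figures: 100 'after' cases wanted, 30% expected to be lost.
print(inflated_sample_size(100, 0.30))
```

On these assumptions the ‘before’ sample would need 143 cases. The loss rate itself is an estimate, so a further margin may be sensible where past attrition has been unpredictable.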