# Sampling

If the population is small enough, it may be possible to survey everyone. This is not usually the case. Normally you cannot speak to everyone, measure everything or look at every document. You may only have the time and resources to examine a sample of documents or speak to a sample of project participants. You can then infer from these findings something about the wider population of, for example, a town or region.

Drawing on useful guidance on sampling by SocialCops and Save the Children/Open University, it is possible to identify many different types of sampling.

## Types of Sampling

**Survey sampling** is where you identify a subset of the population to work with: a carefully chosen group that is representative of the population, reflecting its diversity and including the least accessible groups. If you are carrying out surveys at more than one point in time, you need to ensure that your samples are consistent (e.g. in terms of age range, class, gender).

**Probability samples** aim to be representative of the population. They may involve:

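**Random** sampling, where each subject is selected independently of the other members of the population, so that every member has an equal chance of being chosen. This can be difficult when the population is very large, because you need a complete list of everyone in it.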

Or

**Systematic** sampling, where, once you have decided on the sample size, you arrange the elements of the population in some order and select items at regular intervals from the list. For example, you might decide to focus on three particular crops grown in a village and study every tenth orange, apple, and pear tree.
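The two probability methods above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function names and the list of 100 numbered trees are hypothetical. The systematic version computes the interval from the population and sample sizes and starts at a random point within the first interval, which is one common way of doing it; for the village example you would run it separately on the list of orange, apple and pear trees.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Pick n members so that every member has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Pick every k-th member of an ordered list, starting at a random point."""
    k = len(population) // n  # sampling interval, e.g. 10 for "every tenth tree"
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    start = random.Random(seed).randrange(k)  # random start within the first interval
    return population[start::k][:n]  # trim in case the list is not an exact multiple of k

# Hypothetical list of 100 numbered trees in one village
trees = [f"tree-{i:03d}" for i in range(1, 101)]
print(simple_random_sample(trees, 10, seed=1))
print(systematic_sample(trees, 10, seed=1))
```

Fixing `seed` makes a draw reproducible, which helps if someone later needs to check how the sample was chosen.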

**Stratified sampling** is where the population is divided into groups (strata) according to characteristics that matter for the research, for example gender, social class, education level or religion. Thus, if 75% of the population has had no college education, then 75% of your sample should also lack a college education, and those respondents would be selected randomly from the subset of the population who have not been college-educated.
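The college-education example can be sketched as proportional stratified sampling in Python. This is an illustrative sketch under stated assumptions: the function name, the 200 hypothetical respondents and the 75%/25% split are made up to match the example. Each stratum receives a share of the sample equal to its share of the population, and members are then chosen at random within each stratum.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, n, seed=None):
    """Proportional stratified sample: each stratum gets a share of the n places
    equal to its share of the population, filled by random selection within it."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for person in population:
        strata[stratum_of(person)].append(person)
    sample = []
    for members in strata.values():
        # Rounding can make the total differ slightly from n when shares are not whole numbers
        quota = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical population: 150 people without a college education (75%), 50 with one
people = [{"id": i, "college": i < 50} for i in range(200)]
sample = stratified_sample(people, lambda p: p["college"], 40, seed=7)
no_college = sum(1 for p in sample if not p["college"])
print(len(sample), no_college)  # 40 30
```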

**Non-probability sampling techniques** include convenience sampling, snowball sampling, and quota sampling.

With **convenience sampling**, the members of the sample are picked purely on the basis of availability, reach and accessibility. This can be very useful for smaller NGDOs: the sample may not be truly representative, but it can be assembled quickly without placing extra burden on limited resources. It provides a rough estimate of the results without the cost or time required to select a random sample.

**Snowball sampling** is where you rely on your initial respondents to refer you to further respondents. This is a low-cost technique, but it can limit you to a single, homogeneous section of the population, because people tend to refer others like themselves.
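One way to picture snowball sampling is as following referrals outward through a network of contacts. The Python sketch below is purely illustrative: the referral network and names are hypothetical, and the breadth-first order (interview everyone's referrals in turn) is an assumption about how the fieldwork proceeds. It shows why the sample can stay inside one group: people who are not linked to the first respondents are never reached.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Grow a sample by following referrals outward from the initial respondents."""
    sample, queue = [], deque(seeds)
    seen = set(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)  # interview this person...
        for contact in referrals.get(person, []):  # ...then ask who else to speak to
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network: two groups that do not know each other
referrals = {
    "Amina": ["Binta", "Chantal"],
    "Binta": ["Amina", "Delphine"],
    "Chantal": ["Delphine"],
    "Delphine": [],
    "Emmanuel": ["Fabrice"],
    "Fabrice": ["Emmanuel"],
}
print(snowball_sample(referrals, ["Amina"], max_size=10))
# ['Amina', 'Binta', 'Chantal', 'Delphine']: Emmanuel and Fabrice are never reached
```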

## Bias

When sampling, you must strive to avoid bias: any flaw in how the sample is drawn that gives some individuals a greater chance of being selected than others. For example, Save the Children has suggested that “tarmac bias relates to our tendency to survey those villages that are easily accessible by road.”

It also points to:

**Self-selection or non-response bias**, whereby only people with strong views about the topic volunteer, thereby skewing the findings.


Exercise: Which sample is not biased?

1. You wait outside the main gates of the University of Yaoundé (Yaoundé is the capital of Cameroon) and interview every 10th student.

2. The Cameroonian Schools Minister identifies a sample of schools that you are allowed to visit.

3. Drawing on an accurate, independently verified list of cotton-pickers working across farms in the city of Douala, you pick the names of a sample of workers randomly out of a hat.

(Check the answer at the bottom of the page.)

As a rule, the larger the sample, the more likely it is that your findings will be representative. But time and cost constraints will ultimately determine your sample size. If the sample is too small, your results may be inconclusive; if it is too large, the survey may prove impossible to complete.
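To see the trade-off between precision and cost in numbers, the Python sketch below uses Cochran's formula for estimating a proportion, a standard textbook choice that is not taken from the guidance cited above, with a finite population correction for small populations. It assumes simple random sampling: a large sample does not cure a biased selection method. The example population of 1,000 is hypothetical.

```python
import math

def sample_size(margin_of_error, population=None, z=1.96, p=0.5):
    """Sample size needed to estimate a proportion (Cochran's formula).

    margin_of_error: acceptable error, e.g. 0.05 for +/- 5 percentage points
    population: total population size N; if given, a finite population correction is applied
    z: 1.96 corresponds to 95% confidence
    p: expected proportion; 0.5 is the cautious choice that gives the largest sample
    """
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)  # smaller populations need fewer respondents
    return math.ceil(n0)  # always round up to a whole person

print(sample_size(0.05))                   # 385 for a very large population
print(sample_size(0.05, population=1000))  # 278 for a population of 1,000
print(sample_size(0.10, population=1000))  # 88 if +/- 10 points is acceptable
```

Halving the margin of error roughly quadruples the required sample, which is why time and budget so often settle the final size.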

## Analysing Results

You can use quantitative data to produce frequency tables showing, for example, the number of girls studying particular subjects such as science, history, and maths. Similarly, a frequency table could show the number of boys aged 14-18 securing apprenticeships or specialist training in region X between 2015 and 2017.
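A frequency table is simply a count of how often each value occurs. The Python sketch below builds one from hypothetical survey records using `collections.Counter`; the records and their (sex, subject) layout are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical survey records: (pupil's sex, subject studied)
records = [
    ("girl", "science"), ("girl", "history"), ("boy", "maths"),
    ("girl", "maths"), ("girl", "science"), ("boy", "science"),
    ("girl", "history"), ("girl", "science"),
]

# Count girls per subject, then print the table from most to least common
girls_by_subject = Counter(subject for sex, subject in records if sex == "girl")
for subject, count in girls_by_subject.most_common():
    print(f"{subject:<10}{count:>3}")
```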

You may need to work out averages, which take various forms. Take the following example of five children's ages: John is 8, Florence is 8, Alice is 9, Robert is 12, François is 13.

The **mean** (add up all the values and divide by the number of values) is (8+8+9+12+13)÷5 = 50÷5 = 10

The **median** (the middle value when the data are sorted in order) is 9

The **mode** (the most common value) is 8, which appears twice

The **range**, the simplest measure of how spread out the values are, is 5 (the difference between 8 and 13)
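If the data are already in a spreadsheet export or a list, Python's standard `statistics` module can check these figures. A minimal sketch using the ages from the example:

```python
import statistics

ages = {"John": 8, "Florence": 8, "Alice": 9, "Robert": 12, "François": 13}
values = list(ages.values())

print(statistics.mean(values))    # 10
print(statistics.median(values))  # 9
print(statistics.mode(values))    # 8
print(max(values) - min(values))  # 5 (the range)
```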

The **standard deviation** shows, roughly, how far a typical data point (here, the age of each child) lies from the mean; strictly, it is the square root of the average of the squared differences from the mean. If all data points are close to the mean, the standard deviation is low, showing that there is little difference between values.

A large standard deviation shows that the data are more spread out. This can be important: it can show, for example, whether a village school has pupils of very different ages, and potentially abilities, in the same classroom. Equally, it could highlight major differences in life expectancy between regions of a country. It is quite complicated to work out by hand, but BBC Bitesize has devised a step-by-step guide.
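The calculation for the five ages can be sketched step by step in Python. This version divides by the number of values (the population standard deviation); guides differ on this, and when the data are a sample used to estimate a wider population it is common to divide by one less than the number of values instead (`statistics.stdev`), which gives about 2.35 here.

```python
import math
import statistics

ages = [8, 8, 9, 12, 13]

mean = sum(ages) / len(ages)                     # step 1: the mean (10)
squared_diffs = [(a - mean) ** 2 for a in ages]  # step 2: squared distance of each age from the mean
variance = sum(squared_diffs) / len(ages)        # step 3: average of the squared distances (4.4)
sd = math.sqrt(variance)                         # step 4: square root brings the units back to years

print(round(sd, 2))                       # 2.1
print(round(statistics.pstdev(ages), 2))  # 2.1, cross-check with the standard library
```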

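Exercise Answer Key: 3 is the correct response. Option 1 reaches only students who happen to pass the main gates while you are there, and in option 2 the Minister may choose schools that show the system in a favourable light; option 3 draws names at random from a complete, verified list, so every worker has the same chance of selection.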
