# Sampling (statistics)

**Sampling** is that part of statistical practice concerned with the selection of individual observations intended to yield some knowledge about a population of concern, especially for the purposes of statistical inference. In particular, results from probability theory and statistical theory are employed to guide practice.

The sampling process consists of five stages:

- Definition of population of concern
- Specification of a sampling frame, a set of items or events that it is possible to measure
- Specification of sampling method for selecting items or events from the frame
- Sampling and data collection
- Review of sampling process

## Population definition

Successful statistical practice is based on focused problem definition. Typically, we seek to take action on some population, for example when a batch of material from production must be released to the customer or sentenced for scrap or rework. Alternatively, we seek knowledge about the cause system of which the population is an outcome, for example when a researcher performs an experiment on rats with the intention of gaining insights into biochemistry that can be applied for the benefit of humans. In the latter case, the population of concern can be difficult to specify, as it is in the case of measuring some physical characteristic such as the electrical conductivity of copper.

In all cases, however, time spent making the population of concern precise is well spent, because it often raises issues, ambiguities and questions that would otherwise have been overlooked at this stage.

## Sampling frame

Examples of sampling frames include:

- Electoral register
- Telephone directory
- Shoppers in Anytown High Street on the Monday afternoon before the election

In defining the frame, practical, economic, ethical and technical issues need to be addressed. The need to obtain timely results may prevent extending the frame far into the future.

The difficulties can be extreme when the population and frame are disjoint. This is a particular problem in forecasting, where inferences about the future are made from historical data. In 1703, when Jacob Bernoulli proposed to Gottfried Leibniz the possibility of using historical mortality data to predict the probability of early death of a living man, Leibniz recognised the problem, replying:

*Nature has established patterns originating in the return of events but only for the most part. New illnesses flood the human race, so that no matter how many experiments you have done on corpses, you have not thereby imposed a limit on the nature of events so that in the future they could not vary.*

Having established the frame, there are a number of ways of organising it to improve efficiency and effectiveness.

### Simple sampling

### Stratified sampling

Where the population embraces a number of distinct categories, the frame can be organised by these categories into separate *strata* or demographics. One of the sampling methods below is then applied to each *stratum* separately, with the number sampled from each stratum kept in proportion to that stratum's share of the population. This generally improves precision.
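
The idea of keeping each stratum's share of the sample equal to its share of the population can be sketched as follows. This is a minimal illustration, assuming the frame is a list of `(item, stratum)` pairs and that simple random selection is used within each stratum; the function name `stratified_sample`, the rounding rule and the example frame are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(frame, total_n, seed=None):
    """Draw a proportionally allocated sample from (item, stratum) pairs."""
    rng = random.Random(seed)

    # Group the frame by stratum.
    strata = defaultdict(list)
    for item, stratum in frame:
        strata[stratum].append(item)

    sample = {}
    for stratum, items in strata.items():
        # Allocate in proportion to the stratum's share of the frame.
        # Rounding means the realised total can differ slightly from total_n.
        n = round(total_n * len(items) / len(frame))
        n = min(n, len(items))
        sample[stratum] = rng.sample(items, n)
    return sample


# Hypothetical frame: 60 "urban" units and 40 "rural" units.
frame = [(f"u{i}", "urban") for i in range(60)] + [(f"r{i}", "rural") for i in range(40)]
result = stratified_sample(frame, total_n=10, seed=1)
```

With this frame, a sample of 10 is split 6 to 4 between the two strata, mirroring the 60 to 40 split in the frame.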

### Cluster sampling

Where items in the population are clustered, sampling can reflect this to minimise costs. For example, in a national survey by personal interview, many people will be remotely located and costly to reach. Cluster sampling locates the frame in areas of concentrated habitation.
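
One common variant, one-stage cluster sampling, can be sketched as below: whole clusters are chosen at random and every member of a chosen cluster is included, so fieldwork is confined to a few locations. The sketch assumes the frame is a dictionary mapping a cluster name to its members; the function name `cluster_sample` and the town names are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and return every member of each."""
    rng = random.Random(seed)
    # Sort the names so the draw is repeatable for a given seed.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical frame: households grouped by town.
towns = {
    "Anytown": ["a1", "a2", "a3"],
    "Bridgeton": ["b1", "b2"],
    "Calderford": ["c1", "c2", "c3", "c4"],
    "Dunmere": ["d1"],
}
visited = cluster_sample(towns, n_clusters=2, seed=0)
```

Interviewers then travel only to the towns in `visited`, at the price of a sample that may be less varied than one spread across the whole frame.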

### Multistage sampling

...

## Sampling method

### Random sampling

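In random sampling, every combination of items of a given size from the frame, or stratum, has an equal probability of being selected. This protects against selection bias and allows sampling error to be quantified, although it does not guarantee that any particular sample is representative of the frame, and it is infeasible in many practical situations. It is a type of probability sampling.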
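
As a small illustration, Python's standard `random.sample` draws without replacement, so each subset of the requested size is equally likely. The five-item frame and the seed below are hypothetical and chosen only to keep the numbers easy to check.

```python
import math
import random

# Hypothetical frame of five units.
frame = ["A", "B", "C", "D", "E"]
n = 2

# Number of distinct samples of size n; each has probability 1 / possible.
possible = math.comb(len(frame), n)

rng = random.Random(42)        # fixed seed only so the draw is repeatable
sample = rng.sample(frame, n)  # n distinct units, drawn without replacement
```

Here `possible` is 10, so each pair of units has a one-in-ten chance of being the sample drawn.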

### Systematic sampling

Selecting (say) every tenth name from the telephone directory is an example of systematic sampling. Though simple to implement, it can give biased results when asymmetries or periodic patterns in the ordering of the frame coincide with the sampling interval. Unless the starting point is chosen at random, it is a type of nonprobability sampling.
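
The "every tenth name" rule, and the way structure in the frame can bias it, can be sketched as follows. The random starting offset is an assumption added here, since the example above does not say where selection begins; the function name `systematic_sample` and both example frames are hypothetical.

```python
import random


def systematic_sample(frame, step, seed=None):
    """Take every `step`-th item, starting from a random offset in [0, step)."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return frame[start::step]


# Hypothetical directory of 100 names: every tenth name gives 10 names.
directory = [f"name{i:03d}" for i in range(100)]
every_tenth = systematic_sample(directory, step=10, seed=3)

# Periodicity hazard: if the frame repeats with the same period as the step,
# every selected item falls at the same point in the cycle.
weekly = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"] * 10
same_day = systematic_sample(weekly, step=7, seed=3)
```

In the second case all ten selected items are the same weekday, whichever offset is drawn, so the sample says nothing about the other six days.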

### Mechanical sampling

Mechanical sampling typically occurs in sampling solids, liquids and gases, using devices such as grabs, scoops, thief probes, the COLIWASA and the riffle splitter.

Mechanical sampling is not random and is a type of nonprobability sampling. Care is needed to ensure that the sample is representative of the frame. Much of the work in this area was developed by Pierre Gy.

### Convenience sampling

### Sample size

## Sampling and data collection

Good data collection involves:

- Following the defined sampling process
- Keeping the data in time order
- Noting comments and other contextual events
- Recording non-responses

## Review of sampling process

After sampling, a review should be held of the exact process followed in sampling, rather than that intended, in order to study any effects that any divergences might have on subsequent analysis. A particular problem is that of *non-responses*.

### Non-responses

In survey sampling, many of the individuals identified as part of the sample may be unwilling to participate or impossible to contact. In this case, there is a risk that differences between (say) the willing and the unwilling will bias the conclusions. This is often addressed by follow-up studies which make a repeated attempt to contact the unresponsive and to characterise their similarities and differences with the rest of the frame.
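
A first check for this kind of bias is to compare response rates across groups whose membership is known for every sampled individual. The sketch below assumes each record is a `(group, responded)` pair; the function name `response_rates`, the age bands and the counts are hypothetical.

```python
from collections import Counter


def response_rates(records):
    """Return the fraction of sampled units that responded, per group."""
    sampled = Counter()
    responded = Counter()
    for group, did_respond in records:
        sampled[group] += 1
        if did_respond:
            responded[group] += 1
    return {group: responded[group] / sampled[group] for group in sampled}


# Hypothetical survey outcome: (age band, responded?)
records = (
    [("18-34", True)] * 20 + [("18-34", False)] * 30
    + [("35+", True)] * 40 + [("35+", False)] * 10
)
rates = response_rates(records)
```

A large gap between groups, as in this made-up data, suggests that the respondents may not resemble the sample as a whole and that follow-up effort is best directed at the under-responding group.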

## Bibliography

- Cochran, W G (1977) *Sampling Techniques*
- Deming, W E (1975) On probability as a basis for action, *The American Statistician*, 29(4), pp146-152
- Gy, P (1992) *Sampling of Heterogeneous and Dynamic Material Systems: Theories of Heterogeneity, Sampling and Homogenizing*
