# Probability

The word *probability* derives from the Latin *probare* (to prove, or to test). Informally, *probable* is one of several words applied to uncertain events or knowledge, being more or less interchangeable with *likely*, *risky*, *hazardous*, *uncertain*, and *doubtful*, depending on the context. *Chance*, *odds*, and *bet* are other words expressing similar notions. As with the theory of mechanics, which assigns precise definitions to such everyday terms as *work* and *force*, so the theory of probability attempts to quantify the notion of *probable*.

## Historical remarks

Probability theory, as applied to observations, was largely a nineteenth century development. Gambling shows that there has been an interest in quantifying the ideas of probability for millennia, but exact mathematical descriptions of use in these types of problems only arose much later.

The doctrine of probabilities dates as far back as Pierre de Fermat and Blaise Pascal (1654). Christiaan Huygens (1657) gave the first scientific treatment of the subject. Jakob Bernoulli's *Ars Conjectandi* (posthumous, 1713) and Abraham de Moivre's *Doctrine of Chances* (1718) treated the subject as a branch of mathematics.

The theory of errors may be traced back to Roger Cotes's *Opera Miscellanea* (posthumous, 1722), but a memoir prepared by Simpson in 1755 (printed 1756) first applied the theory to the discussion of errors of observation. The reprint (1757) of this memoir lays down the axioms that positive and negative errors are equally probable, and that there are certain assignable limits within which all errors may be supposed to fall; continuous errors are discussed and a probability curve is given.

Pierre-Simon Laplace (1774) made the first attempt to deduce a rule for the combination of observations from the principles of the theory of probabilities. He represented the law of probability of errors by a curve $y = \varphi(x)$, $x$ being any error and $y$ its probability, and laid down three properties of this curve: (1) it is symmetric as to the $y$-axis; (2) the $x$-axis is an asymptote, the probability of an infinitely large error being 0; (3) the area enclosed is 1, it being certain that an error exists. He deduced a formula for the mean of three observations. He also gave (1781) a formula for the law of facility of error (a term due to Lagrange, 1774), but one which led to unmanageable equations. Daniel Bernoulli (1778) introduced the principle of the maximum product of the probabilities of a system of concurrent errors.

The method of least squares is due to Adrien-Marie Legendre (1805), who introduced it in his *Nouvelles méthodes pour la détermination des orbites des comètes*. In ignorance of Legendre's contribution, an Irish-American writer, Robert Adrain, editor of "The Analyst" (1808), first deduced the law of facility of error,

$$\varphi(x) = c e^{-h^2 x^2},$$

$c$ and $h$ being constants depending on precision of observation. He gave two proofs, the second being essentially the same as Herschel's (1850). Gauss gave the first proof which seems to have been known in Europe (the third after Adrain's) in 1809. Further proofs were given by Laplace (1810, 1812), Gauss (1823), Ivory (1825, 1826), Hagen (1837), Bessel (1838), Donkin (1844, 1856), and Crofton (1870). Other contributors were Ellis (1844), De Morgan (1864), Glaisher (1872), and Schiaparelli (1875). Peters's (1856) formula for $r$, the probable error of a single observation, is well known.

In the nineteenth century authors on the general theory included Laplace, Lacroix (1816), Littrow (1833), Adolphe Quetelet (1853), Richard Dedekind (1860), Helmert (1872), Laurent (1873), Liagre, Didion, and Pearson. Augustus De Morgan and George Boole improved the exposition of the theory.

On the geometric side (see integral geometry) contributors to *The Educational Times* were influential (Miller, Crofton, McColl, Wolstenholme, Watson, and Artemas Martin).

## Concepts

There is essentially one set of mathematical rules for manipulating probability; these rules are listed under "Formalization of probability" below. (There are other rules for quantifying uncertainty, such as the Dempster-Shafer theory and fuzzy logic, but those are essentially different and not compatible with the laws of probability as they are usually understood.) However, there is ongoing debate over what, exactly, the rules apply to; this is the topic of probability interpretations.

The general idea of probability is often divided into two related concepts:

- Aleatory probability, which represents the likelihood of future events whose occurrence is governed by some *random* physical phenomenon. This concept can be further divided into physical phenomena that are predictable, in principle, with sufficient information, and phenomena which are essentially unpredictable. Examples of the first kind include tossing dice or spinning a roulette wheel, and an example of the second kind is radioactive decay.
- Epistemic probability, which represents our uncertainty about propositions when one lacks complete knowledge of causative circumstances. Such propositions may be about past or future events, but need not be. Some examples of epistemic probability are to assign a probability to the proposition that a proposed law of physics is true, and to determine how "probable" it is that a suspect committed a crime, based on the evidence presented.

## Formalization of probability

Like other theories, the theory of probability is a representation of probabilistic concepts in formal terms -- that is, in terms that can be considered separately from their meaning. These formal terms are manipulated by the rules of mathematics and logic, and any results are then interpreted or translated back into the problem domain.

There have been at least two successful attempts to formalize probability, namely the Kolmogorov formulation and the Cox formulation. In Kolmogorov's formulation, sets are interpreted as events and probability itself as a measure on a class of sets. In Cox's formulation, probability is taken as a primitive (that is, not further analyzed) and the emphasis is on constructing a consistent assignment of probability values to propositions. In both cases, the laws of probability are the same, except for technical details:

- a probability is a number between 0 and 1;
- the probability of an event or proposition and its complement must add up to 1; and
- the joint probability of two events or propositions is the product of the probability of one of them and the probability of the second, conditional on the first.
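The three laws above can be checked on a small finite example. The following is a minimal sketch, assuming a sample space of two fair six-sided dice with equally weighted outcomes; the helper names `prob` and `cond_prob` and the two chosen events are illustrative only and do not come from either formulation.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: ordered outcomes of rolling two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event (a predicate on outcomes) under equal weights."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


def cond_prob(event, given):
    """Pr(event | given), computed by restricting the sample space to `given`."""
    restricted = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in restricted if event(o)), len(restricted))


def first_is_even(o):
    return o[0] % 2 == 0


def total_is_eight(o):
    return o[0] + o[1] == 8


p_a = prob(first_is_even)
p_not_a = prob(lambda o: not first_is_even(o))
p_b_given_a = cond_prob(total_is_eight, first_is_even)
p_joint = prob(lambda o: first_is_even(o) and total_is_eight(o))

print("Pr(A) =", p_a)                            # a number between 0 and 1
print("Pr(A) + Pr(not A) =", p_a + p_not_a)      # complement rule
print("Pr(A and B) =", p_joint)                  # joint probability
print("Pr(A) * Pr(B | A) =", p_a * p_b_given_a)  # product rule
```

Exact fractions are used instead of floating-point numbers so that the complement and product rules hold as equalities rather than approximately.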

### Representation and interpretation of probability values

The probability of an event is generally represented as a real number between 0 and 1. An *impossible* event has a probability of exactly 0, and a *certain* event has a probability of 1, but the converses are not always true: probability 0 events are not always impossible, nor probability 1 events certain.
The rather subtle distinction between "certain" and "probability 1" is treated at greater length in the article on "almost surely".

Most probabilities that occur in practice are numbers between 0 and 1, indicating the event's position on the continuum between impossibility and certainty. The closer an event's probability is to 1, the more likely it is to occur.

For example, if two events are assumed equally probable, such as a flipped coin landing heads-up or tails-up, we can express the probability of each event as "1 in 2", or, equivalently, "50%" or "1/2".

Probabilities are equivalently expressed as odds, which is the ratio of the probability of one event to the probability of all other events. The odds of heads-up, for the tossed coin, are (1/2)/(1 - 1/2), which is equal to 1/1. This is expressed as "1 to 1 odds" and often written "1:1".

Odds *a*:*b* for some event are equivalent to probability *a*/(*a*+*b*).
For example, 1:1 odds are equivalent to probability 1/2, and 3:2 odds are equivalent to probability 3/5.
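The conversion between odds and probability can be sketched in a few lines. The function names below are hypothetical, and the sketch assumes odds are given as a pair of positive integers *a*:*b* in favour of the event.

```python
from fractions import Fraction


def odds_to_probability(a, b):
    """Convert odds a:b in favour of an event to the probability a/(a+b)."""
    return Fraction(a, a + b)


def probability_to_odds(p):
    """Convert a probability p (0 <= p < 1) to odds a:b in lowest terms."""
    p = Fraction(p)
    if not 0 <= p < 1:
        raise ValueError("odds are only finite for probabilities in [0, 1)")
    ratio = p / (1 - p)  # ratio of the event's probability to its complement's
    return ratio.numerator, ratio.denominator


print(odds_to_probability(1, 1))
print(odds_to_probability(3, 2))
print(probability_to_odds(Fraction(3, 5)))
```

A probability of exactly 1 has no finite odds under this convention, which is why the sketch rejects it.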

There remains the question of exactly what can be assigned probability, and how the numbers so assigned can be used; this is the question of probability interpretations. There are some who claim that probability can be assigned to any kind of an uncertain logical proposition; this is the Bayesian interpretation. There are others who argue that probability is properly applied only to propositions concerning sequences of repeated experiments or sampling from a large population; this is the frequentist interpretation. There are several other interpretations which are variations on one or the other of those, or which have less acceptance at present.

### Distributions

A probability distribution is a function that assigns probabilities to events or propositions. For any set of events or propositions there are many ways to assign probabilities, so the choice of one distribution or another is equivalent to making different assumptions about the events or propositions in question.

There are several equivalent ways to specify a probability distribution. Perhaps the most common is to specify a probability density function. Then the probability of an event or proposition is obtained by integrating the density function. The distribution function may also be specified directly. In one dimension, the distribution function is called the cumulative distribution function. Probability distributions can also be specified via moments or the characteristic function, or in still other ways.
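As an illustration of two of these equivalent specifications, the sketch below obtains the probability of an interval both by integrating a density function and by differencing the cumulative distribution function. The exponential distribution and the rate parameter `RATE` are assumptions chosen for the example, and the midpoint-rule integrator is a deliberately simple stand-in for a numerical library.

```python
import math

RATE = 2.0  # assumed rate parameter of an exponential distribution


def density(x):
    """Probability density function of the exponential distribution."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0


def cdf(x):
    """Cumulative distribution function of the same distribution."""
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0


def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


# Probability that the outcome falls between 0.5 and 1.5, computed two ways.
p_integral = integrate(density, 0.5, 1.5)
p_cdf = cdf(1.5) - cdf(0.5)

print("by integrating the density:", p_integral)
print("by differencing the CDF:   ", p_cdf)
```

The two values agree up to the error of the numerical integration, since the cumulative distribution function is the integral of the density.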

A distribution is called a **discrete distribution** if it is defined on a countable, discrete set, such as a subset of the integers.
A distribution is called a **continuous distribution** if it has a continuous distribution function, such as a polynomial or exponential function.
Most distributions of practical importance are either discrete or continuous, but there are examples of distributions which are neither.

Important discrete distributions include the discrete uniform distribution, the Poisson distribution, the binomial distribution, the negative binomial distribution and the Maxwell-Boltzmann distribution.

Important continuous distributions include the normal distribution, the gamma distribution, the Student's t-distribution, and the exponential distribution.

## Probability in mathematics

Probability axioms form the basis for mathematical probability theory. Probabilities can often be calculated using combinatorics or by applying the axioms directly. The applications of probability extend beyond statistics, which is usually based on the idea of probability distributions and the central limit theorem.

To give a mathematical meaning to probability, consider flipping a "fair" coin. Intuitively, the probability that heads will come up on any given coin toss is "obviously" 50%; but this statement alone lacks mathematical rigor - certainly, while we might *expect* that flipping such a coin 10 times will yield 5 heads and 5 tails, there is no *guarantee* that this will occur; it is possible for example to flip 10 heads in a row. What then does the number "50%" mean in this context?

One approach is to use the law of large numbers. In this case, we assume that we can perform any number of coin flips, with each coin flip being independent - that is to say, the outcome of each coin flip is unaffected by previous coin flips. If we perform *N* trials (coin flips), and let *N*_{H} be the number of times the coin lands heads, then we can, for any *N*, consider the ratio *N*_{H}/*N*.

As *N* gets larger and larger, we expect that in our example the ratio *N*_{H}/*N* will get closer and closer to 1/2. This allows us to *define* the probability Pr(*H*) of flipping heads as the mathematical limit, as *N* approaches infinity, of this sequence of ratios:

$$\Pr(H) = \lim_{N \to \infty} \frac{N_H}{N}$$

In practice, we cannot flip a coin infinitely many times, so this definition applies most naturally when we have already assigned an *a priori* probability to a particular outcome (in this case, our *assumption* that the coin was a "fair" coin). The law of large numbers then says that, given Pr(*H*), and any arbitrarily small number ε, there exists (with probability 1) some number *n* such that for all *N* > *n*,

$$\left| \Pr(H) - \frac{N_H}{N} \right| < \varepsilon.$$

In other words, *eventually* the number of heads over the number of total flips will become arbitrarily close to 1/2; and will then stay *at least* as close to 1/2 for as long as we keep performing additional coin flips.
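The behaviour of the ratio of heads to total flips can be observed in a simulation. This is a minimal sketch, assuming that a pseudo-random generator is an adequate stand-in for independent flips of a fair coin; the function name `running_head_ratio` and the fixed seed are illustrative choices.

```python
import random


def running_head_ratio(num_flips, seed=0):
    """Return the ratio N_H / N after each of `num_flips` simulated fair flips."""
    rng = random.Random(seed)  # seeded so that the run is reproducible
    heads = 0
    ratios = []
    for n in range(1, num_flips + 1):
        if rng.random() < 0.5:  # treat this as the coin landing heads
            heads += 1
        ratios.append(heads / n)
    return ratios


ratios = running_head_ratio(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"N = {n:>7}: N_H/N = {ratios[n - 1]:.4f}")
```

For small *N* the ratio can be far from 1/2, while for large *N* it typically settles close to it. A simulation of this kind illustrates the law of large numbers but does not prove it.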

The *a priori* aspect of this approach to probability is sometimes troubling when applied to real world situations. For example, in the play *Rosencrantz and Guildenstern are Dead* by Tom Stoppard, a character flips a coin which keeps coming up heads over and over again, a hundred times. He can't decide whether this is just a random event - after all, it is possible (although unlikely) that a fair coin would give this result - or whether his assumption that the coin is fair is at fault.

### Remarks on probability calculations

The difficulty of probability calculations lies in determining the number of possible events, counting the occurrences of each event, and counting the total number of possible events. Especially difficult is drawing meaningful conclusions from the probabilities calculated. An amusing probability riddle, the Monty Hall problem, demonstrates the pitfalls nicely.
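The Monty Hall problem can be explored by simulation. The sketch below assumes the standard statement of the puzzle, which is not spelled out here: three doors, one car placed uniformly at random, and a host who always opens an unchosen door that hides no car and then offers the contestant the chance to switch. The function names are hypothetical.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the one remaining closed door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the winning probability of a strategy by repeated play."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


print("always stay:  ", win_rate(switch=False))
print("always switch:", win_rate(switch=True))
```

Under these assumptions the estimated win rate is close to 1/3 for staying and 2/3 for switching, which many people find counterintuitive because the two remaining doors look equally likely.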

To learn more about the basics of probability theory, see the article on probability axioms and the article on Bayes' theorem, which explains the use of conditional probabilities in cases where the occurrences of two events are related.

## Applications of probability theory to everyday life

A major effect of probability theory on everyday life is in risk assessment and in trade on commodity markets. Governments typically apply probability methods in environmental regulation, where the practice is called "pathway analysis". They also often measure well-being using methods that are stochastic in nature, and choose projects to undertake based on their perceived probable effect on the population as a whole. It is not correct to say that statistics are involved in the modelling itself, as typically the assessments of risk are one-time and thus require more fundamental probability models, e.g. "the probability of another 9/11". A law of small numbers tends to apply to all such choices and to the perception of their effects, which makes probability measures a political matter.

A good example is the effect of the perceived probability of any widespread Middle East conflict on oil prices, which have ripple effects in the economy as a whole. An assessment by a commodity trader that a war is more likely or less likely sends prices up or down, and signals that opinion to other traders. Accordingly, the probabilities are not assessed independently, nor necessarily very rationally. The theory of behavioral finance emerged to describe the effect of such groupthink on pricing, on policy, and on peace and conflict.

It can reasonably be said that the discovery of rigorous methods to assess and combine probability assessments has had a profound effect on modern society. A good example is the application of game theory, itself based strictly on probability, to the Cold War and the mutual assured destruction doctrine. Accordingly, it may be of some importance to most citizens to understand how odds and probability assessments are made, and how they contribute to reputations and to decisions, especially in a democracy.

## See also

- Bayesian probability
- Bernoulli process
- Cox's theorem
- Decision theory
- Games of chance
- Game theory
- Information theory
- Law of averages
- Law of large numbers
- Normal distribution
- Random fields
- Random variable
- Statistics
- Stochastic process
- Wiener process

## External links

- Edwin Thompson Jaynes. *Probability Theory: The Logic of Science*. Preprint: Washington University (1996).
- Probabilistic football prediction competition, probabilistic scoring and further reading.
- "*The Not So Random Coin Toss, Mathematicians Say Slight but Real Bias Toward Heads*". NPR.
- Figuring the Odds (Probability Puzzles)

## Quotations

- Damon Runyon, "It may be that the race is not always to the swift, nor the battle to the strong - but that is the way to bet."
- Pierre-Simon Laplace, "It is remarkable that a science which began with the consideration of games of chance should have become the most important object of human knowledge." *Théorie Analytique des Probabilités*, 1812.
- Richard von Mises, "The unlimited extension of the validity of the exact sciences was a characteristic feature of the exaggerated rationalism of the eighteenth century" (in reference to Laplace). *Probability, Statistics, and Truth*, p. 9. Dover edition, 1981 (republication of second English edition, 1957).