Encyclopedia  |   World Factbook  |   World Flags  |   Reference Tables  |   List of Lists     
   Academic Disciplines  |   Historical Timeline  |   Themed Timelines  |   Biographies  |   How-Tos     
Sponsor by The Tattoo Collection
Genetic algorithm
Main Page | See live article | Alphabetical index

Genetic algorithm

A genetic algorithm (GA) is an algorithm used to find approximate solutions to difficult-to-solve problems through application of the principles of evolutionary biology to computer science. Genetic algorithms use biologically-derived techniques such as inheritance, mutation, natural selection, and recombination. Genetic algorithms are a particular class of evolutionary algorithms.

Genetic algorithms are typically implemented as a computer simulation in which a population of abstract representations (called chromosomes) of candidate solutions (called individuals) to an optimization problem evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but different encodings are also possible. The evolution starts from a population of completely random individuals and happens in generations. In each generation, multiple individuals are stochastically selected from the current population, modified (mutated or recombined) to form a new population, which becomes current in the next iteration of the algorithm.

Table of contents
1 Operation of a GA
2 Variants
3 Efficiency
4 Problem domains
5 History
6 Pseudo-code Algorithm
7 Applications
8 Related techniques
9 See also
10 References
11 External links

Operation of a GA

The problem to be solved is represented by a list of parameters which can be used to drive an evaluation procedure, called chromosomess or genomess. Chromosomes are typically represented as simple strings of data and instructions, in a manner not unlike instructions for a von Neumann machine, although a wide variety of other data structures for storing chromosomes have also been tested, with varying degrees of success in different problem domains.

Initially several such parameter lists or chromosomes are generated. This may be totally random, or the programmer may seed the gene pool with "hints" to form an initial pool of possible solutions. This is called the first generation pool.

During each successive generation, each organism (or individual) is evaluated, and a value of goodness or fitness is returned by a fitness function. The pool is sorted, with those having better fitness (representing better solutions to the problem) ranked at the top. Notice that "better" in this context is relative, as initial solutions are all likely to be rather poor.

The next step is to generate a second generation pool of organisms, which is done using any or all of the genetic operators: selection, crossover (or recombination), and mutation. A pair of organisms are selected for breeding. Selection is biased towards elements of the initial generation which have better fitness, though it is usually not so biased that poorer elements have no chance to participate, in order to prevent the solution set from converging too early to a sub-optimal or local solution. There are several well-defined organism selection methods; roulette wheel selection and tournament selection are popular methods.

Following selection, the crossover (or recombination) operation is performed upon the selected chromosomes. Most genetic algorithms will have a single tweakable probability of crossover (Pc), typically between 0.6 and 1.0, which encodes the probability that two selected organisms will actually breed. A random number between 0 and 1 is generated, and if it falls under the crossover threshold, the organisms are mated; otherwise, they are propagated into the next generation unchanged. Crossover results in two new child chromosomes, which are added to the second generation pool. The chromosomes of the parents are mixed in some way during crossover, typically by simply swapping a portion of the underlying data structure (although other, more complex merging mechanisms have proved useful for certain types of problems.) This process is repeated with different parent organisms until there are an appropriate number of candidate solutions in the second generation pool.

The next step is to mutate the newly created offspring. Typical genetic algorithms have a fixed, very small probability of mutation (Pm) of perhaps 0.01 or less. A random number between 0 and 1 is generated; if it falls within the Pm range, the new child organism's chromosome is randomly mutated in some way, typically by simply randomly altering bits in the chromosome data structure.

These processes ultimately result in a second generation pool of chromosomes that is different from the initial generation. Generally the average degree of fitness will have increased by this procedure for the second generation pool, since only the best organisms from the first generation are selected for breeding. The entire process is repeated for this second generation: each organism in the second generation pool is then evaluated, the fitness value for each organism is obtained, pairs are selected for breeding, a third generation pool is generated, etc. The process is repeated until an organism is produced which gives a solution that is "good enough".

A slight variant of this method of pool generation is to allow some of the better organisms from the first generation to carry over to the second, unaltered. This form of genetic algorithm is known as an elite selection strategy.

Observations

There are several general observations about the generation of solutions via a genetic algorithm:

Variants

The simplest algorithm represents each chromosome as a
bit string. Typically, numeric parameters can be represented by integers, though it is possible to use floating point representations. The basic algorithm performs crossover and mutation at the bit level.

Other variants treat the chromosome as a list of numbers which are indexes into an instruction table, nodes in a linked list, hashes, objects, or any other imaginable data structure. Crossover and mutation are performed so as to respect data element boundaries. For most data types, specific variation operators can be designed. Different chromosomal data types seem to work better or worse for different specific problem domains.

There have also been attempts to introduce other operators such as movement of genes, in the manner of 'transposons'. These movements essentially change the schema of the chromosome.

There are also parallel implementations of genetic algorithms that consider computers as 'islands' and implement migrations of populations from computer to the other over a network.

Some variants also introduce a variable fitness function. In the classical genetic algorithm, the fitness function is unchanged over time. In simulated annealing the fitness function is changed over time and in artificial life, each individual in the population can affect the fitness function used for another.

Efficiency

Genetic algorithms are known to produce good results for some problems. Their major disadvantage is that they are relatively slow, being very computationally intensive compared to other methods, such as random optimization.

Recent speed improvements have focused on speciation, where crossover can only occur if individuals are closely-enough related.

Genetic algorithms are extremely easy to adapt to parallel computing and clustering environments. One method simply treats each node as a parallel population. Organisms are then migrated from one pool to another according to various propagation techniques.

Another method, the Farmer/worker architecture, designates one node the farmer, responsible for organism selection and fitness assignment, and the other nodes as workers, responsible for recombination, mutation, and function evaluation.

Problem domains

Problems which appear to be particularly appropriate for solution by genetic algorithms include timetabling and scheduling problems, and many scheduling software packages are based on GAs. GAs have also been applied to engineering. Genetic algorithms are often applied as an approach to solve global optimization problems. Genetic algorithms have been successfully applied to the study of neurological evolution (see NeuroEvolution by Augmented Topologies).

As a general rule of thumb genetic algorithms might be useful in problem domains that have a complex fitness landscape as recombination is designed to move the population away from local minima that a traditional hill climbing algorithm might get stuck in.

History

John Holland was the pioneering founder of much of today's work in genetic algorithms, which has moved on from a purely theoretical subject (though based on computer modelling) to provide methods which can be used to actually solve some difficult problems today.

Pseudo-code Algorithm

 Choose initial population
 Evaluate each individual's fitness
 Repeat
        Select best-ranking individuals to reproduce
        Mate pairs at random
        Apply crossover operator
        Apply mutation operator
        Evaluate each individual's fitness
 Until terminating condition (see below)

Terminating conditions often include:

Applications

Related techniques

Genetic programming is a related technique developed by John Koza, in which computer programs, rather than function parameters, are optimised. Genetic programming often uses tree-based internal data structures to represent the computer programs for adaptation instead of the list, or array, structures typical of genetic algorithms. Genetic programming algorithms typically require running time that is orders of magnitude greater than that for genetic algorithms, but they may be suitable for problems that are intractable with genetic algorithms.

See also

artificial life, automatic label placement, bio-inspired computing, combinatorial optimization, eight queens puzzle, evolutionary computation, genetic programming, fitness landscape, local search, meta heuristics, neuroevolution, NEAT, simulated annealing, tabu search

References

External links