

[Figure: ... and a chromosome (right). An exon is a region of a gene that is not spliced out during mRNA processing; introns are removed. The diagram labels a region of only 40 or so bases as a gene; in reality many genes are much larger.]

The word "gene" is shared by many disciplines, including whole organism-based or classical genetics, molecular genetics, evolutionary biology and population genetics. It has multiple uses within each of these contexts, but in the primary sense, genes are material things that parents pass to offspring during reproduction; these things encode information essential for the construction and regulation of polypeptides, proteins and other molecules essential for the growth and functioning of the organism. This sense, which is common to all of the above disciplines, is also the original historical meaning of gene.

Following the discovery that DNA is the genetic material, and with the growth of biotechnology and the project to sequence the human genome, the common usage of the word "gene" has increasingly reflected its meaning in molecular biology. In the primary, molecular-biological sense, genes are segments of DNA within chromosomes. In particular, they are the regions of DNA which cells transcribe into RNAs and translate, at least in part, into proteins.

Table of contents
1 Encoders of proteins
2 Gene activity and regulation
3 Organization of genes
4 Genetic variation
5 Genetic complexity of traits and pitfalls in common usage
6 Other aspects of gene in molecular biology
7 Selfish gene
8 History
9 Typical numbers of genes in an organism:
10 See also
11 External links

Encoders of proteins

A gene in this sense specifies a protein through its chemical structure. Four kinds of sequentially linked nucleotides compose a DNA molecule or strand (more at DNA). These four nucleotides constitute a genetic alphabet. A sequence of three consecutive nucleotides, called a codon, is the protein-coding vocabulary. The sequence of codons in a gene specifies the amino-acid sequence of the protein it encodes. The genetic code is a term used to describe the way in which amino acids are determined by DNA codons. This code is essentially conserved from bacteria to humans; in other words, common to all cellular life.
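The reading of codons described above can be sketched in a few lines of Python. This is only an illustration: it assumes the standard genetic code, writes the gene as the DNA coding strand, uses one-letter amino-acid abbreviations, and the function name translate and the example sequence are invented for the purpose.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
# "*" marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Read a coding sequence three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":  # stop codon: the protein ends here
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 12-base sequence: ATG GCC AAA TGA
print(translate("ATGGCCAAATGA"))  # prints MAK
```

The table has 64 entries for only 20 amino acids plus the stop signal, which is why several codons specify the same amino acid.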

Through the proteins they encode, genes govern the cells in which they reside. In multicellular organisms they control development of the individual from the fertilized egg and the day-to-day functions of the cells that make up tissues and organs. The instrumental roles of their protein products range from mechanical support of the cell structure to the transportation and manufacture of other molecules and to the regulation of other proteins' activities.

Gene activity and regulation

Because it is through proteins that genes exert their effects, and because gene transcripts (which are used for protein synthesis) often degrade rapidly, many genes are in a sense inactive when they are not actively being transcribed. Cells appear to regulate the activity of genes in part by increasing or decreasing their rate of transcription. Over the short term, this regulation occurs through the binding or unbinding of proteins known as transcription factors, which attach to specific non-coding DNA sequences called regulatory elements. Genes may also be silenced through DNA methylation or by chemical changes to the protein components of chromosomes (see histone).

Organization of genes

In most eukaryotic species, very little of the DNA in the genome encodes proteins, and the genes may be separated by vast sequences of so-called junk DNA. Moreover, the genes are often fragmented internally by non-coding sequences called introns, which can be many times longer than the coding segments (exons) they separate. Introns are removed from the transcript shortly after transcription by splicing. In the primary molecular sense, however, they are still parts of the gene.
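As a rough illustration of what removing introns means for the sequence, the following Python sketch cuts given intervals out of a transcript and joins what remains. It is a toy model under stated assumptions: the intron positions are supplied by hand as 0-based, half-open intervals, whereas a cell recognizes introns by signals in the sequence itself. The function name splice and the example sequence are hypothetical, and the transcript is written in the DNA alphabet for simplicity.

```python
def splice(transcript, introns):
    """Remove the given (start, end) intervals and join the remaining exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(transcript[position:start])  # keep the exon before this intron
        position = end                            # skip over the intron
    exons.append(transcript[position:])           # keep the final exon
    return "".join(exons)


# Hypothetical gene: two 6-base exons interrupted by one 12-base intron.
exon_1, intron, exon_2 = "ATGGCC", "GTAAGTTTTCAG", "AAATGA"
transcript = exon_1 + intron + exon_2

print(splice(transcript, [(6, 18)]))  # prints ATGGCCAAATGA
```

Here the intron is twice as long as either exon, a small-scale version of the situation described above.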

All the genes and intervening DNA together make up the genome of an organism, which in many species is divided among several chromosomes and typically present in two or more copies. The location (or locus) of a gene and the chromosome on which it is situated are in a sense arbitrary. Genes that appear together on the chromosomes of one species, such as humans, may appear on separate chromosomes in another species, such as mice. Two genes positioned near one another on a chromosome may encode proteins that figure in the same cellular process or in completely unrelated processes. As an example of the former, many of the genes involved in spermatogenesis reside together on the Y chromosome.

Genetic variation

Rare, spontaneous errors (e.g. in DNA replication) may give rise to mutations in the sequence of a gene. Once propagated to the next generation, a mutation may lead to variation within a species' population. Variants of a single gene are known as alleles, and differences in alleles may give rise to differences in traits, for example eye color. A gene's most common allele is called the wild type allele, and rare alleles are called mutants.

In the many species that carry more than one copy of their genome within each of their somatic cells, these copies are practically never identical. With respect to each gene, the copies that an individual possesses are liable to be distinct alleles, which may act synergistically or antagonistically to generate a trait or phenotype (more at genetics, allele).

Genetic complexity of traits and pitfalls in common usage

In common speech, "gene" is often used to refer to the hereditary cause of a trait, disease or condition--as in "the gene for obesity." A biologist, in contrast, might refer to an allele or a mutation that has been implicated in or associated with obesity. Biologists recognize that not only genes but also factors such as prenatal environment, upbringing, culture and the availability of food help determine whether or not a person is obese. To continue with the same example, it is implausible that variations within a single gene--or single genetic locus--determine one's genetic predisposition for obesity. These aspects of inheritance--the interplay between genes and environment, the influence of many genes--appear to be the norm with regard to many and perhaps most ("multifactorial") traits. The term phenotype refers to the characteristics that result from this interplay, and includes the effects of chance during embryonic development, such as the migration and patterning of cells.

Other aspects of gene in molecular biology

Regulatory elements and heredity

Natural variations within regulatory sequences appear also to underlie many of the heritable characteristics seen in organisms. The influence of such variations on the trajectory of evolution through natural selection may be as large as or larger than that of variation in sequences that encode proteins. Thus, though regulatory elements are often distinguished from genes in molecular biology, in effect they satisfy the shared and historical sense of the word. Indeed, a breeder or geneticist, in following the inheritance pattern of a trait, has no immediate way to know whether this pattern arises from coding sequences or regulatory sequences. Typically, he or she will simply attribute it to variations within a gene.

RNA genes or non-coding RNA

RNA is almost always the intermediary between genes and proteins. However, for some gene sequences, RNA molecules are actually the functional end products. For example, these molecules may be capable of enzymatic function, such as the RNAs known as ribozymes, or they may have a regulatory role, as in the case of small interfering RNAs.

The DNA sequences from which such RNAs are transcribed are known as RNA genes.

Molecular nomenclature and usage

For various reasons, the relationship between genes and proteins is not so simple as "one nucleotide sequence-->one amino-acid sequence." For example, eukaryotic cells may splice the transcripts of a gene in alternative ways to produce not one but a variety of proteins from one pre-mRNA (alternative splicing). Prokaryotes produce a similar effect by shifting reading frame during translation. Prokaryotes also contain open reading frames that overlap in the genome sequence. In addition, errors during DNA replication may lead to the duplication of a gene, and the two copies may diverge over time. Though the two sequences may remain the same or be only slightly altered, they are typically regarded as separate genes (i.e. not as alleles of the same gene). The same is true when duplicate sequences appear in different species. Yet, though the alleles of a gene differ in sequence, they are nevertheless regarded as a single gene (occupying a single locus).
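The idea of a reading frame can be made concrete with a short Python sketch. It simply groups the same stretch of sequence into codons starting at offset 0, 1 or 2, which is why a frame shift, or a second open reading frame overlapping the first, yields an entirely different series of codons. The function name codons_in_frame and the example sequence are hypothetical.

```python
def codons_in_frame(sequence, frame):
    """Split a sequence into complete codons, starting at offset 0, 1 or 2."""
    return [sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3)]


sequence = "ATGGCCAAATGA"  # hypothetical 12-base stretch

for frame in range(3):
    print(frame, codons_in_frame(sequence, frame))
# 0 ['ATG', 'GCC', 'AAA', 'TGA']
# 1 ['TGG', 'CCA', 'AAT']
# 2 ['GGC', 'CAA', 'ATG']
```

Bases left over at the end of a frame that do not fill a whole codon are dropped.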

Finally, molecular biologists will often use gene to refer to just the nucleotide sequence of a gene, and at times to the sequence of only its coding regions without the introns. This more abstract sense of gene underlies the sense of genes as information. It also means that, by way of its sequence, not only DNA but RNA may be said either to be or to carry a gene (see below).

Human gene nomenclature

For each known human gene the HUGO Gene Nomenclature Committee (HGNC) approves a gene name and symbol (short-form abbreviation). All approved symbols are stored in Genew, the Human Gene Nomenclature Database. Each symbol is unique and each gene is given only one approved gene symbol. A unique symbol for each gene lets people refer to it unambiguously and also facilitates electronic data retrieval from publications. Where possible, each symbol maintains parallel construction across the members of a gene family and can also be used in other species, especially the mouse.

RNAs are genes in some viruses

Although all cell-based organisms carry their genes and transmit them to offspring as DNA, some viruses that parasitize and reproduce in them carry only RNA. Because the genome of such a virus is already RNA, its cellular host may begin synthesizing viral proteins as soon as it is infected, without the delay of waiting for transcription. RNA retroviruses, on the other hand, require reverse transcription of their genome from RNA into DNA.
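At the level of sequence, reverse transcription amounts to building the complementary DNA strand from an RNA template. The Python sketch below shows only that base-pairing step and ignores the enzyme and everything else about the process. The function name reverse_transcribe and the example sequence are hypothetical, and both strands are written 5' to 3', so the DNA comes out as the reverse complement of the RNA.

```python
# Base pairing between an RNA template and the DNA strand built on it.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return the complementary DNA strand, written 5' to 3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rna.upper()))


print(reverse_transcribe("AUGGCC"))  # prints GGCCAT
```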

Selfish gene

The genes that exist today are those that have reproduced successfully in the past. This is the basis of the selfish gene view, publicised by Richard Dawkins. He points out in his book, The Selfish Gene, that all DNA exists with no other purpose than to propagate itself, even at the expense of the host organism's welfare. According to Dawkins, the possibly disappointing answer to the question "what is the meaning of life?" may be "the survival and perpetuation of ribonucleic acids and their associated proteins".


History

The existence of genes was first suggested by Gregor Mendel, who studied inheritance in pea plants and hypothesized a factor that conveys traits from parent to offspring. Although he did not use the term gene, he explained his results in terms of inherited characteristics. Mendel was also the first to hypothesize independent assortment, the distinction between dominant and recessive traits, the distinction between a heterozygote and homozygote, and the difference between what would later be described as genotype and phenotype.

Wilhelm Johannsen coined gene in 1909, based on the work of Gregor Mendel.

Typical numbers of genes in an organism:

The following table gives typical numbers of genes and genome size for some organisms. Estimates of the number of genes in an organism are somewhat controversial, because it is only possible to discover a gene, and no techniques currently exist to prove that a DNA sequence contains no gene. Nonetheless, estimates are made based on current knowledge.

Organism                 Number of genes   Base pairs
Plants                   <50,000           <10^11
Humans                   35,000            3×10^9
Flies                    12,000            1.6×10^8
Fungi                    6,000             1.3×10^7
Bacteria                 500-6,000         5×10^5 to 10^7
Mycoplasma genitalium    500               580,000
DNA viruses              10-300            5,000-200,000
RNA viruses              1-25              1,000-23,000
Viroids                  0-1               ~500
Prions                   0                 0
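One way to read the table is to divide genome size by gene count, which gives the average stretch of DNA per gene. The Python sketch below does this for the rows that give single values, reading the exponents as powers of ten (for example 3×10^9 base pairs for humans). The figures are the table's estimates, and the function name base_pairs_per_gene is hypothetical.

```python
# (number of genes, base pairs), taken from the single-valued rows of the table.
GENOMES = {
    "Humans": (35_000, 3e9),
    "Flies": (12_000, 1.6e8),
    "Fungi": (6_000, 1.3e7),
    "Mycoplasma genitalium": (500, 580_000),
}


def base_pairs_per_gene(genes, base_pairs):
    """Average number of base pairs of genome per gene."""
    return base_pairs / genes


for organism, (genes, base_pairs) in GENOMES.items():
    print(f"{organism}: about {base_pairs_per_gene(genes, base_pairs):,.0f} base pairs per gene")
```

On these figures the human genome has roughly 86,000 base pairs per gene, against about 1,160 for Mycoplasma genitalium, consistent with the earlier observation that very little eukaryotic DNA encodes proteins.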

See also

External links