# Formal language

In mathematics, logic and computer science, a**formal language**is a set of finite-length words (i.e. character strings) drawn from some finite alphabet. Note that we can talk about

*formal language*in many contexts (scientific, legal and so on), meaning a mode of expression more careful and accurate than everyday speech. Use of a particular formal language in the sense intended here is an 'ultimate' version of that usage: formal enough to be used in written form for automatic computation, is a possible criterion.

A typical alphabet would be {*a*, *b*}, and a typical string over that alphabet would be

*ababba*.

*a*and

*b*.

The **empty word** (that is, length-zero string) is allowed and is often denoted by *e*, ε or λ. While the alphabet is a finite set and every string has finite length, a language may very well have infinitely many member strings (because the length of words in it may be unbounded).

Some examples of formal languages:

- the set of all words over {
*a*,*b*}; - the set {
*a*^{n}:*n*is a prime number }; - the set of syntactically correct programs in a given programming language; or
- the set of inputs upon which a certain Turing machine halts.

- Strings produced by some formal grammar (see Chomsky hierarchy);
- Strings produced by a regular expression;
- Strings accepted by some automaton, such as a Turing machine or finite state automaton;
- From a set of related YES/NO questions those ones for which the answer is YES — see decision problem.

*L*

_{1}and

*L*

_{2}are languages over some common alphabet.

- The
*concatenation**L*_{1}*L*_{2}consists of all strings of the form*vw*where*v*is a string from*L*_{1}and*w*is a string from*L*_{2}. - The
*intersection*of*L*_{1}and*L*_{2}consists of all strings which are contained in*L*_{1}and also in*L*_{2}. - The
*union*of*L*_{1}and*L*_{2}consists of all strings which are contained in*L*_{1}or in*L*_{2}. - The
*complement*of the language*L*_{1}consists of all strings over the alphabet which are not contained in*L*_{1}. - The
*right quotient**L*_{1}/*L*_{2}of*L*_{1}by*L*_{2}consists of all strings*v*for which there exists a string*w*in*L*_{2}such that*vw*is in*L*_{1}. - The
*Kleene star**L*_{1}* consists of all strings which can be written in the form*w*_{1}*w*_{2}...*w*_{n}with strings*w*_{i}in*L*_{1}and*n*≥ 0. Note that this includes the empty string ε because*n*= 0 is allowed. - The
*reverse**L*_{1}^{R}contains the reversed versions of all the strings in*L*_{1}. - The
*shuffle*of*L*_{1}and*L*_{2}consists of all strings which can be written in the form*v*_{1}*w*_{1}*v*_{2}*w*_{2}...*v*_{n}*w*_{n}where*n*≥ 1 and*v*_{1},...,*v*_{n}are strings such that the concatenation*v*_{1}...*v*_{n}is in*L*_{1}and*w*_{1},...,*w*_{n}are strings such that*w*_{1}...*w*_{n}is in*L*_{2}.