
Filename extension

A filename extension or filename suffix is an extra set of (usually alphanumeric) characters appended to the end of a filename so that computer users, as well as software on the computer system, can quickly determine the type of data stored in the file. It is one of several popular methods for distinguishing between file formats.

File managers such as Windows Explorer can assign an application to almost every filename extension: for example, a text editor for .txt, a word processor for .doc, a web browser for .htm or .html, a PDF viewer or editor for .pdf, a graphics program for .png, .gif or .jpg, and a spreadsheet program for .xls. Some extensions, including .exe, .com, .bat, and .cmd, indicate that the file itself may be executed under Windows.
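
The association between extensions and applications can be pictured as a simple lookup table. The following is a minimal sketch, not how any particular file manager is implemented: the ASSOCIATIONS table and the application_for function are hypothetical names introduced for illustration, and the table only repeats the examples given above.

```python
from pathlib import PurePath

# Hypothetical association table for illustration only; real file managers
# keep this information in system or user settings.
ASSOCIATIONS = {
    ".txt": "text editor",
    ".doc": "word processor",
    ".htm": "web browser",
    ".html": "web browser",
    ".pdf": "PDF viewer",
    ".png": "graphics program",
    ".gif": "graphics program",
    ".jpg": "graphics program",
    ".xls": "spreadsheet program",
}


def application_for(filename: str) -> str:
    """Return the kind of application associated with a filename's extension."""
    # suffix is the final dot-delimited part (e.g. ".txt"); lower() makes the
    # lookup case-insensitive, so REPORT.DOC and report.doc behave the same.
    extension = PurePath(filename).suffix.lower()
    return ASSOCIATIONS.get(extension, "no association")
```

Under these assumptions, application_for("REPORT.DOC") yields "word processor", while a name without an extension falls through to "no association".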

Filename extensions have been in use for decades, but they gained common usage because the file systems included with DOS and Windows had severe limitations on filenames for many years. They can be considered a type of metadata, and they are among the most visible pieces of such information on modern computer systems.

Table of contents
1 Historical limitations
2 The need for more
3 Security issues
4 Relation to Internet MIME types
5 See also
6 External links

Historical limitations

Early versions of the FAT file system used in DOS and Windows limited file names to eleven characters. This 11-character space was divided into two parts separated by a period (.). The first part, consisting of up to eight characters, was generally called the filename or the base name. The last three were called the extension. Since the word filename is eight letters long and ext is a reasonable abbreviation for extension, this can be generalized as:

 FILENAME.EXT

When doing a file listing, the base name and extension would be separated by spaces, much like this:

 Volume in drive A: is LINUX BOOT
 Volume Serial Number is 2410-07EF
 Directory for A:\

 LDLINUX  SYS      5480 1999-04-19  23:24
 VMLINUZ         530921 1999-04-19  23:24
 BOOT     MSG       559 1999-04-19  23:24
 EXPERT   MSG       668 1999-04-19  23:24
 GENERAL  MSG       986 1999-04-19  23:24
 KICKIT   MSG       979 1999-04-19  23:24
 PARAM    MSG       875 1999-04-19  23:24
 RESCUE   MSG      1020 1999-04-19  23:24
 SYSLINUX CFG       420 1999-04-19  23:24
 INITRD   IMG    878502 1999-04-19  23:24
        10 files           1,420,410 bytes
                              35,840 bytes free

The use of spaces instead of a period in the listing often confused novice DOS users.
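
The eight-plus-three layout and the space-padded listing format described above can be sketched in a few lines. This is a simplified illustration: split_8_3 and listing_entry are hypothetical helper names, and the check covers only the length limits, not the full set of characters that DOS allowed or disallowed in names.

```python
def split_8_3(name: str) -> tuple[str, str]:
    """Split a name such as 'SYSLINUX.CFG' into (base, extension).

    Only the length limits are enforced here: a base name of 1 to 8
    characters and an extension of 0 to 3 characters, with a single period.
    """
    base, _, ext = name.partition(".")
    if not 1 <= len(base) <= 8 or len(ext) > 3 or "." in ext:
        raise ValueError(f"not a valid 8.3 name: {name!r}")
    return base.upper(), ext.upper()


def listing_entry(name: str) -> str:
    """Render a name as a directory listing shows it: the base name padded
    to 8 columns, one space, then the extension padded to 3 columns."""
    base, ext = split_8_3(name)
    return f"{base:<8} {ext:<3}"
```

With this sketch, listing_entry("boot.msg") produces the same BOOT and MSG columns as the listing above, whereas a name such as index.html is rejected because its extension has four characters.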

The need for more

The filename extension was originally used to easily determine the file's generic type. The need to condense the type of a file into three characters frequently led to inscrutable extensions. Examples include .GFX for graphics files, .TXT for plain text, and .MUS for music. However, because many different programs handle these data types (and others) in a variety of ways, filename extensions became closely associated with certain products, and even with specific product versions. For example, early WordStar files used .WS or .WSn, where n was the program's version number. Extensions also began to collide between unrelated file formats. One example is .rpm, used by both the RPM Package Manager and RealPlayer (for RealPlayer Media files); another is .qif, shared by the Quicken Interchange Format (financial ledgers) and the QuickTime Image Format (pictures).

As time went on, hundreds of different extensions came into use as software developers invented more and more file formats. To make matters worse, companies and even individual software applications adopted their own extensions. This led to the publication of reference manuals devoted entirely to listing extensions and the type (or types) of data that might be found in files so named. These issues created a need for alternative systems with a lower chance of conflicts.

Other operating systems, such as Unix and Mac OS, generally had much more liberal standards for filenames. Many allowed filenames of approximately 32 characters, and limits of up to 255 characters were not uncommon. These systems generally allowed variable-length filename extensions, and also tended to allow more than one dot, partly because they had additional methods for determining file format information. As the Internet age arrived, it was possible to discern who was using Windows systems to edit their web pages and who was using Macintosh or Unix computers, since Windows users were generally restricted to ending their web page filenames in .HTM (instead of .html). The restriction also became a problem for programmers experimenting with the Java programming language, since it required source code files to have the four-letter extension .java and compiled object code output files to have the five-letter extension .class.

Eventually, Microsoft introduced long filenames and an extended version of the commonly used FAT file system, called VFAT, to deal with this issue. Microsoft and IBM had previously collaborated on the High Performance File System (HPFS), used in OS/2; neither HPFS nor NTFS, the file system later introduced with Windows NT, had such strict limitations. VFAT's long filenames are largely considered an ugly kludge, but they removed the important length restriction and allowed filenames to mix upper-case and lower-case letters. However, the habit of using three-character extensions has continued, along with the problems it creates.

Security issues

Depending on the settings of the shell or file browser, the file extension may not be shown. Malicious users who spread a computer virus or computer worm may use a file name like LOVE-LETTER-FOR-YOU.TXT.vbs, which is then displayed as LOVE-LETTER-FOR-YOU.TXT. This happens only if the user has file extensions hidden (which is the default behavior of Microsoft's software). To such a user, the file looks like a harmless text file rather than a potentially dangerous computer program written in VBScript.
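
The effect of a hidden final extension can be sketched as follows. This is an illustrative simplification rather than the behavior of any specific shell: displayed_name and looks_disguised are hypothetical helpers, RISKY_EXTENSIONS is a small assumed subset drawn from the extensions mentioned in this article, and the sketch hides every final extension regardless of file type.

```python
from pathlib import PurePath

# Assumed, illustrative subset of extensions that may be executed under
# Windows; a real list would be considerably longer.
RISKY_EXTENSIONS = {".exe", ".com", ".bat", ".cmd", ".vbs"}


def displayed_name(filename: str) -> str:
    """Return the name as it appears when the final extension is hidden."""
    return PurePath(filename).stem


def looks_disguised(filename: str) -> bool:
    """Return True if a risky final extension sits behind another
    extension-like part, as in LOVE-LETTER-FOR-YOU.TXT.vbs."""
    path = PurePath(filename)
    has_risky_suffix = path.suffix.lower() in RISKY_EXTENSIONS
    has_inner_suffix = PurePath(path.stem).suffix != ""
    return has_risky_suffix and has_inner_suffix
```

Here displayed_name("LOVE-LETTER-FOR-YOU.TXT.vbs") gives "LOVE-LETTER-FOR-YOU.TXT", and looks_disguised flags that name while leaving plain names such as setup.exe or notes.txt unflagged. A check of this kind addresses only this one trick and says nothing about whether a file is actually safe.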

This issue is becoming relatively less significant as the number of attack vectors increases: not only is the vast majority of users unaware of some of the more obscure dangerous extensions, but files with extensions previously considered safe (like .TXT and .ZIP) have been used successfully as attack vectors. In the case of .TXT, a file told users that certain system files were malware and urged them to delete those files; in the case of .ZIP, the user extracted a malicious executable from an archive and willingly ran it. It is the responsibility of the e-mail program to warn the user of dangerous attachments, or to block their execution altogether, which stops at least attacks that rely on a dangerous extension. Handling attacks that rely on persuading the user is entirely a matter of education and training, but their impact can be somewhat mitigated by applying the principle of least privilege (including, but not limited to, sandboxing). Most programs already provide such protection (notably Eudora, which in the latest Windows versions even extends this functionality to the operating system by means of a shell extension).

Later Windows versions (starting with Windows XP Service Pack 2 and Windows Server 2003) include a customizable database, which applications can query, of file types that could be considered dangerous in certain zones (including, but not limited to, downloads from the WWW and e-mail attachments), and they standardize a common API for querying antivirus programs. These mechanisms are meant to replace the often inconsistent, conflicting or weak mechanisms that existing applications already have in place, and may end practices such as certain antivirus software blacklisting scripts as intrinsically dangerous, even more so than native executables. That practice covers up a well-known weakness of blacklist-based (as opposed to heuristic) antivirus software: malware can evade detection by simply "shifting shape" into a semantically equivalent form that is different enough from what the antivirus expects to stay undetected. This technique, usually called polymorphism, is a lot easier and more effective with scripting languages. In short, most antivirus software can only block known malware, making it useless against custom (or merely not yet known) malware.

Relation to Internet MIME types

In network contexts, files are regarded as streams of bits and do not have filenames or filename extensions.

In the Internet protocol suite, information about the type of a given bitstream is encoded in the stream's MIME Content-type, represented by a line of text in a block of text preceding the stream, such as:

 Content-type: text/plain

Some operating systems and desktop environments, such as BeOS, KDE and GNOME, have started using MIME Content-types to tag files with appropriate metadata about the file content type, as a way of reducing the dependency on filename extensions. Mapping files to content types is then done using different heuristics, such as examining both the filename extension and the contents of the file.
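
The extension half of that mapping can be sketched with the mimetypes module from Python's standard library, which guesses a content type from a filename alone. The content_type_header function is a hypothetical helper introduced here, and the fallback to application/octet-stream for unrecognized names is an assumption of this sketch. Examining the file contents, the other heuristic mentioned above, is not shown.

```python
import mimetypes


def content_type_header(filename: str) -> str:
    """Build a Content-type line from a filename's extension alone."""
    # guess_type returns a (type, encoding) pair; type is None when the
    # extension is missing or not recognized.
    guessed, _encoding = mimetypes.guess_type(filename)
    # Assumed fallback for this sketch: treat unknown data as generic binary.
    return f"Content-type: {guessed or 'application/octet-stream'}"
```

For a name ending in .txt this produces the same Content-type: text/plain line shown earlier, while a name with no extension falls back to the generic binary type.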

See also

External links