Benchmark

A benchmark is a point of reference for a measurement. The term presumably originates from the practice of making dimensional height measurements of an object on a workbench using a graduated scale or similar tool, and using the surface of the workbench as the origin for the measurements.


In surveying, benchmarks are landmarks whose altitude is reliably and precisely known. They are often man-made objects, such as features of permanent structures that are unlikely to change, or special-purpose "monuments": typically small concrete obelisks, approximately 3 feet tall and 1 foot wide at the base, set permanently into the earth.


In computing, a benchmark is the result of running a computer program, or a set of programs, that applies a number of standard tests and trials to an object in order to assess its relative performance. The term is also commonly used for the specially designed benchmarking programs themselves. Benchmarking is usually associated with assessing the performance characteristics of computer hardware, e.g., the floating-point performance of a CPU, but the technique is also applicable to software. Software benchmarks are run against, for example, compilers or database management systems.
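As a rough illustration of the idea, the following minimal sketch times a small workload several times and reports the best run. The workload (`floating_point_workload`), the harness name (`benchmark`), and the repeat count are assumptions made for this example only; established benchmark suites are far more careful about warm-up, timer resolution, and interference from other processes.

```python
import time


def floating_point_workload(n):
    """Hypothetical workload: n floating-point multiply-and-add steps."""
    total = 0.0
    for i in range(n):
        total += i * 0.5
    return total


def benchmark(workload, *args, repeats=5):
    """Run the workload `repeats` times; return the best wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload(*args)
        timings.append(time.perf_counter() - start)
    # The minimum is the run least disturbed by other activity on the machine.
    return min(timings)


best = benchmark(floating_point_workload, 100_000)
print(f"best of 5 runs: {best:.6f} s")
```

The printed time only becomes meaningful when the same workload is run on a second system, or on a second implementation, and the two figures are compared.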

Benchmarks provide a method of comparing the performance of various subsystems across different chip/system architectures.

As computer architecture advanced, it became more and more difficult to compare the performance of various computer systems simply by looking at their specifications. Therefore, tests were developed that could be performed on different systems, allowing the results to be compared across architectures. For example, Intel Pentium 4 processors have a higher clock rate than AMD Athlon XP processors of the same computational speed; in other words, a "slower" AMD processor could be as fast on benchmark tests as an Intel processor with a higher clock rating.
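The arithmetic behind this can be sketched with a deliberately simplified model in which throughput is the clock rate multiplied by the average number of instructions completed per cycle (IPC). The two chips and all of their figures below are invented for illustration and are not measurements of any Pentium 4 or Athlon XP; the model also ignores memory, caches, and everything else a real benchmark would exercise.

```python
def instructions_per_second(clock_hz, ipc):
    """Simplified throughput model: cycles per second times instructions per cycle."""
    return clock_hz * ipc


def run_time_seconds(instruction_count, clock_hz, ipc):
    """Time needed to execute a fixed number of instructions under the same model."""
    return instruction_count / instructions_per_second(clock_hz, ipc)


# Hypothetical chips: A has the higher clock rate, B does more work per cycle.
chip_a = {"clock_hz": 3.0e9, "ipc": 1.0}
chip_b = {"clock_hz": 2.0e9, "ipc": 1.5}

workload = 6.0e9  # hypothetical instruction count of the test program

for name, chip in [("A", chip_a), ("B", chip_b)]:
    seconds = run_time_seconds(workload, **chip)
    print(f"chip {name}: {chip['clock_hz'] / 1e9:.1f} GHz, {seconds:.2f} s")
```

Under these assumed numbers both chips finish the workload in the same time, even though one is clocked 50% higher, which is why the clock rate alone is a poor basis for comparison.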

Benchmarks are designed to mimic a particular type of workload on a component or system. "Synthetic" benchmarks do this with specially created programs that impose the workload on the component. "Application" benchmarks, instead, run actual real-world programs on the system. Whilst application benchmarks usually give a much better measure of real-world performance on a given system, synthetic benchmarks are still useful for testing individual components, such as a hard disk or networking device.
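The distinction can be sketched in a few lines. In the example below, `synthetic_arithmetic` is an artificial loop that stresses only floating-point arithmetic, while `application_style` performs a small end-to-end task (compressing and restoring a document). Both functions, the sample text, and the repeat counts are assumptions for this sketch; a genuine application benchmark would run an actual program on real input rather than a stand-in like this.

```python
import timeit
import zlib


def synthetic_arithmetic(iterations=50_000):
    """Synthetic workload: a tight loop that exercises only floating-point arithmetic."""
    x = 1.0
    for _ in range(iterations):
        x = x * 1.000001 + 0.5
    return x


# Hypothetical stand-in for real input data.
SAMPLE_TEXT = ("the quick brown fox jumps over the lazy dog " * 2000).encode("ascii")


def application_style(data=SAMPLE_TEXT):
    """Application-style workload: compress a document, then restore it."""
    restored = zlib.decompress(zlib.compress(data))
    return len(restored)


def best_time(func, repeat=3, number=5):
    """Best per-call time in seconds over `repeat` batches of `number` calls."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


for name, func in [("synthetic", synthetic_arithmetic),
                   ("application-style", application_style)]:
    print(f"{name}: {best_time(func):.6f} s per call")
```

The synthetic figure says something about one narrow capability; the application-style figure reflects a mix of library code, memory traffic, and branching that is closer to what a user of that task would experience.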

Computer manufacturers have a long history of tuning their systems to deliver unrealistically high performance on benchmark tests, performance that is not replicated in real usage. For instance, during the 1980s some compilers could detect a specific mathematical operation used in a well-known floating-point benchmark and replace it with a mathematically equivalent operation that was much faster. However, such a transformation was rarely useful outside the benchmark. Manufacturers commonly report only those benchmarks (or aspects of benchmarks) that show their products in the best light. They have also been known to misrepresent the significance of benchmarks, again to show their products in the best possible light. Taken together, these practices are called bench-marketing.
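The text does not name the benchmark or the operation involved, so the following is only an illustrative stand-in for this kind of substitution. It assumes a hypothetical benchmark kernel that sums the integers from 1 to n in a loop, and a hypothetical "benchmark-aware" replacement that recognises the pattern and returns the closed form n(n+1)/2 instead.

```python
def benchmark_kernel(n):
    """Hypothetical benchmark kernel: sum the integers 1..n with an explicit loop."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def special_cased_kernel(n):
    """Hypothetical substitute: the mathematically equivalent closed form n(n+1)/2."""
    return n * (n + 1) // 2


for n in (10, 1_000, 100_000):
    # Same answer, but the second version does constant work regardless of n.
    assert benchmark_kernel(n) == special_cased_kernel(n)
    print(n, special_cased_kernel(n))
```

Both versions return identical results, so the benchmark reports a dramatic speed-up, yet the figure no longer says anything about how quickly the system executes loops of this kind in ordinary programs.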

Users are advised to take benchmarks, particularly those provided by manufacturers themselves, with ample quantities of salt. If performance is really critical, the only benchmark that matters is the actual workload that the system is to be used for. If running that workload is not possible, benchmarks that resemble the real workload as closely as possible should be used, and even then treated with skepticism. It is quite possible for system A to outperform system B when running program "furble" on workload X (the workload in the benchmark), and for the order to be reversed with the same program on your own workload.
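Such a reversal is easy to reproduce on a small scale. In the sketch below, two sorting routines stand in for "system A" and "system B", and the number of comparisons each performs stands in for run time; both substitutions are assumptions made so that the result is deterministic. The benchmark workload is already-sorted input, and "your own workload" is reverse-sorted input.

```python
def insertion_sort_comparisons(data):
    """'System A': insertion sort; returns the number of comparisons performed."""
    items = list(data)
    comparisons = 0
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if items[j] <= key:
                break
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = key
    return comparisons


def merge_sort_comparisons(data):
    """'System B': top-down merge sort; returns the number of comparisons performed."""
    def sort(items):
        if len(items) <= 1:
            return items, 0
        mid = len(items) // 2
        left, left_count = sort(items[:mid])
        right, right_count = sort(items[mid:])
        merged, count = [], left_count + right_count
        i = j = 0
        while i < len(left) and j < len(right):
            count += 1
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, count
    return sort(list(data))[1]


workloads = {
    "benchmark (sorted input)": list(range(1024)),
    "own workload (reversed input)": list(range(1024, 0, -1)),
}
for name, data in workloads.items():
    a = insertion_sort_comparisons(data)
    b = merge_sort_comparisons(data)
    print(f"{name}: A={a} comparisons, B={b} comparisons")
```

On the sorted input A needs far fewer comparisons than B, while on the reversed input the ranking flips, so a conclusion drawn from the benchmark workload alone would be misleading for the second workload.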

Some common benchmarks are: