George Kingsley Zipf (1902-1950) was a linguistics professor at Harvard who died just after publishing his seminal book, Human Behavior and the Principle of Least Effort. In this book, Zipf collected results from various fields that demonstrated an intriguing relationship (or regularity) found in natural phenomena.
Zipf's main contribution was twofold: (a) he was the first to hypothesize that a universal principle is at play, and (b) he proposed a mathematical formula to describe it. Although his attempts to derive a comprehensive theory were incomplete (and some say misguided), his mathematical formula has proven quite accurate.
Zipf's law models the scaling (fractal) properties of many phenomena in human ecology, including natural language and music [1, 2, 3]. Zipf's law is one of many related laws that describe scaling properties of phenomena studied in the physical, biological, and behavioral sciences. These include Pareto's law, Lotka's law, power laws, Benford's law, Bradford's law, Heaps' law, etc. [4, 5].
Informally, Zipf's law describes phenomena where certain types of events are quite frequent, whereas other types of events are rare. For example, in English, short words (e.g., "a", "the") are very frequent, whereas long words (e.g., "anthropomorphologically") are quite rare.
Surprisingly, if we compare a word's frequency of occurrence with its statistical rank, we notice an inverse relationship: successive word counts are roughly proportional to 1/1, 1/2, 1/3, 1/4, 1/5, 1/6, 1/7, and so on. This is captured by the formula:
$$P(f) \sim \frac{1}{f^{n}}$$
where P(f) denotes the probability of a word (or event) of rank f and n is close to 1.
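As a minimal sketch of this relationship, assuming the form P(f) proportional to 1/f^n, the following Python code computes normalized probabilities for a finite number of ranks and builds a rank-frequency table from a list of tokens. The function names zipf_probabilities and rank_frequency are hypothetical, chosen for illustration only, and the normalization to a sum of 1 is an added assumption for a finite vocabulary.

```python
from collections import Counter


def zipf_probabilities(num_ranks, n=1.0):
    """Return probabilities proportional to 1 / rank**n for ranks 1..num_ranks.

    The weights are normalized so that they sum to 1 (an assumption made
    here so the values can be read as probabilities over a finite set).
    """
    weights = [1.0 / rank ** n for rank in range(1, num_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def rank_frequency(tokens):
    """Return (rank, count) pairs, with rank 1 assigned to the most frequent token."""
    counts = Counter(tokens)
    ordered = sorted(counts.values(), reverse=True)
    return list(enumerate(ordered, start=1))


if __name__ == "__main__":
    # With n = 1, the rank-1 event is twice as likely as the rank-2 event.
    for rank, p in enumerate(zipf_probabilities(5), start=1):
        print(rank, round(p, 4))

    # Rank-frequency table for a toy token list.
    print(rank_frequency("the cat and the dog and the bird".split()))
```

Comparing the counts returned by rank_frequency for a real text against the values from zipf_probabilities is one simple way to see how closely the text follows the 1/f pattern.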
In physics, Zipf's law is a special case of a power law. When n is 1 (Zipf's ideal), the phenomenon is called 1/f or pink noise. When n is 0 it is called white noise. When n is 2 it is called 1/f^2 or brown(ian) noise. Zipf (1/f, pink noise) distributions have been discovered in a wide range of human and naturally occurring phenomena, including music, city sizes, incomes, subroutine calls, earthquake magnitudes, thickness of sediment depositions, clouds, trees, extinctions of species, traffic jams, and visits to websites [1 through 11].
The type of structural regularity captured by Zipf's law can be visualized by plotting such distributions on logarithmic axes (log of rank against log of frequency), as demonstrated in figure 1. This graph results in a near straight line with slope near –1.
In general, the slope may range from 0 to negative infinity, with –1.0 denoting Zipf's ideal. A slope near 0 indicates a random probability of occurrence (e.g., having y-axis values generated by Math.random()). A slope tending towards negative infinity indicates a monotonous phenomenon (i.e., one event predominates). It has been suggested that a slope near –1.0 corresponds to a balance that feels natural and even aesthetically pleasing to humans for certain phenomena, such as music, urban structures, and landscapes [3, 7, 10].
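One way to estimate such a slope is an ordinary least-squares fit of log frequency against log rank. The sketch below assumes this fitting method, which the text does not specify; the function name loglog_slope is hypothetical, and the input is assumed to be a list of positive event counts.

```python
import math


def loglog_slope(counts):
    """Estimate the slope of log10(count) versus log10(rank) by least squares.

    Counts are sorted in descending order so that rank 1 is the most
    frequent event. Assumes at least two counts, all strictly positive.
    """
    if len(counts) < 2:
        raise ValueError("need at least two counts to fit a line")
    if any(c <= 0 for c in counts):
        raise ValueError("counts must be positive to take logarithms")

    ordered = sorted(counts, reverse=True)
    xs = [math.log10(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log10(c) for c in ordered]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    ideal = [1000.0 / rank for rank in range(1, 51)]  # counts proportional to 1/rank
    uniform = [10.0] * 20                              # every event equally frequent
    print(loglog_slope(ideal))    # close to -1 (Zipf's ideal)
    print(loglog_slope(uniform))  # close to 0 (flat distribution)
```

A plain least-squares fit weights every rank equally, so the long tail of rare events can dominate the estimate; this is a simplification that is adequate for illustration.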
Mandelbrot generalized Zipf's law to account for all types of scaling phenomena in nature, as follows:
$$P(f) \sim \frac{1}{(f + b)^{n}}$$
where b is an arbitrary real constant.
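To illustrate the effect of the constant b, the sketch below assumes the generalized form P(f) proportional to 1/(f + b)^n and normalizes over a finite number of ranks. The function name zipf_mandelbrot_probabilities is hypothetical, and the restriction b > -1 is an added assumption that keeps every denominator positive when ranks start at 1.

```python
def zipf_mandelbrot_probabilities(num_ranks, n=1.0, b=0.0):
    """Return probabilities proportional to 1 / (rank + b)**n for ranks 1..num_ranks.

    With b = 0 this reduces to the plain Zipf form 1 / rank**n.
    Assumes b > -1 so that (rank + b) stays positive for rank >= 1.
    """
    if b <= -1:
        raise ValueError("b must be greater than -1 for ranks starting at 1")
    weights = [1.0 / (rank + b) ** n for rank in range(1, num_ranks + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    plain = zipf_mandelbrot_probabilities(5, n=1.0, b=0.0)
    shifted = zipf_mandelbrot_probabilities(5, n=1.0, b=2.0)
    # A positive b flattens the head of the distribution: the ratio between
    # the first two ranks drops from 2 to (2 + b) / (1 + b).
    print(plain[0] / plain[1])
    print(shifted[0] / shifted[1])
```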
Zipf was independently wealthy; it is believed that he published his last book with his own money. Since electronic computers were unavailable at the time, he collected data by hiring human "computers" to count words in newspapers, books, and periodicals for numerous days at a time.