Zipf Intro

Zipf's Law

George Kingsley Zipf (1902-1950) was a linguistics professor at Harvard, who died just after publishing his seminal book, Human Behavior and the Principle of Least Effort [1]. In this book, Zipf collected results from various fields that demonstrated an intriguing relationship (or regularity) found in natural phenomena.

Zipf's main contributions were that (a) he was the first to hypothesize that a universal principle is at play, and (b) he proposed a mathematical formula to describe it. Although his attempts to derive a comprehensive theory were incomplete (and some say misguided), his mathematical formula has proven quite accurate.

According to Yale Alumni Magazine, Zipf's work had considerable influence on a young graduate student named Benoit Mandelbrot, who went on to develop the field of Fractal Geometry [2, 4].

Zipf's law models the scaling (fractal) properties of many phenomena in human ecology, including natural language and music [1, 2, 3]. Zipf's law is one of many related laws that describe scaling properties of phenomena studied in the physical, biological, and behavioral sciences. These include Pareto's law, Lotka's law, power laws, Benford's law, Bradford's law, Heaps' law, etc. [4, 5].

Informally, Zipf's law describes phenomena where certain types of events are quite frequent, whereas other types of events are rare. For example, in English, short words (e.g., "a", "the") are very frequent, whereas long words (e.g., "anthropomorphologically") are quite rare.

Surprisingly, if we compare a word’s frequency of occurrence with its statistical rank, we notice an inverse relationship: successive word counts are roughly proportional to 1/1, 1/2, 1/3, 1/4, 1/5, 1/6, 1/7, and so on [4]. This is captured by the formula:

P(f) ~ 1/f^n

where P(f) denotes the probability of a word (or event) of rank f and n is close to 1.
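
The rank-frequency comparison described above can be tried on any text. The following is a minimal sketch, not a definitive implementation: the function names (rank_frequency, zipf_ideal), the simple tokenizing rule, and the sample sentence are all assumptions made for illustration. It counts words, ranks them by frequency, and prints each count next to the count that an ideal 1/f^n falloff would predict from the top-ranked word.

```python
from collections import Counter
import re


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Assumption: a "word" is a run of letters or apostrophes, case-insensitive.
    words = re.findall(r"[a-z']+", text.lower())
    ordered = Counter(words).most_common()
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ordered, start=1)]


def zipf_ideal(top_count, rank, n=1.0):
    """Count predicted at a given rank if counts fall off as 1/rank**n."""
    return top_count / rank ** n


# Hypothetical sample; a real test would use a long text such as a novel.
sample = "the cat and the dog and the bird saw the fox"
table = rank_frequency(sample)
top_count = table[0][2]

for rank, word, count in table[:3]:
    predicted = zipf_ideal(top_count, rank)
    print(rank, word, count, round(predicted, 2))
```

A sample this short only shows the mechanics. The inverse relationship becomes visible with texts of many thousands of words.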

In physics, Zipf's law is a special case of a power law. When n is 1 (Zipf's ideal), the phenomenon is called 1/f or pink noise. When n is 0 it is called white noise. When n is 2 it is called 1/f^2 or brown(ian) noise [6]. Zipf (1/f, pink noise) distributions have been discovered in a wide range of human and naturally occurring phenomena, including music, city sizes, incomes, subroutine calls, earthquake magnitudes, thickness of sediment depositions, clouds, trees, extinctions of species, traffic jams, and visits to websites [1-11].
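
The two extreme cases named above are easy to generate, which may help make the terminology concrete. The sketch below assumes the spectral reading of the formula (f as frequency) and uses hypothetical helper names. White noise is taken to be independent Gaussian samples, and brown noise is produced as the running sum of those samples (a random walk), a standard construction whose power spectrum falls off roughly as 1/f^2. Pink noise lies between the two and needs a more involved generator, so it is left out here.

```python
import random
from itertools import accumulate


def white_noise(length, seed=None):
    """Independent Gaussian samples: no correlation between successive values."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def brown_noise(white):
    """Running sum of white noise, so each value is the previous one plus a step."""
    return list(accumulate(white))


white = white_noise(1000, seed=42)
brown = brown_noise(white)

# White noise jumps around zero; brown noise drifts slowly away from it.
print(white[:5])
print(brown[:5])
```

Plotting the two lists side by side shows the difference: the white series looks like static, while the brown series wanders in long, smooth excursions.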

The type of structural regularity captured by Zipf's law can be visualized by plotting such distributions on a log-log scale, as demonstrated in figure 1. The resulting graph is a nearly straight line with a slope near –1.


Fig. 1. Number of unique website hits (y-axis) ordered by website's statistical rank (x-axis) on log scale [9].
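
The slope of such a plot can be estimated numerically. Taking logarithms of P(f) ~ 1/f^n gives log P = –n log f + constant, so a straight-line fit of log(count) against log(rank) recovers –n. The sketch below is one way to do this with an ordinary least-squares fit. The function name log_log_slope and the synthetic data are assumptions for illustration, and other fitting methods may be preferable for real, noisy data.

```python
import math


def log_log_slope(counts):
    """Least-squares slope of log(count) against log(rank).

    Assumes `counts` is sorted in descending order (first entry is rank 1)
    and that every count is positive.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic "ideal" data: the count at each rank is exactly 1000 / rank.
ideal = [1000 / rank for rank in range(1, 101)]
print(round(log_log_slope(ideal), 3))
```

For the synthetic data the fitted slope is –1 up to rounding error. Real rank-frequency data, such as the website hits in figure 1, would give a value near, but not exactly, –1.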

In general, the slope may range from 0 to negative infinity, with –1.0 denoting Zipf's ideal. A slope near 0 indicates a random phenomenon, in which all events are about equally probable (e.g., having y-axis values generated by Math.random()). A slope tending towards negative infinity indicates a monotonous phenomenon (i.e., one event predominates). It has been suggested that a slope near –1.0 corresponds to a balance that feels natural and even aesthetically pleasing to humans for certain phenomena, such as music, urban structures, and landscapes [3, 7, 10].
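
To see how the slope relates to "random" versus "monotonous" behavior, it can help to compare synthetic distributions with different exponents. The sketch below is an illustration under stated assumptions: the helper names, the choice of 50 event types, and the exponents 0, 1, and 4 are all hypothetical. For each exponent it reports the fitted log-log slope and the share of all occurrences taken by the top-ranked event.

```python
import math


def log_log_slope(counts):
    """Least-squares slope of log(count) against log(rank); counts descending."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def power_law_counts(n, num_ranks=50, top=1_000_000.0):
    """Synthetic counts that fall off exactly as 1/rank**n."""
    return [top / rank ** n for rank in range(1, num_ranks + 1)]


for n in (0, 1, 4):
    counts = power_law_counts(n)
    slope = log_log_slope(counts)
    top_share = counts[0] / sum(counts)
    print(f"n={n}: slope={slope:.2f}, top event share={top_share:.1%}")
```

With n = 0 every event is equally common and the top event holds only 1/50 of the total. With n = 4 the top event alone accounts for more than 90% of all occurrences, which is the monotonous extreme. The Zipf ideal, n = 1, sits between the two.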

Mandelbrot generalized Zipf's law to account for all types of scaling phenomena in nature, as follows:

P(f) ~ 1/(b f^n)

where b is an arbitrary real constant.

Zipf was independently wealthy; it is believed that he published his last book with his own money. Since electronic computers were unavailable at the time, he collected data by hiring human "computers" to count words in newspapers, books, and periodicals for numerous days at a time [12].

References

  1. Zipf, G.K. (1949). Human Behavior and the Principle of Least Effort, Addison-Wesley Press, New York.
  2. Mandelbrot, B. (1977). Fractal Geometry of Nature, W.H. Freeman and Company, New York.
  3. Voss, R.F., and Clarke, J. (1975). "1/f Noise in Music and Speech", Nature, vol. 258, pp. 317-318.
  4. Bogomolny, A. (on-line). "Benford's Law and Zipf's Law", accessed March 21, 2005.
  5. Li, W. (on-line). "Zipf’s Law", accessed March 22, 2005.
  6. Bourke, P. (1998). "Generating noise with different power spectra laws", accessed October 26, 2006.
  7. Salingaros, N.A., and West, B.J. (1999). "A Universal Rule for the Distribution of Sizes", Environment and Planning B: Planning and Design, vol. 26, pp. 909-923.
  8. Schroeder, M. (1991). Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise, W.H. Freeman and Company, New York.
  9. Adamic, L.A. (on-line). "Zipf, Power-laws, and Pareto - a Ranking Tutorial", accessed March 22, 2005.
  10. Spehar, B., Clifford, C.W.G., Newell, B.R., and Taylor, R.P. (2003). "Universal Aesthetic of Fractals", Computers & Graphics, vol. 27, pp. 813-820.
  11. Nielsen, J. (on-line). "Zipf Curves and Website Popularity", accessed March 22, 2005.
  12. Wallace, R.S. (on-line). "Zipf’s Law", accessed March 22, 2005.