<nettime> statistical models and society
Keith Hart on Fri, 11 Jul 2003 23:24:59 +0200 (CEST)


The following is a brief summary of a paper I have begun to work on, "From
bell-curve to power-law: statistical models between national and world
society". It is to be presented at the Association of Social
Anthropologists Decennial Conference on Anthropology and Science, held at
Manchester, 14-18th July, in a panel entitled "Making and abstracting
numbers: the culture and politics of counting".

The summary stretches my competence and relies perhaps too heavily on A-L
Barabasi's Linked: the new science of networks (Perseus, 2002). I would be
glad of any suggestions for further reading.


                                                                           
   *    *     *     *     *     *


Statistical patterns can be found empirically in nature and society. Their
distribution may even conform to mathematical models. Thus, if two unbiased
dice are rolled a thousand times, the number seven will occur with roughly
six times the frequency of two or twelve. The resulting histogram will be
symmetrical with one peak where the mean, median and mode coincide. Or take
a large sample of adult human beings and measure their height. Most cases
will fall between five and six feet with very few less than four or more
than seven feet. Because this is a continuous variable, the results can be
plotted on a graph to which a curve may be fitted. It too will have a
single peak with fan tails on the high and low ends. We call this the
normal distribution or popularly "the bell-curve". For more than a century
statistical inference has largely been based on this curve with its
parameters of mean and standard deviation.
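
The dice example can be checked by exact enumeration rather than by
rolling. What follows is only a minimal Python sketch; the names
two_dice_distribution and dist are illustrative and do not come from the
paper.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution():
    """Exact probability of each total when two fair six-sided dice are rolled."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}


dist = two_dice_distribution()

# Seven is six times as likely as two or twelve (6/36 against 1/36).
ratio = dist[7] / dist[2]

# The distribution is symmetrical about seven, its single peak.
symmetric = all(dist[t] == dist[14 - t] for t in dist)
mode = max(dist, key=dist.get)
mean = sum(t * p for t, p in dist.items())
```

Here ratio comes out as 6, and the mode and mean both equal 7, which is the
coincidence of central values described above.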

More recently, another statistical pattern has been making the headlines.
If you score the number of hits on 7,000 websites in a given day and plot
them by size and frequency, the curve hugs the vertical and horizontal
axes, indicating few very large numbers and many small numbers. If the same
data are plotted on a log-log scale, the result is a straight line sloping
down from left to right. This is a typical manifestation of something
called a "power-law". A similar formula appears to describe the frequency
of words used in natural language; and even the distribution of molecular
reactions in cells reveals a few hubs linked to most reactions and many
weakly connected molecules. "The new science of networks", basing its
statistical approach on the physics of complexity, has been announced by,
among others, Albert-Laszlo Barabasi in a recent popular book Linked
(2002). Just as, in the late 19th century, the normal distribution seemed
to lend unity to statistical patterns emerging in a number of apparently
unrelated fields, such as criminology, astronomy and plant genetics, now
the power-law appears in fields as disparate as the internet, stock
markets, air transport, Hollywood actors' networks, power grids, urban
hierarchies and molecular biology.
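
The straight line on a log-log plot follows from the form of the power-law
itself: if y = c * x^(-alpha), then log y = log c - alpha * log x. The
Python sketch below illustrates this with made-up values (c = 1000 and
alpha = 2 are assumptions chosen for the example, not figures from any
dataset).

```python
import math


def power_law(x, c=1000.0, alpha=2.0):
    """Frequency predicted by an exact power law: y = c * x ** (-alpha)."""
    return c * x ** (-alpha)


def loglog_slopes(xs, f):
    """Slope between successive points once both axes are on a log scale."""
    pts = [(math.log10(x), math.log10(f(x))) for x in xs]
    return [(y2 - y1) / (x2 - x1) for (x1, y1), (x2, y2) in zip(pts, pts[1:])]


xs = [1, 10, 100, 1000, 10000]
slopes = loglog_slopes(xs, power_law)
# Every segment has slope -alpha, so the log-log plot is a straight line.
```

Each entry of slopes equals -2 up to floating-point rounding, so the
exponent of the power-law can be read off as the gradient of the line.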

In this brief speculative exercise, I want to explore the possibility that
the forms through which we perceive order in the world are derived in part
from our experience of society. This is not to deny the empirical
occurrence of phenomena that lend weight to the mathematical models used by
statisticians, but rather to suggest that their relative prominence in our
collective imagination reflects the way we experience society at different
times. This is to revive the proposition made by Durkheim and Mauss in
Primitive Classification (1903) that cultural forms are social in origin.
They supported this claim with reference to Australian totemism, the
classification of animals corresponding to clan organization, and to
Chinese astrology which reflected the hierarchical organization of that
society. Nearer to home, it could be said that Darwin's scheme of
evolutionary biology shared many features with the Victorian capitalism of
his day: individualism, natural selection as market competition and so on.

So I wish to explore here whether the recent rise to prominence of the
power-law distribution, with its premise of extreme inequality, tells us
something about our collective experience of society at this time. In
particular, I will argue that the normal distribution or bell-curve was
very well-suited to the egalitarian and democratic premises of the
nation-state form that came to dominate society, at the same time as
probabilistic thinking enshrined the bell-curve at the centre of its
practice. The power-law has been known for much longer than the decade or
so in which it has achieved greater prominence. In the 19th century, when
urban economy was not yet fully subsumed under the logic of nation-states,
power-laws were discerned in the dramatically uneven growth of cities.
Later both Zipf and Pareto proposed something similar in the form of
rank-order distributions, the one for word frequencies and the other for
income distribution. Pareto is credited with discovering the 20/80 rule --
the idea that 20% of the people own 80% of the wealth or 20% of journal
articles account for 80% of the citations. But the premise of inequality
contained in this rule was not adapted to the ideology of mid-20th century
society and it remained a marginal anomaly.
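
The 20/80 rule can be tied to a specific parameter value. Under a Pareto
distribution with shape alpha greater than 1, the richest fraction p holds
a share p^(1 - 1/alpha) of the total, a standard result for that model. The
Python sketch below is illustrative only; the function name top_share is an
assumption of this example.

```python
import math


def top_share(p, alpha):
    """Share of the total held by the richest fraction p under a Pareto
    distribution with shape alpha > 1 (a standard result for that model)."""
    if not 0 < p <= 1 or alpha <= 1:
        raise ValueError("need 0 < p <= 1 and alpha > 1")
    return p ** (1 - 1 / alpha)


# Shape parameter at which the top 20% hold exactly 80%.
alpha_80_20 = math.log(5) / math.log(4)   # about 1.16

share_top_20 = top_share(0.2, alpha_80_20)
# The same rule applies again within the top group: the top 4% hold 64%.
share_top_4 = top_share(0.04, alpha_80_20)
```

The repetition of the same proportion inside the top group is one way of
seeing that the distribution has no typical scale.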

Statistics arose in the mid-19th century as a way of regulating people
through enumerating them. Towards the end of the century the growing
influence of probabilistic thinking (Hacking, The Taming of Chance) began
to reveal some regular statistical patterns that could be applied as models
to a series of apparently disparate phenomena. The one that attracted the
greatest interest was the frequency distribution we know as the bell-curve.
In the 20th century this model was the basis for the development of
"parametric" statistics, the mainstream approach to statistical inference.
The very word normal says it all -- conformity to a standard revealed by a
central tendency, meaning that a population can be described in terms of an
average type. The key assumption is randomness. This means that every
member of a group has an equal chance of being selected. The democratic
premise is obvious. This is an egalitarian as well as an atomistic model.
Moreover, the quantities have to be measured on an interval scale, so that
size is a continuous variable, not broken up into the separate categories
of nominal or ordinal scales. 

It is my hypothesis that this image of the natural and social world gained
credibility from reflecting the premises of the national societies formed
in the second half of the 19th century. In the 20th century, the
nation-state was the modal form of society. Anthropologists transposed its
basic assumptions to ethnographic descriptions of so-called primitive
societies, thereby demonstrating that the model of cultural homogeneity was
universal.

The power-law is characterized by a few very large quantities and very many
small ones. The curve reflects an exponential rate of growth. In the case
of the rapidly growing field of network science, it is commonly observed
that there are a few hubs with very many links and a large number of
weakly-connected nodes. The discovery of power-laws is related to the
physics of complexity, the attempt to study interconnectivity in a
non-reductionist way (as opposed to the isolated atoms of the random
universe). This science is mainly concerned with the construction of order
out of chaos and with the properties of transitional phases, as when
chaotic water molecules assume the rigid pattern of ice.

Network theory in social science arose in the 1950s as a result of the
development of graph theory in mathematics. This theory was based on a
number of assumptions that have since come to be seen as unrealistic. The
model described an inventory of nodes whose number is fixed and remains
unchanged throughout the life of the network. Second, all nodes are
equivalent, so that they can only be linked together randomly. These
assumptions of randomness, stasis and equivalence were unquestioned for
forty years. Territorial society lent some credibility to networks
configured in its own image. Thus road maps do not diverge markedly from
the model, each centre having roughly the same number of links to the
others. 
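
The classical model described above can be sketched in a few lines: a fixed
set of equivalent nodes, with each possible pair linked at random. This is
an illustrative Python sketch only; the values n = 300 and p = 0.02 and the
fixed seed are arbitrary assumptions.

```python
import random
from itertools import combinations


def random_graph_degrees(n, p, seed=0):
    """Degrees in a graph with a fixed set of n equivalent nodes, where each
    possible pair is linked independently with probability p."""
    rng = random.Random(seed)
    degree = [0] * n
    for a, b in combinations(range(n), 2):
        if rng.random() < p:
            degree[a] += 1
            degree[b] += 1
    return degree


degrees = random_graph_degrees(n=300, p=0.02)
mean_degree = sum(degrees) / len(degrees)
max_degree = max(degrees)
# Degrees cluster around the mean; no node ends up as a dominant hub.
```

In such a graph the best connected node has only a few times the average
number of links, much as on a road map.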

Stanley Milgram conducted an experiment in 1967 to see how many personal
links would be needed to connect any two randomly selected individuals in
the United States. He found the median number of links was 5.5 and this
gave rise to the popular idea of "six degrees of separation", that all
humanity is connected on average by six links. This "small world"
phenomenon does not sit well with the assumptions of a random universe.
Then it was discovered that most Hollywood actors, as measured by
appearances in the same film, were linked by two or three degrees to Kevin
Bacon (who turned out later to be far from the best connected of actors).
Mark Granovetter established in 1973 that distance in networks was reduced
by weak links between clusters. And the typical clustering of networks was
modeled by Watts and Strogatz in 1998. But until now the basic assumptions
of original graph theory still held. The key shift emerged with the
recognition that some nodes in networks are hubs and some persons are
"connectors" (Gladwell, The Tipping Point, 2000). People vary enormously in
their ability to make social connections and in this they resemble the air
traffic grid of the United States, with a few O'Hares and many small
airports. By now networks were coming to be seen as both intrinsically unequal
in the size distribution of nodes and dependent on a few highly connected
individuals. But what produces this effect?

Barabasi and his collaborators at Notre Dame in the late 90s established
the fit between the pattern of internet links and the power-law
distribution. This led them to characterize such networks as "scale-free",
unlike the central tendency and standard deviation of the normal
distribution. There is no characteristic node in a continuous hierarchy
such as that typical of the power-law. The exponential character of the
curve reflects the fact that networks grow over time and the skewed
distribution of links may be accounted for by "preferential attachment".
Growth with preferences both accounts for the hub phenomenon (early comers
tend to attract more links) and requires us to abandon graph theory's key
assumptions, of randomness, stasis and equivalence. The rule appears to be
consistent with the market principle that "the rich get richer". Indeed in
the network economy, as the Microsoft case confirms, it can even be
summarized as "winner takes all". The winner in any network is often
unpredictable until one node crosses a threshold and takes off. The trick
is to find the threshold. When hubs are undermined, the network as a whole
is often visited by "cascading failure".
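
Growth with preferential attachment can be illustrated with a toy
simulation. The Python sketch below is a simplified model written for this
summary, not Barabasi's own code: each newcomer makes a single link, and
the network size of 5000 and the fixed seed are arbitrary assumptions. It
compares preferential choice with uniformly random choice of target.

```python
import random


def grow_network(n, preferential, seed=0):
    """Grow a network one node at a time; each newcomer makes one link.

    With preferential=True the target is chosen in proportion to the links
    it already has; otherwise every existing node is equally likely."""
    rng = random.Random(seed)
    degree = [1, 1]        # start from two nodes joined by one link
    endpoints = [0, 1]     # each node appears once per link it holds
    for new in range(2, n):
        if preferential:
            target = rng.choice(endpoints)
        else:
            target = rng.randrange(new)
        degree[target] += 1
        degree.append(1)
        endpoints.extend((target, new))
    return degree


pref = grow_network(5000, preferential=True)
unif = grow_network(5000, preferential=False)

# Early comers collect far more links than late arrivals.
early = sum(pref[:10]) / 10
late = sum(pref[-10:]) / 10
```

Comparing max(pref) with max(unif) shows the hub effect: when links follow
existing links, the largest node grows far beyond anything produced by
uniformly random attachment, and the earliest nodes hold most of the
advantage.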

It is clear that the convergence of world markets and the internet has
multiplied opportunities for scale-free networks. If corporate hierarchy
was well-suited to the era of mass production for national markets, the
rise of a web or network model of economy involves a shift from vertical to
flat virtual integration, as Castells (The Internet Galaxy, 2001) has long
insisted. The detachment of the money circuit from real production and
trade (Hart, Money in an Unequal World, 2001) has accelerated recognition
that the market is a weighted and directed network, with the mass of
ordinary stocks following a few market leaders. Already the power-law has
been harnessed to predictive models based on analysis of the movement of
the eight or so main stocks in a given sector.

"Nature normally hates power-laws", says Barabasi who has done more than
most to promote their visibility. Hitherto physicists have found them most
often near the critical point of phase transitions, as when a metal becomes
magnetized on cooling. The bell-curve is empirically preponderant in the
natural world, we are told. Interestingly enough, the Americans have long
held that income inequality is inevitable, while the Europeans have tended
to deny it. Today webloggers or peer-to-peer activists, the radical
democratic wing of internet society, accept the fact of the power-law and
claim that as long as choices can be made freely (equal opportunity), this
inequality is acceptable, one might say normal. Even if it can be shown to
be regular, exponential growth is unpredictable. Statistical physicists can
only say that sometimes a variable crosses a threshold and then it takes
off into the curve described by a power-law. The stakes are high, but
anyone can play.

This whole paradigm shift in scientific and statistical models seems to
coincide with the breakdown of the nation-state as the monopolistic
framework of society and with it of the corporate premises of 20th century
economy, jobs for life and all that. Since the late 70s the neo-liberal
consensus has valorized global markets over national economy and the
digital revolution of the 90s has given us a new emergent model of society
in the network of networks, the internet. This new world market in
commodities and information has revealed stark inequality as the norm.
Winner takes all is now understood to be a general principle. The
egalitarian premises of nation-states, seeking to curb the polarizing
tendencies of markets and capitalism, have given way to an emergent world
society in which the rich get richer is now taken to be axiomatic. This may
be a transitional stage on the way to a new world order capable of curbing
the natural excesses of the market. But for now the power-law is king. It's
a different model of statistics, for sure. Perhaps it captures society
poised between national and world forms. Or society between state and
market, having reverted to a balance between the two more like that of the
mid-19th century, before national regulation aspired to curb the domestic
excesses of capitalism. The question before us is whether new political
forms will enable humanity to curb the polarities of the network economy or
market. 

No-one denies that there is an objective basis for these statistical
phenomena in nature and society. But the model that attracts most attention
in a given period is likely to reflect underlying tendencies in social
experience. Having been raised in the heyday of British social democracy,
only to face the new liberalism now, I feel like I have had to undergo
several radical paradigm shifts. The models of statistical distribution I
have discussed briefly here serve as one way of talking about this
momentous transition in society and its cultural forms of expression.

Keith Hart

Manchester, 15th July 2003
