<nettime> Ippolita Collective: The Dark Side of Google (Chapter 1, first part)
Patrice Riemens on Wed, 11 Mar 2009 14:01:39 -0400 (EDT)






Hi again Nettimers,

A few additions to my previous mail. This is very much a translation in
progress, in need of further clearing-up. To start with the title: since I
translate in the first instance from the French version (published 2008 by
Payot, Paris, transl. Maxime Rovere) I got carried away by 'La Face cachée
...', but Ippolita's own English translation of the title is 'The Dark
Side ...', as in Dark Side of the Moon of course, though I prefer the
Italian title "Luci e ombre..." - Lights and Shadows of Google...

Further cabalistic signs:
(...) between brackets as in text
{...} not in text, my addition (suggested)
/.../ in text, but deletion suggested
[...] comments, contentious, or notes
-TR  yours truly (translator)

Enjoy!
patrizio and Diiiinooos!

NB Notes will come later. Most are references, but some are quite
substantive.


.........................................................................
NB this book and translation are published under Creative Commons license
2.0  (Attribution, Non Commercial, Share Alike).
Commercial distribution requires the authorisation of the copyright
holder:
Ippolita Collective and Feltrinelli Editore, Milano (.it)



Ippolita Collective

The Dark Side of Google (continued)


Chapter 1. The History of Search (Engines)


On searches and engines ...

Search engines today appear as websites enabling one to identify and
retrieve information. Most users harbour the mistaken idea that the Web
and the Internet are one and the same thing, because the Web represents
for them the simplest {, easiest} and most immediate access to the
Internet. But the Net is in reality far more complex, heterogeneous, and
diverse than the Web: it includes chat functions, newsgroups, e-mail, and
every other feature individuals wish 'to put on-line', whatever the format
in which this information takes shape. To put it differently, the Net is
not a static but a dynamic whole; resources and their mutual
interconnections are changing constantly, in a kind of cycle of birth,
transformation and death. The physical connectivity vectors to these
resources are also undergoing constant change. Once upon a time there was
the modem connected by copper phone wires; today we live in a world of
broadband and optical fibre. And the individuals who shape the Net by
projecting their digital alter ego onto it are also perpetually mutating,
at least as long as they stay {physically} alive. Hence, the Net is not
the Web; it is a co-evolutionary dynamic built up of the complex
interaction between various types of engines ['machines'?]: mechanical
machines ({personal} computers, 'pipes', servers, /modems/, etc. {aka
'hardware'}), biological machines (human individuals {aka 'wetware'}), and
signifying machines (shared resources {aka 'software'}).

As we sift through the dark mass of information that is the Net, we need
to realise something fundamental, yet uncanny at the same time: the
history of search engines is far older than the history of the Web.

The Web as we know it is the brainchild of Tim Berners-Lee, Robert
Cailliau (*N1), and other European and US scientists. They created the
first 'browser' between 1989 and 1991 at the CERN Laboratory in Geneva,
together with the 'http' protocol {HyperText Transfer Protocol} and the
'HTML' language {HyperText Markup Language} for writing and displaying
hypertext documents, that is, documents including 'links' (internal to
the document itself, or external, linking to other, separate documents).
This new technology represented a marked advance within the Internet,
itself a merger of various US academic and military projects.

While the Web was still being incubated in a number of laboratories and
universities across the world, search engines had already been in
existence for many years as indexing and retrieval services for the
information available on the Internet.

Obviously, the first search engines could not be consulted on the {not
yet existing} Web: they were rough and straightforward programmes one had
to configure and install on one's own computer. These tools indexed
resources available through protocols such as 'FTP' {File Transfer
Protocol} for file-sharing, 'Gopher' (an early rival of the emergent
'http'), and other systems which have since gone out of use.

1994 saw 'Webcrawler' (*N2) come into operation as a search engine
devised solely for the Web. It was an experimental product developed at
the University of Washington. The innovations this search engine brought
along were truly extraordinary. Besides functioning as a website and
making 'full text' searches possible (*N3), it also included a tool, the
'spider', that catalogued web pages automatically. A spider is a software
programme fulfilling two functions: it stores the information found on
the web pages it encounters as it navigates through the Web, and it makes
this information accessible to the users of the search engine. (This will
be discussed in more detail in the following chapters.)
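To make the spider's two functions concrete, here is a minimal sketch in Python; it is not Webcrawler's actual code. It assumes a tiny hypothetical 'Web' held in a dictionary (page names and texts are invented for illustration): the crawl follows links breadth-first and stores each page once, and a simple inverted index (from each word to the pages containing it) makes the stored pages searchable.

```python
from collections import deque

# Hypothetical in-memory "Web": each URL maps to (page text, outgoing links).
WEB = {
    "a.html": ("search engines index pages", ["b.html", "c.html"]),
    "b.html": ("a spider follows links", ["c.html"]),
    "c.html": ("pages link to pages", ["a.html"]),
}


def crawl(start):
    """Breadth-first crawl from `start`, storing each page's text once."""
    store = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in store or url not in WEB:
            continue  # already visited, or a dead link
        text, links = WEB[url]
        store[url] = text      # function 1: memorise what the page says
        queue.extend(links)    # keep navigating through the Web
    return store


def build_index(store):
    """Function 2: make stored pages searchable by word (inverted index)."""
    index = {}
    for url, text in store.items():
        for word in text.split():
            index.setdefault(word, set()).add(url)
    return index


index = build_index(crawl("a.html"))
print(sorted(index["pages"]))  # ['a.html', 'c.html']
```

A real spider would fetch pages over HTTP and cope with politeness rules, duplicates and failures; the dictionary stands in for all of that here.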

As unbelievably innovative as it was in its time, Webcrawler was only able
to return simple lists of web addresses as search results, together with
the mere title of each web page it listed.

In the last months of 1994, a new search engine, Lycos, appeared that was
able to index, in a very short time, 90% of the pages then extant on the
World Wide Web (ca. 10 million in all). Lycos' principal innovation was to
do without 'full text' systems and analyse only the first 20 lines of the
pages it indexed. This allowed Lycos to return, as search results, short
synopses of these pages, abstracted from those first 20 lines.

It was with Excite, launched in December 1995, that search results for
the first time ranked web pages according to their importance.
Introducing an evaluation system that assigned 'weight' to a web page
constituted a first, rudimentary step towards a thematic catalogue: it
would at last put an end to interminable lists of disorderly search
results. It made possible a kind of 'initial checking' of a 'directory'
of websites, comparable to a classic library system with indexing by
subject, language, etc., but for web resources.

Apart from that, Excite entered history for another reason: it was the
first search engine equipped with tools explicitly geared towards
commercial activity. After acquiring Webcrawler, Excite offered its users
personalisation facilities and free mailboxes, becoming in less than two
years one of the Web's most popular portals (1997). Yet Excite dropped its
original business model not long after that, and chose to use other
firms' search technologies, Google being foremost among them today (*N4).

This bird's eye view of Google's precursors would not be complete without
mentioning what by 1997 had become the best and most popular search engine
of all: AltaVista. AltaVista ('the view from above') was based on the
findings of a research group at DEC (Digital Equipment Corporation) in
Palo Alto, California, which in 1995 had succeeded in storing all the
words of any {random} HTML page on the Internet precisely enough to make
very refined searches possible. DEC entrusted AltaVista with the further
development of the first database that could be queried directly from the
World Wide Web. AltaVista's in-house genius was Louis Monier (*N5). Monier
clustered rows of computers together, made use of the latest hardware, and
worked with the best technologists on the market to transform his baby
into the most widely used and best loved search engine of its day.
AltaVista was also the Net's first multilingual search engine, and the
first with a technology able to include texts in Chinese, Japanese, and
Korean in its searches. It also introduced the 'Babel Fish' automatic
translation system, which is still in use today.

At its peak in 1997 (*N6), AltaVista served 25 million queries a day and
received sponsor funding to the tune of US$ 50m a year. It provided a
search facility to the users of Yahoo!'s portal, still today the principal
competitor of Google in the Web sphere.



The Birth of Google. Once upon a time there was a garage, then came the
University ...

The name Google stems from "googol", a mathematical term to describe a 1
followed by 100 noughts. According to the legend, this was the number of
web pages Larry Page and Sergey Brin dreamed of indexing with their new
search engine.

The two met in 1995 at Stanford, when Larry Page, then aged 24 and a
graduate of the University of Michigan, came to Stanford intent on
enrolling for a doctorate in computer science. Sergey Brin was one of the
students assigned to show newcomers around the campus. Stanford was (and
is) renowned as the place to develop highly innovative technological
projects. This Californian university is not only a household name for
its cutting-edge research laboratories; it also enjoys near-organic links
with companies in the information technology (IT) sector, and with
keen-eyed venture capitalists ready to sink considerable amounts of cash
into the most promising university research. Brin and Page both turned
out to be fascinated by the mind-bogglingly fast growth of the Web, and by
the concomitant problems of search and information management. They
jointly embarked on the 'Backrub' project, which got its name from the
'back links' it was meant to detect and map on a given website. Backrub
was renamed Google when it got its own web page in 1997.

The fundamental innovation Google introduced in the search process was to
reverse the page indexing procedure: it no longer showed sites according
to their degree of 'proximity' to the query, but in a 'correct' order,
that is, one conforming to the users' expectations. The first link
provided should then correspond to the 'exact' answer to the question
asked, the following ones gradually receding from the core of the search
question. (*N7)

It is in this perspective that the famous "I'm Feeling Lucky" option came
about. Clicking it opens the very first link in the Google results, which
is presented as the indisputably 'right' one.

The algorithm that calculates the importance of a web page, known as
PageRank[TM] and allegedly 'invented' by Larry Page, is actually based on
statistics from the beginning of the 20th century, and especially on the
mathematical works of Andrej Andrejevich Markov, who calculated the
relative weight of nodes within a network (*N8).

At its beginnings Google was only an academic scientific project, where
the weight evaluation system was mostly dependent upon the judgments of
'referees' operating within the format of 'peer review'. In theory, the
method presenting the best guarantees of objectivity is called 'double
blind' reading, as is habitually applied to articles before they are
accepted for publication in a scientific journal. A contribution is
submitted to two readers who are reputed scholars in their field; they do
not know the identity of the article's author (so that this cannot
influence their judgment). The second 'blind' moment comes when the
article is reviewed for publication, and the reviewer is deemed not to
know who the two referees have been. To sum up, the more positively a
scientific article has been received by fellow scientists (who are
supposed to be of an independent mind), the more the article is deemed
important and worth consideration. Page adopts this approach in his
research domain, and applies the theory that the number of links to a web
page is a way to evaluate the value of this page and, in a certain sense,
its quality. We will later go into detail as to how this passage from the
'quantity' of returned information correlates with the 'quality' of the
results expected by the user (*N9).

But this criterion is not sufficient in itself to establish quality, since
links are not equal and do not represent the same value; or, to be more
precise, the static value of a link needs to be correlated with the
dynamic value of its trajectory, since the Web is an environment
(mathematically speaking, a graph) where not all trajectories have the
same value: 'trajectory values' vary depending upon the 'weight' of the
various nodes. And indeed, to pursue the metaphor of the academic review
process of scientific articles further, not all reviews carry the same
weight. Positive opinions from less prestigious reviewers, or worse, from
reviewers not well liked within the scientific community, can be
detrimental to the article being submitted, since too many insufficiently
authoritative reviews undermine the credibility of a publication. Hence,
according to Page, sites that are linked to by sites which are themselves
extensively referred to are more important than others {more weakly
referenced ones}. In this way, a trajectory (i.e. a link) that originates
from a very popular site carries much more weight than one coming from a
relatively unknown page. A link from page A to page B is thus interpreted
as a scientific referral whose weight is directly proportional to the
reputation of the reviewer furnishing that link (it should be noted,
however, that Brin & Page explicitly talk in terms of 'vote' and
'democracy' in this regard). The authority of the scientific reviewer
becomes the measure of a site's reputation.
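The idea that a link's weight depends on the standing of the page it comes from can be sketched as a PageRank-style power iteration. This is a simplified illustration, not Google's production algorithm: the damping factor of 0.85 is the value commonly cited from Brin and Page's 1998 paper, the link graph is hypothetical, and every page is assumed to have at least one outgoing link. Each page splits its current score equally among the pages it links to, so one link from a heavily linked page is worth more than one link from an obscure page. The scores converge to the stationary distribution of a Markov chain (a 'random surfer' clicking links), which is where Markov's work comes in.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration sketch of a PageRank-style score.

    `links` maps each page to the list of pages it links to; every page
    is assumed to have at least one outgoing link (no dangling pages).
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # With probability (1 - damping) the random surfer jumps to any page.
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            share = rank[page] / len(outgoing)  # the page's 'vote' is split
            for target in outgoing:
                new[target] += damping * share
        rank = new
    return rank


# Hypothetical graph: 'x' and 'y' each receive exactly one link, but x's
# link comes from 'hub', which is itself linked to by a, b, c and x.
links = {
    "a": ["hub"], "b": ["hub"], "c": ["hub"],
    "hub": ["x"], "x": ["hub"],
    "obscure": ["y"], "y": ["obscure"],
}
rank = pagerank(links)
print(rank["x"] > rank["y"])  # True: a link from a popular page weighs more
```

Note that x and y have the same number of inbound links; only the standing of the linking page differs. Pages without outgoing links would need extra handling that this sketch leaves out.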

Google's evaluation of web pages, known as PageRanking[TM], is thus
elaborated on the basis of a 'public' referral system which is {allegedly}
the equivalent of the way the 'scientific community' (*N10) operates,
only not limited to scientists but including all the surfers of the World
Wide Web.

Today, the organisational workings of the 'scientific community' and the
issue of data referencing in general have become crucial problems: in a
context of 'information overflow' (*N11), especially on the Web, it has
become increasingly difficult to estimate not only the importance of
information but also its trustworthiness, the more so since the very
principle of peer review has in the meantime been questioned by scientists
themselves (*N12). Amongst the more interesting alternative options are
rankings based on the number of publications [???], networks of
publications available under copyleft, and 'open access' projects, which
also include research in the humanities, like Hyperjournal (*N13).

This was the background when Page launched his 'spider' web-exploring
programme in March 1996 in order to test the page ranking algorithm he had
developed.

The spider-based search engine of the two talented Stanford students
became an instant hit amongst their peers and {more senior} researchers
alike, gaining a wider and extraordinary popularity in the process.
However, the bandwidth used by the search engine quickly became a headache
for Stanford's system administrators. Also, owners of indexed sites had
some qualms about the intellectual property rights pertaining to their
content, and were, besides, not best pleased by the fact that Google's
ranking system rode roughshod over more established evaluation systems,
such as prizes and honorary mentions, in favour of the number and quality
of links (i.e. popularity) a page was able to garner: Google considers
only the relational economy {of sites} expressed in terms of links, and
nothing else. "Spider couldn't care less about the content of a page".

Hence, the value of a search result must be based on the weight of the
links between two pages, and not on some arbitrary classification enforced
by the terms of the search. This breakthrough turned out to be the key to
Google's subsequent success in the years to come: search results would in
future no longer be fixed once and for all, but would vary dynamically in
accordance with the page's position within the Web as a whole.


Google.com or how ads (discreetly) entered the pages...

Page and Brin went on developing and testing Google for eighteen months,
making use of free tools provided by the Free and Open Source Software
(F/OSS) community (*N14), and of the GNU/Linux operating system. This
enabled them to build up a system that is both modular and scalable to an
extremely large extent, and which can be augmented and tweaked even while
fully in use. This modular structure constitutes today the basis of
Google's data center, the 'Googleplex' (*N15), and makes possible the
maintenance, upgrade, modification and addition of features and software
without ever interrupting the service.

By the middle of 1998, Google handled something like 10,000 queries a
day, and the in-house array of servers Page and Brin had piled up in their
rented room was on the verge of collapse. Finding funds vastly in excess
of what is usually allocated to academic research therefore became a
matter of some urgency.

The story has it that Google's exit from the university was due to a
chance encounter with Andy Bechtolsheim, one of the founders of Sun
Microsystems and a talented old hand in the realm of IT. He became
Google's first investor, to the tune of one lakh US Dollars (1 lakh =
100,000, in Indian English -TR ;-)

The birth of Google as a commercial enterprise went together with its
first hires, needed for further development and maintenance of the data
center. Among them was Craig Silverstein, now the CTO. Right from the
beginning, Google's data center took the shape of a starkly redundant
system, where data are copied and stored in several places so as to
minimise any risk of data loss (a move that amounts to printing the
currency of search). Its most important feature is the possibility of
adding or removing modules at any given time so as to boost the
efficiency of the system. Another major trump card, as befits university
hackers, was Brin's and Page's habit of recycling and tweaking
second-hand hardware and making extensive use of F/OSS. Their limited
financial resources led them to evolve what would become the core of
their business model: nimble modularity at all levels. The
Google-ranger[???]'s modularity means that it can scale up and down
according to need and availability. There is no need to reprogram the
system when new resources, whether hard-, wet- or software, are added:
the highly dynamic structure integrates the new modules, even if they are
stand-alone.

Google formally opened its offices on September 7, 1998 in Menlo Park,
California. As the story goes, Larry Brin opened the doors with a remote,
since the offices were located in a garage a friend of theirs had sublet
to the firm. A Spartan office-cum-garage then, but one featuring some
not-to-be-spurned comforts: a washing machine, a dryer, and a spa. Right
from the start, Google's company philosophy was about making employees'
life very cushy indeed.

By January 1999, Google had left the Stanford campus for good. The
official statement reads: "The Google research project has now become
Google Inc. Our aim is to give the world searches of far higher quality
than what exists today, and going for a private company appears to be the
best avenue to achieve this ambition. We have started to hire people and
to configure more servers in order to make our system scalable (we are
ordering servers 21 at a time!). We have also started to launch our
spider more frequently, and our results are now not only as fast, but
also much more up to date. We employ the most talented people, and
through them we obtain the latest and best-performing Web technologies".
Brin and Page then went on for a few more lines about the ten best
reasons to come and work for Google, citing tech features, stock options,
free drinks and snacks, and the satisfaction of knowing that millions of
people are "going to use and enjoy your software".

The years 1998 and 1999 would see all search engines and other popular
sites worldwide in the grip of the 'portal syndrome', a narrow obsession
with developing sites that would attract and retain visitors at all costs
by providing ever more services, ads, and personalisation gizmos. Google,
by contrast, remained the only web instrument without ads and additional
features. It was to remain a search engine pure and simple, but for that
reason also the best, the fastest, and the one without commercial tie-ups
whatsoever.

But the firm could not survive purely on the money given by Bechtolsheim
without generating any substantial profit while at the same time pursuing
its research on identifying and organising information. Displaying a
remarkable aptitude for talking the language of high finance, while
constantly emphasising their commitment to research, Brin and Page then
managed to reach an agreement with California's two topmost venture
capital firms, which astonishingly agreed to co-finance one and the same
company. It was a totally unique occurrence: two giant venture capital
institutions agreeing to share the risks and profits of a single business
proposition. On June 7, 1999, Google was able to announce that Sequoia
Capital and Kleiner Perkins Caufield & Byers had granted it US$ 2.5 crore
in finance capital (*N16) (1 crore = 10 million, in Indian English -TR
;-).

While one PhD thesis after another saw the light at Google Inc., its two
researcher-CEOs were looking for avenues to commercialise, one way or the
other, the mass of indexed data. As a start they tried to sell their
search service to portals by profiling themselves as an OEM (Original
Equipment Manufacturer) [*N17] [I leave the original authors' explanation
out, since it's rather confused - in the French version at least], but
this was not very successful. The business model that appeared more
compatible with the new firm was one of direct advertising, integrated
within the search engine itself, and working by counting the number of
visitors who access the sites through commercial advertising links. This
business model, called CPM (Cost per Mille, i.e. per thousand
impressions) [*N18], has a structure that is as little intrusive as
possible for the user. It is not based on flashy advertisement banners,
but relies on discreet, yet very carefully selected links that appear
above the search results. As these links are in a different font and
colour from the search results proper, they tend not to be perceived as
too disturbing to the user's search activities.


Self-Service Ads, or Beyond the Dot Com Crash...

A business model based on simple sponsored links appearing alongside
search results does not make very much sense in terms of profit
generation: at this stage, Google's long-term commercial strategy was in
need of a qualitative leap. So the two presidents went in search of
commercially more promising solutions and came across Goto, a company
founded by Bill Gross (*N19), now owned by Overture/Yahoo.

Goto's business model was based on mixing real search results with
sponsored returns, and billing advertisers only if users actually clicked
on their web-address, a format known in the trade as CPC, Cost per Click.
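The difference between paying per thousand impressions (CPM) and paying per click (CPC) comes down to simple arithmetic. The sketch below uses invented prices and traffic figures, purely for illustration: an ad displayed 100,000 times next to content that has little to do with it, and clicked only 10 times.

```python
def cpm_cost(impressions, price_per_thousand):
    """CPM: the advertiser pays for every thousand times the ad is displayed."""
    return impressions / 1000 * price_per_thousand


def cpc_cost(clicks, price_per_click):
    """CPC: the advertiser pays only when a user actually clicks on the ad."""
    return clicks * price_per_click


# Hypothetical campaign: many impressions, very few clicks.
impressions, clicks = 100_000, 10
print(cpm_cost(impressions, 2.00))  # 200.0 dollars, whether or not anyone clicks
print(cpc_cost(clicks, 0.40))       # 4.0 dollars, paid only for the 10 clicks
```

Under CPC an irrelevant placement costs the advertiser very little, so the search engine only earns when its sponsored links actually match what users are looking for.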

Compared to previous methods, this was particularly innovative. Sponsored
links would only appear if they were relevant to the user's actual search
query, thus maximising the likelihood of a transaction taking place in
the form of a click-through to the commercial site. Google tried to reach
an agreement with Goto, but its CEO's refusal forced it to seek a similar,
alternative solution in-house. At that time, portals (think Excite, Lycos,
Infoseek, AltaVista and Yahoo) were all using the CPM format, and CPC was
something of a repressed wish. This tends to show that if you're not able
to buy up a superior, mission-critical technology from someone else,
you'll have to develop it yourself /in an autonomous fashion/.

March 2000 saw the implosion of the NASDAQ bubble, sinking in its wake
all the pipe-dreams of the 'Dot Com' sphere. With them went the CPM model
too, with its illusion of an unlimited cash flow through myriads of ad
banners "with millions of eyeballs" each. However, these banners were
most of the time totally out of context, on sites that had nothing to do
with the advertiser's line of business. Google now faced the dire need to
look very closely at its cost/earning accounts, and urgently find a way to
make its search technology acquire financial value.

The response came with AdWords, launched in October 2000. AdWords
functions as a sort of advertising self-service, where commercial parties
can choose the search keywords most likely to be associated with their own
sites. AdWords was Google's application for putting Goto's 'keyword-based
advertising' into effect.
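A toy model of keyword-based advertising might look like the following. It is a hypothetical sketch, not AdWords itself: the advertisers and keywords are invented, and it ignores bidding, ad ranking and billing altogether. It only illustrates the principle described above: advertisers pick their own keywords, and an ad becomes eligible to appear only when the user's query contains one of them.

```python
# Hypothetical advertisers and the keywords each has chosen (self-service).
ADS = {
    "Acme Bikes": {"bicycle", "bike", "cycling"},
    "Book Barn": {"books", "novel", "paperback"},
}


def matching_ads(query):
    """Return the advertisers whose chosen keywords appear in the query."""
    words = set(query.lower().split())
    # An ad qualifies if it shares at least one keyword with the query.
    return sorted(name for name, keywords in ADS.items() if words & keywords)


print(matching_ads("Cheap bike repair"))  # ['Acme Bikes']
print(matching_ads("weather tomorrow"))   # []
```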

Google hence not only survived the Dot Com bust; thanks to being a
not-yet-publicly-traded private company, it was also able to make good use
of the opportunity to fish right and left for talent beached by all the
other 'dot coms' gone belly up. By mid-2000, Google was answering 18
million queries a day and its document index contained 1 billion unique
items. Six months later, queries had reached the 60 million mark.


(to be continued)


--------------------------
Translated by Patrice Riemens
This translation project is supported and facilitated by:

The Center for Internet and Society, Bangalore
(http://cis-india.org)
The Tactical Technology Collective, Bangalore Office
(http://www.tacticaltech.org)
Visthar, Dodda Gubbi post, Kothanyur-Bangalore
(http://www.visthar.org)




#  distributed via <nettime>: no commercial use without permission
#  <nettime>  is a moderated mailing list for net criticism,
#  collaborative text filtering and cultural politics of the nets
#  more info: http://mail.kein.org/mailman/listinfo/nettime-l
#  archive: http://www.nettime.org contact: nettime {AT} kein.org