Patrice Riemens on Fri, 27 Mar 2009 05:36:23 -0400 (EDT)



<nettime> Ippolita Collective: The Dark Side of Google, Chapter 4 (second - and last - part)


NB this book and translation are published under Creative Commons license
2.0  (Attribution, Non Commercial, Share Alike).
Commercial distribution requires the authorisation of the copyright holders:
Ippolita Collective and Feltrinelli Editore, Milano (.it)

Ippolita Collective

The Dark Side of Google (continued)


Chapter 4  Algorithms or Bust! (second part)


From 'Brand Identity' to 'Participative Interface'

Search, archiving and retrieval of data are procedures so complex that
understanding them fully would require an amount of knowledge and
explanation beyond the scope of this book. We will nevertheless look at
some aspects of their functioning in more detail further on. {In any case}
we should take a closer look at the interface, because that is the element
Google puts forward and manages as its public image, whereas the
performance of the algorithms and the architecture of the database remain
invisible to the user.

{In the case of Google,} the interface is above all the 'blank box' [*N8],
the empty window on Google's universal homepage where the user types
her/his query, or 'search intention'. The page is designed in such a way
as to exude welcome, reassurance and closeness.

The universal functionality of Google's homepage stems from its being
replicated across 104 languages and dialects and customisable for 113
different countries as of today [2007, -TR]. In all of these the
interaction model remains the same, unifying all search behaviours into
one single, homogeneous format.

On Google's homepage, one first notices a linear interface built around a
few key elements, each with a very specific and universally recognizable
function. The frame accepts search queries of varying nature and
complexity, from simple keywords (e.g. 'Ippolita') to phrases enclosed in
quotation marks (e.g. "authors collective"), and it also allows the search
to be narrowed down more precisely: to a particular site, a specific
language, a particular domain, or only to documents in a given format, and
so forth, depending on the level of specificity one is aiming at. It can
be taken as an example of a successful interface, in so far as it manages
to fulfil the ambitious goal of assigning a positive value to what is
otherwise white space on a page. The interface presents itself without any
adornment, almost empty, or rather filled with a single empty element, the
'blank box', which reassures users and prompts them into activity, warding
off the loss of attention {and their leaving the site} caused either by an
absence of handles [i.e. something to hold on to] or, conversely, by an
excess of visual stimuli. It thereby avoids the confusion that so often
accompanies pages filled to the brim (suffering, apparently, from the
'horror vacui' syndrome), which try to be attractive with a flurry of
banners, graphics, animations and so on, and only manage to communicate
anxiety to the user.
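
By way of illustration, such refinements are expressed directly in the
'blank box' through Google's standard query operators (the queries below
are invented examples):

    ippolita                        a simple keyword
    "authors collective"            an exact phrase, in quotation marks
    ippolita site:nettime.org       restrict the search to one site
    pagerank filetype:pdf           return only documents in PDF format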

Actually, one cannot really surf a Google page: all of its components have
a purely functional purpose. Their goal is to give the user access to a
service, not to take her/him on a journey; their use engenders behaviours
which quickly harden into search routines and become a default mode within
a very short time. The interface is designed to make the usage, the
behavioural dynamics and the expectations of the average user iterative.
Thus, even allowing for a user's 'personalisation', search behaviours
remain basically identical, so much so that one can speak of a 'universal
tool'.

The organisation of texts and images is linear. It relies on recurrent
graphic elements, notably primary colours, and the images used are
qualitatively homogeneous. The display style of the interface is sober to
the point of austerity, despite the 'brand (and corporate) identity' the
design reverberates [*N9]. It is informed by a specific aesthetic, one
that appeals to very elementary properties of perception which are, in
their simplicity, very effective.

From this almost instantaneous visual identification stems an ease of use
far above that of competing search engines. The level of ergonomics Google
has achieved is mind-boggling: it does not even need to present itself as
a basket of services in its interface; its visual architecture already
screams that message. The interfaces of the different services are all
autonomous, separate and largely independent from one another, {and} they
all carry the 'blank box' as their hallmark, /and they are not directly
linked to each other/. It is, for instance, not necessary to go through a
series of complicated steps to reach the code.google.com service,
dedicated to technicians of all levels, when coming from images.google.com,
which addresses a much larger public. One only needs to 'go deeper' into
the google.com site and to know how to search. Despite this fragmentation,
we are all able to recognise the network of services Google offers;
moreover, visitors are able to use these information sources in a combined
and complementary manner. This holds equally true for the 'browse-only'
types as for those who have developed a mild - or severe - addiction to
Google's services (a.k.a. the 'Google-totally-addicted', joyfully jumping
on the bandwagon of each and every Google novelty) [*N10].

This decentralisation of services results in a particular relational
mechanism: Google users discover these new sectors not so much through
Google itself as by way of the informal network of {other} users, on other
sites, where Google visitors describe their habits and discuss their
tastes and preferences. Users then automatically 'localise' themselves
within the extensive gamut of Google services, something that happens as
soon as they log in to a new service: for instance, the appropriate
language is immediately offered according to the geographic area {of the
user's IP address}. It also becomes easy {for Google} to estimate what
sort of users a particular service is aimed at, what level of technical
knowledge it requires, or to what extent there exists an affinity with
other users {of the same service}. Thus this word-of-mouth mechanism
becomes akin to a 'relationship-based {informal} PageRank' {system}.

As a first approximation, one could say that there is a local relational
dimension, where word of mouth concerns friends and acquaintances,
together with a typological dimension of relationship, concerning
particular classes of users who can be identified by means of statistical
parameters (age, sex, occupation, etc.) and who, by using a particular
service, kick {a particular type of} relational economy into being.

Google even appears to escape {falling victim to} the ten problems in
website usability identified by Jakob Nielsen, one of the most prominent
specialists in user interfaces [*N11]. Although written in HTML, Google's
site sits completely outside the standards, and yet it manages to be fully
readable [i.e. compatible] with all browsers in use today, whether
graphical or text-based [*N12].

The neat graphic design of the {Google home}page is further enhanced by an
excellent visual organisation of its commercial aspects. There is no
advertising link whatsoever on the homepage or on the documentation and
information pages. At Google, ads appear only alongside query returns,
clearly separated from them, though related to the subject of the query.
One can therefore say that, in the arrangement of its interfaces, Google
strikes an acceptable compromise between the respect due to its users and
the necessity of economic returns. Advertisements, Google's main source of
income, are displayed so as not to be invasive and not to distract users
from their use of Google's services. The sponsored links are served
dynamically, adjusting to the user's trajectory first within the search
site and then on the Internet {in general}.

Commercial links are thus not static; they move along with the user's
searches. This is made possible by RSS feeds (RDF Site Summary, or Really
Simple Syndication), one of the most widely used formats for distributing
web content, and by the many different digital information sources
(dailies, weeklies, press agencies, etc.) that Google draws upon to modify
its homepage dynamically once a user has personalised it. Google lets
registered users configure their start page completely through the
addition of RSS feeds, putting the weather forecast for cities of one's
choice, or the history of one's previous searches, at one's fingertips.
Bookmark management and keeping track of the latest incoming e-mails also
become possible, as do searches of one's own, non-web /computer/ files
thanks to the 'Google Desktop' application.
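
To make the mechanism concrete, here is a minimal sketch, in Python, of
what 'adding an RSS feed to a start page' amounts to technically: fetching
an RSS 2.0 document and extracting the item titles to display. The feed
URL is a made-up placeholder, and this is of course not Google's own code,
only an illustration of the format's simplicity.

    import urllib.request
    import xml.etree.ElementTree as ET

    FEED_URL = "https://example.org/news/rss.xml"  # hypothetical feed chosen by the user

    def fetch_headlines(url, limit=5):
        """Download an RSS 2.0 feed and return the first few item titles."""
        with urllib.request.urlopen(url) as response:
            tree = ET.parse(response)
        # RSS 2.0 layout: <rss><channel><item><title>...</title></item>...</channel></rss>
        titles = [item.findtext("title", default="(untitled)")
                  for item in tree.getroot().iter("item")]
        return titles[:limit]

    if __name__ == "__main__":
        for title in fetch_headlines(FEED_URL):
            print("-", title)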

The commercial promotion mechanism [i.e. the ads], the services and the
sophisticated profiling of users thus appear to constitute a coherent
whole, at the level of aesthetics as well as of content. As their label
announces, sponsored links are nothing more than suggestions, though
graphically compatible and conceptually consistent with the search
operation in progress. Google's economy is so well integrated into its
interface that it can vanish, without any harm done, from the vantage
point of users who are not interested in it, while generating handsome
profits from users who do show interest in the suggested commercial links.

Yahoo! and many other search engines and portals offer the same sort of
facilities for personalising their home pages, yet the quality and
quantity of what Google has to offer remain unchallenged to this day. The
configuration is rather simple, but it requires some familiarity with web
interfaces and takes the user some time to set up. Attention spans on the
Web are notoriously short: pages are viewed and abandoned within a very
short time, {often} just a few seconds, so a user who 'invests' a couple
of minutes, or even several, {in a website} reveals /through her/his
choices/ a great deal about her/himself and her/his habits as a consumer.
This information is then carefully stored by the company {owning the
search engine} (whether Google, Yahoo! {or another firm}) and represents a
true source of wealth, produced {cost-free} by the users themselves, and
essential to the sponsors offering targeted products and services.

Homepage personalisation makes a site more attractive and more intimate:
the site itself becomes a sort of private tool in which the user keeps
investing time, choosing colours, tweaking the layout and selecting
her/his {favourite} content. A regular visitor who is able to configure
her/his start page participates in the construction of the web interface.
Giving users freedom of choice and control over a few pages means
transforming them from simple targets of advertising into 'intelligent'
consumers {that is, consumers from whom 'intelligence' can be extracted}.
Fostering interaction is surely the best, and the subtlest, way of
securing 'fidelity'. This is why participative interface environments are
multiplying, with ever more personalised ads, so as to usher us all
together into the golden world of Google.


PageRank[TM] or the absolute authority within a closed world

The algorithm that enables Google to assign value to the pages its
'spider' indexes is known as 'PageRank'[TM].

We have already seen that the functioning of PageRank[TM] is based on the
'popularity' of a web page, computed from the number of sites that link to
it. Given an equal number of inbound links, two different web pages will
have different PageRank[TM] values according to the 'weight' of the pages
that link to them: this constitutes the 'qualitative' aspect of sites.
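
In the form published by Brin and Page, this calculation is a simple
recurrence: a page's score is a damped sum of the scores of the pages
linking to it, each divided by that linker's number of outbound links. The
toy sketch below (in Python, on an invented graph, ignoring refinements
such as the handling of dangling links, and bearing no relation to
Google's production system) illustrates exactly the point made above:
pages 'a' and 'b' each receive a single inbound link, but 'a' is linked
from a heavily cited hub and therefore ends up with the higher score.

    # PR(A) = (1 - d) + d * sum( PR(T) / C(T) ) over pages T linking to A,
    # where C(T) is the number of outbound links of T and d is a damping factor.

    def pagerank(links, d=0.85, iterations=50):
        """links maps each page to the list of pages it links to."""
        pages = set(links) | {p for targets in links.values() for p in targets}
        pr = {p: 1.0 for p in pages}
        for _ in range(iterations):
            new_pr = {p: (1 - d) for p in pages}
            for source, targets in links.items():
                share = pr[source] / len(targets) if targets else 0.0
                for t in targets:
                    new_pr[t] += d * share
            pr = new_pr
        return pr

    # 'a' and 'b' both have exactly one inbound link, but 'a' is linked
    # from a hub that is itself linked to three times.
    toy_graph = {"hub": ["a"], "minor": ["b"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
    print(sorted(pagerank(toy_graph).items(), key=lambda kv: -kv[1]))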

To take a concrete example: quite often, when one checks the access
statistics of a site, one finds an enormous number of links coming from
pornographic sites. This is because Google's ranking also registers links
that show up in publicly accessible statistics pages. There are therefore
programs that exploit this invasive side of the link-and-node evaluation
logic in order to push rankings up. And pornographic sites are well known
to be pioneers in {this kind of clever} experimentation (they were the
first on the Web with image galleries and with online payment systems).

Since a number of automated programs hunt for sites that publish their
access statistics, a very large number of links are in fact generated
through bogus visits: visits that originate from a fake link on another
site, most often a pornographic one. This devious mechanism inflates the
number of accesses to the target site, makes its statistics swell and
lifts its (Google) ranking, which in the end benefits the pornographic
site that planted the fake link in the first place. It looks like a
win-win situation, at least as far as visibility is concerned. Nor is it
an 'illegal operation' [;-)]: nothing forbids linking to an Internet site.
In practice, sites that publish their statistics thereby end up with a
higher ranking {than sites whose statistics are not public}.

This mechanism shows how the 'technological magic' and 'objectivity' of
Google's ranking are in fact connected to the 'underground' of the Net,
and {partially} grounded in less savoury practices.

Other {perfectly} legitimate practices have been documented that exploit
Google's approach to indexing, such as Search Engine Optimization (SEO), a
suite of operations for pushing up a website's ranking in search returns.
Reaching the #1 position, for instance, is often achieved through spamming
/from improbable addresses by automated programs, with stupendous
effects/.

"We register your site with 910 search engines, registries and
web-catalogues! We bring your site in pole position on Google and Yahoo!
Try Us! No risk, just US$299 instead of US$349! - one shot, no
obligations!". Of course {, confronted to this,} Google still plays the
transparency card: "nothing can guarantee that your site will appear as #1
on Google" [*N14].

Mathematically speaking, one feature of PageRank[TM], being based on the
analysis of links, is that the database must be {fully} integrated; in
other words, search operations can only take place within a circumscribed,
albeit extremely vast, space, in which there is always a road leading from
one indexed web page to another indexed web page - within Google's
universe, that is.

Searches will therefore tend to be functional, avoiding 'broken links' as
far as possible, as well as returns that differ substantially from what
has been archived in the 'cache memory'. The ensuing problem is that users
are falsely led to believe that the Internet is a closed world, made up
entirely of {transparent} links, without secret paths or preferential
trajectories, because it seems that, starting from any given query, a
'correct' response will always be returned.

This is the consequence of the 'Googolian' view of the Internet as
consisting exclusively of the journeys its own spider makes, jumping from
one web link to the next. If a web page is not linked to from anywhere, it
will never appear to a user, since Google's spider has had no opportunity
to find, weigh and index it. But this by no means implies that such things
as 'data islands {on the Net}' do not exist!

Dynamic sites are a good example of this, since their functionality is
entirely based on the choices the user makes. A typical instance is the
site http://voyages.sncf.com {owned by the French Railways}. Filling in
the appropriate form gives you train times, onward connections, fastest
itineraries and so on for any given destination, in real time. Google's
system cannot handle these form-based queries and therefore does not index
the results that {the site} voyages.sncf.com creates dynamically. Only a
human being can overcome this hurdle. The only solution Google can provide
is to redirect the user to the railway companies' or airlines' own sites
whenever an itinerary, a timetable or a destination is asked for.

This is why the supposed exhaustiveness of Google's databases must be
challenged and discounted, since it {falsely} conjures up the notion of a
single universe for all of us, complete and closed {and called 'Google'}.
Quite the contrary is the case: mapping a trajectory through a complex
network always means an exploration with {only} relative and partial
results.

The dream of a Google "with total knowledge of the Internet" is demagogic
nonsense, whose sole aim is to perpetuate the idea that the information
provided is accurate and exhaustive, elevating Google, as it were, into a
unique, truth-dispensing service {- the Internet equivalent of the One
Party System}. Such an absolute fencing-off admittedly works well for
everyday searches, because it leads quickly to results. But in reality,
within a complex networked system there is no such thing as absolute
truth, only an evaluation induced by one's trajectory, or even by the time
one is willing to spend on a (re)search. The quality {of a search} is
{also} entirely dependent on our subjective perception of the returns as
acceptable, or less so. The networks we are able to explore, analyse and
experience are complex objects whose nodes and links are constantly
shifting. And if the decision to accept the results of a search {or not}
ultimately rests with the user, then the exercise of critical faculties is
essential, together with a sharp awareness of the subjectivity of one's
own viewpoint. In order to generate a trajectory that is truly worth
analysing, one has to presuppose a limited and closed network, a world
made up only of one's own personal requirements, while knowing full well
that this is a subjective localisation, neither absolute nor stable over
time. To explore the Net means being able to carve it up into smaller
sub-nets for the sake of analysis; it amounts to creating small, localised
and temporary worlds [*N15].

It turns out that in everyday practice chance linkages are of the utmost
importance: the emergence of new and unexpected relationships can by no
means be predicted by analysing the web's separate elements, as Google's
ranking {system} suggests it can. Such linkages act as 'dimensional
gateways', allowing the distance between two nodes /in the network/ to
shrink or even to disappear altogether.


PageRank[TM]: science's currency?

Contrary to common belief, the PageRank[TM] algorithm is not an original
discovery of Google's, but builds on the work of the Russian mathematician
Andrei Andreyevich Markov, who analysed statistical phenomena in closed
systems at the beginning of the 20th century - closed systems being
understood as systems in which each and every element is by necessity
either the cause or the outcome of (an)other element(s) of that system
[*N16]. Sergey Brin's and Larry Page's work must have built on this
/theory/, although the further advances they made to it have never been
fully disclosed to the public, apart from what is in the Stanford patent.
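
The connection can be stated simply: in the Markov-chain reading of
PageRank, a 'random surfer' follows one of a page's links with probability
d and jumps to a random page otherwise, and the ranking is the stationary
distribution of that chain, i.e. the share of time the surfer ends up
spending on each page. The Python sketch below, on an invented three-page
web, finds that distribution by naive power iteration; it illustrates the
principle only, not Google's implementation.

    d = 0.85  # probability of following a link rather than jumping at random
    out_links = {0: [1, 2], 1: [2], 2: [0]}  # page 0 links to 1 and 2, etc.
    n = len(out_links)

    # Row-stochastic transition matrix of the random surfer.
    transition = [[(1 - d) / n + (d / len(out_links[i]) if j in out_links[i] else 0.0)
                   for j in range(n)] for i in range(n)]

    def stationary(matrix, steps=100):
        """Power-iterate an initially uniform distribution until it settles."""
        dist = [1.0 / n] * n
        for _ in range(steps):
            dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        return dist

    # Page 1, with only one inbound link, ends up with the least probability mass.
    print(stationary(transition))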

Perhaps the best way to understand the nature of this algorithm is to look
at what happens among friends. In a community of friends, the more a
shared event or experience is talked about, the more it grows in
importance, to the point of becoming part of the group's common lore. If
knowledge of the event remains confined to a narrow circle, it never
becomes famous. The same logic applies to celebrities /in show business/:
the more they manage to get themselves talked about, the higher their
ranking rises; the more famous they are, the more of a celebrity they
become (which is why there are so many self-referential shows on
television, such as "Celebrity Farm" and the like). Google puts exactly
the same mechanism to work in handling data.

Google is all the more convincing in its image management because it
spreads the idea that the Internet should be seen as a vast democracy,
since the algorithm treats links as votes in favour of sites. And it does
not matter whether a link speaks well or ill of a site; the important
thing is to be spoken about {i.e. linked to}. The deception inherent in
this 'global democracy' arrived at by algorithm is immediately obvious: as
if democracy were something that came out of technology rather than out of
the practices of human individuals! We have already stressed [*N17] that
the cultural origins of such a worldview lie in the extremely elitist
peer-review system practised in scientific publishing, where each
researcher's contribution fits into a network of relationships,
evaluations and verifications that enables the communication and control
of scientific research results. Google's 'global democracy' thus amounts
to transferring the 'scientific method' /of publishing/ onto the Web by
way of the PageRank[TM] algorithm, which functions as a 'technological
referee' able to weigh the information on the web objectively and to order
it according to the choices expressed, through their links, by the 'People
of the Net'.

The parallel is striking: on the one hand we have scientific publications,
which acquire influence and authority according to their ranking within
their particular discipline, a ranking obtained by way of citations
{('quotes')}, that is, by being cross-referenced in the specialised
literature. This is how scientific research guarantees its coherence: by
ensuring that no new publication exists in a void, but functions instead
as the 'current state of the art' within the long history of scientific
endeavour. On the other hand, we have web pages whose links are treated by
Google's spider as so many 'citations' that increase their status, and
hence their ranking.

Scientific elitism, the prime mover of the awe that 'science' inspires, is
curiously based on publication. Publishing, that is, making public, by no
means amounts to making 'accessible' or 'understandable' [*N18]. Indeed,
it was the contention of the sociologist Robert Merton in the
nineteen-seventies that 'scientific discoveries', whether theoretical or
experimental, cannot, will not and should not be considered truly
scientific unless they have been 'permanently integrated {into the body of
scientific knowledge}' [*N19]. The statement may appear somewhat apodictic
(after all, the science of Antiquity was not 'publicly' transmitted at all
- think of the Pythagorean school in ancient Greece, or of the distinction
between 'esoteric' and 'exoteric' writings), but it clearly brings out the
eminently public character of modern science. Communication, then, is not
a by-product of research but an integral part of a form of knowledge based
on accumulation and co-operation. Since the 16th century at least, science
has on the one hand striven for new results that augment its cognitive
capital, while on the other it recognises previous research as the
{necessary and unavoidable} point of departure for those results. One
could therefore write a history of scientific communication that develops
in parallel with that of its media supports: from the voluminous
correspondence scientists used to maintain with one another, through
periodical publication in scientific reviews, up to the electronic
communication carriers of today. And it is no accident that the first
Internet nodes were academic research centres /which needed to communicate
and share information/.

Nonetheless, the evolution of the carriers has not altered the basic
tenets of the scientific method's mode of communication, which remains
based on citation. Dubbed 'science's currency' in some quarters, citations
function as tokens of honour given by scientists to the people who taught
and/or inspired them; more concretely, they link present research to past
research, whether by the same or by different authors. It does indeed make
sense to take the number of citations {('quotes')} a given piece of
research attracts as a reflection of its importance, or at least of its
impact, on the scientific community. With time, this system has itself
become an object of specific research: bibliometrics is a discipline that
uses mathematical and statistical models to analyse the way information is
disseminated, especially in the field of publications. In fact
bibliometrics, and above all its best-known indicator, the 'impact factor'
[*N20], is commonly used as an 'objective' criterion for measuring the
scientific output of an individual researcher or of an {academic}
institution. A vast archive of bibliometric data was put online in 1993 -
at Stanford, precisely, the cradle of Google. The SPIRES project (Stanford
Public Information Retrieval System) was born in 1974 out of the series of
bibliographical records on high-energy physics articles compiled by the
Stanford University library. Because its domain of analysis is limited
{and well-defined}, SPIRES is an exhaustive, publicly accessible and free
database, on which complex searches across the body of citations are
possible. It is likely that Brin and Page were able to study and emulate
this methodology when developing their own PageRank[TM] system.
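
For readers unfamiliar with the indicator, the 'impact factor' mentioned
above is usually computed over a two-year window: the citations a
journal's recent articles receive in a given year, divided by the number
of citable items the journal published in the two preceding years. A
back-of-the-envelope example, with invented figures:

    # hypothetical journal, hypothetical numbers - for illustration only
    citations_in_2007_to_2005_2006_articles = 450
    citable_items_published_2005_2006 = 150
    impact_factor_2007 = citations_in_2007_to_2005_2006_articles / citable_items_published_2005_2006
    print(impact_factor_2007)  # 3.0: each recent article was cited three times on average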

But besides the algorithm itself, there are further adaptive features that
have contributed to making Google a true 'global mediator' of the World
Wide Web.

END of Chapter 4


(to be continued)


--------------------------
Translated by Patrice Riemens
This translation project is supported and facilitated by:

The Center for Internet and Society, Bangalore
(http://cis-india.org)
The Tactical Technology Collective, Bangalore Office
(http://www.tacticaltech.org)
Visthar, Dodda Gubbi post, Kothanyur-Bangalore
(http://www.visthar.org)







#  distributed via <nettime>: no commercial use without permission
#  <nettime>  is a moderated mailing list for net criticism,
#  collaborative text filtering and cultural politics of the nets
#  more info: http://mail.kein.org/mailman/listinfo/nettime-l
#  archive: http://www.nettime.org contact: nettime@kein.org