Florian Cramer on Wed, 19 Dec 2007 03:24:50 +0100 (CET)

<nettime> Critique of the "Semantic Web"

[This is a lecture manuscript written for the "Quaero Forum" on the
politics and culture of search engines at Jan van Eyck Academy
Maastricht, 9/2007 - it's still a bit rough; thanks to Felix Stalder for
useful corrections and his suggestion to post it here. -F]

                       Animals that Belong to the Emperor

   Failing universal classification schemes from Aristotle to the Semantic Web

   Quaero Forum, Maastricht

  Florian Cramer


   The weapon with which state-subsidized European search technology
   projects allegedly intend to beat Google is semantic information
   processing: pattern recognition in media files in the French Quaero
   project, Semantic Web technology in Theseus, its German offshoot.
   Originally, Quaero was a French-German collaboration, funded by both
   governments, until the German Theseus project split off from Quaero to
   pursue its own vision of future Web search. This vision is twofold,
   involving a number of classic holy grails of computer science:

    1. to provide search on the basis of Semantic Web meta tags,
    2. to have software recognize the contents of web pages in order to
       automatically apply those tags.

   While the second point is utopian enough and something that Artificial
   Intelligence research failed to achieve for decades, even the first
   point, the universal nomenclature of semantic tagging known as the
   Semantic Web, is doomed to fail by any critical standard of cultural
   reflection. The reason why the Theseus project nevertheless receives
   high public funding is economic and political, but, with its stated
   goals, hardly related to anything resembling a working web search

   Founded and pursued by Tim Berners-Lee, the original architect of the
   World Wide Web, the "Semantic Web" is a term and project that is not
   only prone to major confusion, but also emblematic of how the
   alienation between engineering and humanities goes both ways:
   shockingly naive and simplistic understandings of cultural concepts
   among the former, and a complete misunderstanding of the "Semantic Web"
   among the latter because its terminology of "semantics" and
   "ontologies" is plainly weird or mystifying outside computer science.
   In 2004, prior to Quaero and Theseus, the German federal government
   subsidized research on the Semantic Web with 13.7 million Euro,
   reasoning that as a "semantic technology", it would allow people to
   phrase search terms as normal questions, thus giving computer
   illiterates easier access to the Internet. But the Semantic Web is
   actually not about this at all; the funding was, in other words, a
   13.7 million Euro misunderstanding. {1}

   Natural language question parsing indeed is another holy grail of
   Artificial Intelligence research, parodied by Weizenbaum's "Eliza", and
   tried by Web search engines from "Ask Jeeves" - which renamed itself
   Ask.com after deemphasizing its original concept - to "Powerset",
   recently brought up by Geert Lovink on the Nettime mailing list.{2}
   Full semantic natural language understanding falls into the previously
   mentioned second category, the nut that "hard" A.I. research has
   claimed over decades to have almost, but just not quite cracked, while
   critical A.I. researchers like Luc Steels claim that it cannot be
   reached with current computer architectures regardless of their speed.
   In search engine reality, natural language search systems boil down to
   nothing more than inefficient interface wrappers around Boolean search
   expressions with their logical AND, OR and NOT operators.
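
   A minimal sketch of such a wrapper, under stated assumptions: the stop
   word list and the function name question_to_boolean are invented for
   this illustration and do not describe how Ask Jeeves, Powerset or any
   other real engine works. It only shows how a "natural language"
   question can shrink to a Boolean AND expression once everything but
   the keywords is thrown away:

```python
import re

# Hypothetical stop word list, chosen only for this illustration.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "what", "where", "who",
              "how", "do", "does", "i", "of", "in", "to", "for", "can"}

def question_to_boolean(question):
    """Reduce a question to its keywords and join them with AND."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    keywords = [w for w in words if w not in STOP_WORDS]
    return " AND ".join(keywords)

print(question_to_boolean("Where can I find the cheapest flights to Maastricht?"))
# prints: find AND cheapest AND flights AND maastricht
```

   Nothing in this toy understands the question; the grammar is simply
   discarded, which is the point made above.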

   The Semantic Web does not fall into this trap because it does not
   involve any automatic interpretation of meaning. Instead, Berners-Lee
   insists that his project "does not imply some magical artificial
   intelligence which allows machines to comprehend human mumblings"{3}
   - in sharp contradiction to the stated goal of the Theseus project.
   Instead, he conceives of the Semantic Web as a universal, unified
   markup or "meta tagging" system: "Instead of asking machines to
   understand people's language, it involves asking people to make the
   extra effort".

   This effort, semantic tagging, is a well-established and popular device
   on sites like the photo sharing platform flickr.com, the news
   aggregator digg.com and the bookmarking site del.icio.us. It simply
   means that users attach keywords to texts, images and other resources,
   making the information searchable by keywords or particular keyword
   combinations. On Flickr, for example, the search keyword combination
   "birthday", "children" and "clown" results in a list of pictures of
   clowns appearing at children's birthday parties - not because of any
   Quaero-style computer recognition of the image contents and
   Theseus-style automatic keyword mapping, but because the keywords had
   been manually assigned to these images by Flickr users.
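
   As a rough illustration, assuming a made-up photo collection (the ids,
   tags and the function name search are hypothetical and not Flickr's
   actual interface), such a keyword search is little more than a subset
   test on manually assigned tags:

```python
# Hypothetical photo collection: each photo id maps to the tags
# that users assigned to it by hand.
PHOTOS = {
    "photo1": {"birthday", "children", "clown"},
    "photo2": {"birthday", "kids", "clown"},
    "photo3": {"children", "beach"},
}

def search(tags):
    """Return the ids of all photos that carry every one of the given tags."""
    wanted = set(tags)
    return sorted(pid for pid, photo_tags in PHOTOS.items()
                  if wanted <= photo_tags)

print(search({"birthday", "children", "clown"}))
# prints: ['photo1']
```

   The search never inspects the image itself; it only compares strings
   that people typed in.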

   While such manual tagging also lies at the heart of the Semantic Web,
   systems like those of flickr, digg and del.icio.us are nevertheless
   flawed from its perspective because they involve no unified standard or
   nomenclature for tagging. If, for example, a user tagged an image with
   the word "kids" instead of "children", it will not turn up in the
   search result. On top of that, the tags lack abstraction and
   universality: children for example could be classified as a subset of
   humans, humans as a subset of mammals; birthdays as a subset of
   celebrations etc. With such a classification, pictures marked up with
   "birthday" and "children" could also be found in a more general search
   for pictures of human celebrations. For this reason, unsystematic,
   ad-hoc, user-generated and site-specific tagging systems like those on
   Flickr are referred to as "folksonomies".{4}
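
   What a unified nomenclature would add can be sketched as follows. This
   is a toy under stated assumptions: the tables SYNONYMS and BROADER and
   the functions expand and matches are invented for illustration, use
   only the examples from the paragraph above, and are not the data model
   of any actual Semantic Web standard.

```python
# Hypothetical controlled vocabulary: synonyms map to one canonical term ...
SYNONYMS = {"kids": "children"}
# ... and each canonical term points to its broader parent term.
BROADER = {
    "children": "humans",
    "humans": "mammals",
    "birthday": "celebrations",
}

def expand(tag):
    """Return the canonical form of a tag plus all of its broader terms."""
    term = SYNONYMS.get(tag, tag)
    terms = {term}
    while term in BROADER:
        term = BROADER[term]
        terms.add(term)
    return terms

def matches(photo_tags, query):
    """True if every query term is covered by the expanded photo tags."""
    covered = set()
    for tag in photo_tags:
        covered |= expand(tag)
    return {SYNONYMS.get(q, q) for q in query} <= covered

# A picture tagged "birthday" and "kids" is now found by a more
# general search for human celebrations.
print(matches({"birthday", "kids"}, {"humans", "celebrations"}))
# prints: True
```

   The sketch works only because someone decided beforehand that "kids"
   means "children" and that every tag has exactly one place in one tree.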

   The Semantic Web promises to overcome folksonomies with one, unified
   and standardized keyword tagging system that can be applied to anything.
   In other words, it is a universal classificatory description system
   and grand unified hierarchical meta tag tree. In line with computer
   science terminology, but sounding mysterious and idiosyncratic to anyone
   else, Berners-Lee calls this classificatory system an "ontology",
   making the project particularly confusing for people with backgrounds
   in philosophy and humanities - because what he and computer science
   call "ontology" is, outside such jargon and in a more common sense
   language, not an ontology, but a cosmology.

   Just as cosmologies are by no means new, neither are universal
   classification and tagging systems of all things in the world. In his
   essay and short-story "The Analytical Language of John Wilkins", Jorge
   Luis Borges writes about the English 17th century scholar that

     "He divided the universe in forty categories or classes, these being
     further subdivided into differences, which was then subdivided into
     species. He assigned to each class a monosyllable of two letters; to
     each difference, a consonant; to each species, a vowel. For example:
     de, which means an element; deb, the first of the elements, fire;
     deba, a part of the element fire, a flame." [...]

   Similar classification schemes were designed throughout the Middle
   Ages and Renaissance, among others by Ramon Llull, Giordano Bruno, the
   encyclopedist Johann Heinrich Alsted and the theosophist Jan Amos
   Comenius, scholars in whose tradition Wilkins, a founding member of the
   "Invisible College", works and thinks. Before Diderot's and
   d'Alembert's revolutionary, heretical device of arbitrarily structuring
   human knowledge by the alphabet, encyclopedias had developed
   increasingly complex tree-like classification systems of all things in
   the world they described.{5} The cosmology-called-ontology of the
   Semantic Web is not only similar, but precisely the same.

   Medieval and Renaissance classificatory cosmologies could only work on
   the basis of a stable assumption of what the world is and how it is
   structured: for example, by the four directions, the four seasons, the
   four temperaments, the seven virtues and seven vices, etc. They were,
   in other words, still embedded into the paradigm of Medieval scholastic
   science that in turn had been derived from Aristotle's system of
   categories and its classification of beings into genera and species.
   The Semantic Web is, bluntly said, nothing else but technocratic
   neo-scholasticism based on a naive if not dangerous belief that the
   world can be described according to a single and universally valid
   viewpoint; in other words, a blatant example of cybernetic control
   ideology and engineering blindness to ambiguity and cultural issues.

   Although no Semantic Web existed in the 1940s, Borges' essay hits
   the nail on the head. One is tempted to replace the name John Wilkins
   with Tim Berners-Lee when Borges reviews the former's categories and
   finds that stones, for example, are absurdly classified as either
   common, or modic, precious, transparent and insoluble, or that beauty
   is assigned to a "living brood fish". He concludes that

     "These ambiguities, redundancies and deficiencies remind us of those
     which doctor Franz Kuhn attributes to a certain Chinese
     encyclopaedia entitled 'Celestial Empire of benevolent Knowledge'.
     In its remote pages it is written that the animals are divided into:
     (a) belonging to the emperor, (b) embalmed, (c) tame, (d) sucking
     pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the
     present classification, (i) frenzied, (j) innumerable, (k) drawn
     with a very fine camelhair brush, (l) et cetera, (m) having just
     broken the water pitcher, (n) that from a long way off look like
     flies."

   Although this is Borges' own fiction, it nevertheless reveals the
   arbitrariness of categories and classifications. It also had a thorough
   impact as a philosophical critique. Michel Foucault's "The Order of
   Things" begins with a discussion of the above list of animals, which,
   as he admitted elsewhere, "shattered all the familiar landmarks" of his
   thought, opening his eyes to how the order of knowledge is culturally
   constructed and may be conceived differently. To understand Foucault's
   discourse theory, it practically suffices to read Borges' "Ficciones".

   The order of things, and unified classification schemes, do not just
   break down in fiction. Sticking to the example of animals, it is
   obvious how Aristotelian philosophy continues to exist today, in the
   notion of genus and species, and even more questionably in the
   categorization of humans into biological races. But it does not even
   work in biology itself. The platypus, an Australian animal that is a
   breastfeeding mammal but lays eggs, lives in the water and has a beak
   like a bird, famously defies the classifications that historically go
   back to Aristotle's "Zoology". If the platypus breaks genus and
   species classification, where would it fit into the Semantic Web?

   In his book "Kant and the Platypus", Umberto Eco points out how the
   animal marks the difference between scholastic and empirical
   science.{6} A bit confusingly, he differentiates "cultural cases" -
   that means categorically defined phenomena - from "empirical cases",
   i.e. phenomena that are observed instead of predefined. "To be
   recognized as such," Eco states, cultural cases "need reference to a
   framework of cultural norms" (Eco 1997, p. 139). For Eco as a
   semiotician, this means that Being, or existence, is the frontier that
   systematic science cannot conquer - and this is what, in a
   philosophical sense, ontology means.

   The innovation of modern science since Galileo, Newton and Descartes is
   that it operates without the reference to those norms. When Diderot and
   d'Alembert abandoned the old classificatory order of knowledge in
   encyclopedias and replaced them with a non-classificatory,
   non-systematic alphabetic order, they precisely followed the empirical
   paradigm, taking phenomena as they occurred and not as they fit. In
   order to be a thoroughly critical investigation and abandon
   preconceptions, science gave up "Semantic Web"-like schemes.

   Returning to Internet folksonomies, a better example than the Platypus
   was brought up in a Web forum of the German computer news site
   heise.de. Discussing the Semantic Web and its classification scheme, an
   anonymous poster brought up the hypothetical example "A Muslim is a
   potential terrorist" in order to show that a unified semantic
   "ontology"/cosmology cannot be built. This example scratches only the
   surface of the pending cultural problems, since not the empirical cases
   like the Platypus, but cultural ones bear the real dynamite. It sheds a
   dubious light on computer linguists involved in the project if they
   don't even seem to have done their homework on Saussure and the
   arbitrariness, i.e. cultural dynamics, of the signifier in relation to
   the signified. The Semantic Web, and any search engine or database
   built upon it, rests on the illusion that an unambiguous assessment of
   the world would be even theoretically possible. Beyond cosmology
   falsely named ontology, it is metaphysics disguised as physics.

   On a more practical (but nonetheless cultural) level, the Semantic Web
   relies on a clean room illusion of a culture where semantic tags
   wouldn't simply be used for spamming and search engine manipulation,
   which are already common enough for Google and other search engines to
   ignore meta tags embedded into web pages. And while Berners-Lee is
   realist enough to state that meta tagging cannot be done by bots like
   those dreamed up by the Theseus project, his Semantic Web implies a
   complexity nightmare of meta information overtaking information, with
   each piece of information creating at least twice as much work for its
   semantic markup as for its creation proper, comparable to a library
   whose catalogs outnumber the books they reference.

   "Semantics" and "ontology" are useful terms because they reference what
   computers, as purely syntactical machines, cannot process, and which
   can't be mapped into computer data structures except in subjective,
   diverse, culturally controversial and folksonomic ways. The creators of
   the so-called "Semantic Web" and "next-generation" search engines might
   learn from Borges who concludes:

     "I have registered the arbitrarities of Wilkins, [and] of the
     unknown (or false) Chinese encyclopaedia writer [...]; it is clear
     that there is no classification of the Universe not being arbitrary
     and full of conjectures. The reason for this is very simple: we do
     not know what thing the universe is."


   {1} User comment on heise.de: "I somehow have the impression that our
   Federal Ministry of Research mistakenly assumes that 13 million euros
   will produce software that enables every computer illiterate, entirely
   without the `extra effort', [...] to market his `PISA failure and as a
   highly innovative rescue of Germany as a location for knowledge and
   business (whoever believes that ... ),

   {2} Geert Lovink, search engines on the move, 19/9/2007,

   {3} Quoted after: An interview with Tim Berners-Lee,

   {4} "Folksonomy (also known as collaborative tagging, social
   classification, social indexing, social tagging, and other names) is
   the practice and method of collaboratively creating and managing tags
   to annotate and categorize content. In contrast to traditional subject
   indexing, metadata is not only generated by experts but also by
   creators and consumers of the content. Usually, freely chosen keywords
   are used instead of a controlled vocabulary", Wikipedia definition as
   of 18/12/2007, http://en.wikipedia.org/w/index.php?title=Folksonomy

   {5} As a remnant of this tradition, the Diderot/d'Alembert
   encyclopedia still contains such a knowledge tree.

   {6} Eco, Kant and the Platypus, 1997, p. 68


#  distributed via <nettime>: no commercial use without permission
#  <nettime>  is a moderated mailing list for net criticism,
#  collaborative text filtering and cultural politics of the nets
#  more info: http://mail.kein.org/mailman/listinfo/nettime-l
#  archive: http://www.nettime.org contact: nettime@kein.org