On the general topic of Babelfish, cultural bias, etc...

I had a project up from about '98 to '00
called "Debabelizer." If you know the image-translation
software called "Debabelizer," you know that its intent is
to straighten out the Tower of Babel of image formats so that
any image could appear exactly the same in any language. So,
my Debabelizer purported to do the same for language on the web -
i.e., translate web pages into a homogenized Universal Web Language,
free of Cultural Confusion... since, when you get right down to
it, this seems to be more or less the underlying assumption of translation software.
(Though in the fine print they'll admit that's problematic, the
big print is what the marketing folks write, obviously... more
on this later...)
The plagiarist Debabelizer took a web page, as chosen
by the visitor, fed it into Babelfish to translate it from
its native language into another, and then fed the result back to
Babelfish to translate it back into its native language.
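
The round trip itself is simple enough to sketch. What follows is a rough Python illustration, not the original Debabelizer code: the `translate` argument stands in for whatever translation service is on hand (Babelfish, at the time), and the names `debabelize` and `toy_translate`, the language codes, and the little word tables are all made up for the example.

```python
from typing import Callable

# Hypothetical signature: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def debabelize(text: str, native: str, pivot: str, translate: Translator) -> str:
    """Translate text from its native language into a pivot language and back."""
    there = translate(text, native, pivot)
    return translate(there, pivot, native)


# Toy stand-in for a real translation service: word-for-word lookup tables.
# These entries are invented for the demo and are not real Babelfish output.
TOY_TABLES = {
    ("en", "de"): {"the": "der", "computer": "rechner", "fails": "versagt"},
    ("de", "en"): {"der": "the", "rechner": "calculator", "versagt": "refuses"},
}


def toy_translate(text: str, source: str, target: str) -> str:
    table = TOY_TABLES[(source, target)]
    # Words missing from the table pass through untouched.
    return " ".join(table.get(word, word) for word in text.lower().split())


if __name__ == "__main__":
    print(debabelize("The computer fails", "en", "de", toy_translate))
```

With the toy tables, "The computer fails" comes back as "the calculator refuses". Swapping `toy_translate` for a call to a real service is the only change needed, and is also the part that broke when Babelfish changed their engine.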

A canned sample can be seen at:

The original site, where you can see the setup, is:
(However, that one, at least for now, *will not debabelize*... 
Babelfish changed their engine, and I haven't gotten it to
work again with webpages so far. Of course, the same thing can be
done manually...)
There is also a similar debabelization technique in use for the
Plagiarist Guestbook:
"(As a service to our international community of visitors, comments will be 
automatically "debabelized" via Babelfish Translation Services.)"
So, what have I learned about cultural bias of Babelfish from all this?
I imagine that Babelfish suffers from the
same cultural bias as whatever dictionaries it's using - at
least when going to English. I suspect it's using fairly
generic translation dictionaries... The reason
I say this is that, as far as I can tell, Babelfish seems to select
Standard Dictionary Definition #1 as the meaning of a word whenever in doubt.
Among other things, this sometimes makes for some flamboyant translations,
because Definition #1 of a verb tends to be more active/dramatic than its
other meanings. This is all very unscientific on my part, of course.
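
To make the "Definition #1" guess concrete, here is a small Python sketch of a translator that always takes the first listed sense. This is only my guess at the behavior, not how Babelfish actually works; the `SENSES` table, its entries, and the function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical bilingual dictionary: each foreign word maps to its English
# senses in dictionary order. Entries are invented for illustration.
SENSES: Dict[str, List[str]] = {
    "schlagen": ["beat", "strike", "chime"],
    "laufen": ["run", "walk", "be in progress"],
}


def pick_sense(word: str) -> str:
    """Return sense #1 for a known word; pass unknown words through."""
    options = SENSES.get(word)
    return options[0] if options else word


def translate_first_sense(text: str) -> str:
    """Translate word by word, ignoring context entirely."""
    return " ".join(pick_sense(word) for word in text.lower().split())


if __name__ == "__main__":
    print(translate_first_sense("Die Uhren schlagen"))
```

Because context is never consulted, a clock that should "chime" ends up with "beat", which is the sort of overly dramatic verb choice described above.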

Also, Babelfish gets itself into trouble with adjectives fairly often, and
quickly we discover the thin line between "close enough" and Politically
Incorrect. My favorite Debabelizer gaffe: a New York Times front page
featured a blurb about the "Yankees' colorful players." Debabelizer 
translated that into "the Yankees' colored players." I suspect the problem
here is that the grammar algorithms are biased toward past participles.
Whether or not that counts as a cultural bias I'm not sure, but it certainly
does get Babelfish into some cultural hot water in such cases...

Also, different language pairs generate different translations,
obviously. German and Portuguese
tend to generate the most amusing translations when translated from
English -> Other Language -> English, with
German being the most consistently funny. I'm sure there's an interesting
linguistic reason for this... I'm not a linguist, but perhaps it
has to do with the similarity of many word definitions between
German and English, combined with a dissimilarity in the grammars. Just
a guess.

Originally, I thought of the premises upon which I'd based Debabelizer
as being humorous exaggerations - i.e., nobody could *really* take these 
terrible translators at all seriously for actual communication - at least not
more than an extremely rough translation of web pages. Even the
marketing execs wouldn't have the chutzpah to purport that something
as culturally subtle and complex as conversational language could/should
be entrusted to computer programs, I thought. However,
some recent articles in the mainstream press (Wired, etc.)
have made me think twice about this assumption.

I have the following clipping from a newspaper interview taped
to my refrigerator - with the interviewee's name stupidly clipped out
for some reason.
But anyway, he was some sort of Executive of New Technologies for some big
company, which I think might have been IBM. Here's the clipping:

Q: Can you give an example of the Internet becoming a more
"natural" experience?
A: I am excited about language translations. [Today, instant online
translation] is not good enough for contracts but it is
good enough for conversation. It is good enough for customer
service and support. So, for example, a Spanish-speaking person can ask a
question of customer service and a Chinese person can answer it in
Chinese and the Spanish person hears it in Spanish.
[As such services become broadly available, they will]
make the Internet a lot more natural for large numbers of people.

Yeah, I can hardly wait to call Macintosh tech support to say my
Apple keeps bombing, and have the FBI come arrest me for threatening
to blow up New York....


