Douglas Bagnall on Sun, 31 Dec 2017 00:08:12 +0100 (CET)



Re: <nettime> Deep Fool


On 29/12/17 20:05, Morlock Elloi wrote: 
> Longer version and remarks: current ML systems appear to be linear,
> so it's possible to synthesize diversions even without knowing how a
> particular system works. Systems can be fooled into miscategorizing
> visual images (turtle gets recognized as a rifle in live video, face
> is not classified as face in still photo, etc.), and autonomous
> vehicles can be made to misread signals and signs (apparently most
> research goes there.)
 
It should be no surprise that machine learning systems take shortcuts,
because those shortcuts are THE WHOLE POINT. You don't want to make
the machine learn more stuff -- you want it to learn LESS, to shed the
stuff it doesn't need, to forget and generalise. But only, of course,
to the extent that forgetting improves its score on whatever metric
you're chasing.
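
Morlock's linearity point can be made concrete. The standard recipe
(Goodfellow et al.'s "fast gradient sign method") is a few lines:
nudge every pixel a tiny step in the direction that increases the
model's loss, and if the model behaves roughly linearly, the tiny
steps compound. A minimal PyTorch sketch -- `model`, `image` and
`label` here are placeholders for whatever pretrained network and
correctly-labelled input you have at hand:

    import torch
    import torch.nn.functional as F

    def fgsm(model, image, label, eps=0.01):
        # Compute the loss against the true label, then push every
        # pixel a small step in whichever direction INCREASES it.
        # The gradient comes from one model, but the perturbation
        # often transfers to others trained on similar data.
        image = image.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(image), label)
        loss.backward()
        return (image + eps * image.grad.sign()).detach()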

If you wanted your computer vision paper to be taken seriously
between, say, 2012 and 2016, your metric was an ImageNet task. Your
system would have learnt to consign photographic images to one of a
thousand categories based on the objects depicted in them. The
categories are wilfully ad hoc -- including 120 breeds of dog, 5
species of turtle, "rifle", "triceratops", but not (for example)
"book" or "potato" or "cloud". It is not supposed to be a USEFUL
task, it is supposed to be a CHALLENGE. People spent millions to do
well in ImageNet competitions, and the results *are* quite
impressive. It's finished, pretty much, and now (as Morlock says) the
attention is turning to road signs.
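
For a sense of scale: running one of these thousand-way classifiers
is now a dozen lines. A sketch using torchvision's pretrained
ResNet-50 (the file name is a placeholder; the normalisation
constants are the standard ImageNet ones):

    import torch
    from torchvision import models, transforms
    from PIL import Image

    # A publicly released ImageNet model, downloaded on first use.
    model = models.resnet50(pretrained=True).eval()

    # The canonical ImageNet preprocessing pipeline.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    image = preprocess(Image.open("photo.jpg")).unsqueeze(0)
    with torch.no_grad():
        probs = torch.softmax(model(image), dim=1)
    print(probs.argmax().item())  # an index into the 1000 categories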

Anyway, because ImageNet models are instantly useless as soon as they
have made their splash, they are usually made public and dumped on
github. The research into adversarial imagery works with this
detritus, which makes complete sense -- as detritus goes, it is
state-of-the-art, freely available, and famous -- and I love the
genre; the results are great. But keep in mind what is being
attacked. These are machines that devote ONE EIGHTH of their minds to
distinguishing BREEDS OF DOGS; to them, mistaking an English setter
for an Irish one is as serious as seeing a turtle as a rifle.
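
That last point is baked into the training objective. Cross-entropy
loss only looks at the probability assigned to the TRUE class; it is
indifferent to which wrong class soaked up the rest. A toy
illustration with three made-up classes:

    import torch
    import torch.nn.functional as F

    # Class 0 is the true label.  Model A dumps its confidence on a
    # sibling dog breed (class 1); model B dumps it on "rifle"
    # (class 2).  The loss cannot tell the two mistakes apart.
    logits_a = torch.tensor([[2.0, 5.0, 0.0]])
    logits_b = torch.tensor([[2.0, 0.0, 5.0]])
    label = torch.tensor([0])
    print(F.cross_entropy(logits_a, label))  # identical loss
    print(F.cross_entropy(logits_b, label))  # identical loss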

Just by looking at the Athalye et al. turtle, you can see that the
system associates rifles with the texture of polished wood. And
indeed, when you look in the ImageNet rifle category you see a lot of
wood, not only in the guns themselves but in mounts of various sorts.
The rifle category reveals how people *photograph* rifles. We can
infer that the system concluded that "rifle" was the most
polished-wood-ish category. This is the kind of excellent shortcut
you *want* your ImageNet entry to make, allowing it to devote more
attention to terriers.
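
You don't have to take the wood-texture reading on faith: a standard
probe (occlusion sensitivity, after Zeiler and Fergus) slides a grey
patch over the image and watches the class confidence drop. A sketch,
reusing the hypothetical `model` and `image` from above, with
`class_idx` standing in for ImageNet's rifle class:

    import torch

    def occlusion_map(model, image, class_idx, patch=32, stride=16):
        # Blank out one square of the image at a time and record how
        # far the model's confidence in class_idx falls.  Big drops
        # mark the regions -- here, plausibly, the polished wood --
        # that the class actually depends on.
        _, _, h, w = image.shape
        rows = (h - patch) // stride + 1
        cols = (w - patch) // stride + 1
        heat = torch.zeros(rows, cols)
        with torch.no_grad():
            base = torch.softmax(model(image), dim=1)[0, class_idx]
            for i in range(rows):
                for j in range(cols):
                    y, x = i * stride, j * stride
                    occluded = image.clone()
                    occluded[:, :, y:y+patch, x:x+patch] = 0.5  # grey
                    p = torch.softmax(model(occluded), dim=1)[0, class_idx]
                    heat[i, j] = (base - p).item()
        return heat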

So where does that leave us with regard to fooling real-world systems?
To make an actual rifle-vs-turtle (or more realistically
rifle-vs-non-rifle) discriminator, you would want perhaps a million
pictures of rifles in the kinds of scenes you care about, and just as
many pictures of the other thing. Most likely you'd use video and
allow your system to build an opinion over several frames. Labelling
the images would take a lot of human labour, and you'd need a lot of
money. But you'd end up with a model that works really well (better
than human observers) and is not fooled by turtles. You wouldn't drop
your data or your model on the internet, so nobody could test your
system offline. It is as easy to monitor a thousand video streams as
one, so you could potentially lay off quite a few people and recoup
your up-front costs quite quickly -- supposing a narrow focus on
rifles is as relevant to your business as you thought it was. If not,
well, you've already laid off your surveillance staff, so you extend
your dataset with the new target and keep going.
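
"Building an opinion over several frames" can be as simple as an
exponential moving average of per-frame probabilities, which is one
reason single-frame adversarial stunts say little about such a
system. A sketch, again with `model` and `frames` as placeholders:

    import torch

    def video_opinion(model, frames, alpha=0.9):
        # Exponential moving average over per-frame probabilities.
        # A one-frame adversarial glitch barely moves the average;
        # a rifle that stays a rifle for hundreds of frames does.
        running = None
        with torch.no_grad():
            for frame in frames:
                probs = torch.softmax(model(frame), dim=1)
                running = probs if running is None else (
                    alpha * running + (1 - alpha) * probs)
        return running  # threshold the max downstream to raise an alert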

Machine learning DOES favour people with big secret databases. Fooling
the system is NOT necessarily easier than fooling a person, and in any
case its power lies not in being always right but in being unblinking,
ubiquitous, normalising, and cheap at scale.

Douglas