This quarter, I have been able to study existing technologies through several new lenses. In a communications class, we explored the Turing Test and what it signifies for artificial intelligence. With the knowledge I have accumulated in our linguistics class, I can now criticize the Turing Test as misguided and suggest solutions. To delve further into the work from my communications class, I will first describe the Turing Test and the problems I have diagnosed. Then, I will look at linguistic approaches to the problems highlighted.
In the early and mid-20th century in the US, behaviorism - led by scientists Watson and Skinner - was the dominant approach to psychology. A behaviorist holds that mental states are defined by behavior: if you respond in the way appropriate to someone with a given mental state, then you have that state. For example, if being happy were defined as smiling, then a smiling robot would be 'happy'. In this way, behaviorism bypasses any reference to mental events.
In this post, I will be discussing intelligence, so it is important to consider a possible definition of the term. Turing's definition is strongly behaviorist: if a machine behaves as intelligently as a human, then it is as intelligent. On this view, it does not matter whether a machine can 'think' – it must simply act as if it is thinking. Stemming from this line of thought, Alan Turing designed "The Turing Test," which I will describe briefly. It is a behavioral test for the presence of mind, thought, or intelligence, framed as an imitation game. In broad terms, a human judge, communicating through a computer, must determine whether the entity on the other end is a human or a computer by asking the wannabe-human questions.[1] With this understanding, let us criticize the test.
It makes me uncomfortable to think that a machine with an extremely basic mind (a nonexistent one, one might argue) could pass the Turing Test and be deemed intelligent. Because the Turing Test recognizes intelligence in programs that sustain a human-like conversation, programs try to trick humans. For example, a tester could ask a difficult arithmetic question such as "Calculate 2231*347," and the program would delay its answer to mimic human computational behavior. This, to me, shows that the Turing Test has lost its way. Here, we are testing for intelligence against a human standard rather than exploring what it really means to be intelligent. Imagine a robot that had an output for every possible input, and that looked and behaved like a human. Is it intelligent, or is it just a simulation? This discussion suggests that we must try to determine how the program's internal structure – its mind – behaves, and see whether that matches our existing notions of intelligence. The problem, though: how can we do that without looking at its internal states? Like Turing, we have to settle for judging behavior because we cannot judge internal structure.
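The mimicry trick described above can be sketched in a few lines. This is only an illustration; the function name and the delay range are my own invention, not taken from any real chatbot:

```python
import random
import time

def answer_arithmetic(expr, delay=None):
    """Answer an arithmetic question, but pause first so the response
    time looks human rather than instantaneous (toy illustration).
    'expr' is assumed to be a trusted arithmetic string like "2231*347"."""
    result = eval(expr)  # acceptable in a sketch; never eval untrusted input
    # Fake 'thinking': a human needs several seconds for long multiplication.
    time.sleep(delay if delay is not None else random.uniform(5.0, 15.0))
    return result
```

A judge timing the reply would see a plausibly human pause, even though the machine computed the product instantly - which is exactly why such behavior tells us nothing about intelligence.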
Instead of Turing's imitation game, I think there should be a series of questions (constantly refreshed so programs cannot cheat by learning new outputs) that test a program's ability to reason. It turns out that certain linguistic tools test reasoning very well. Programs developed for the Turing Test perform incredibly well across many areas, especially syntax. However, certain semantic problems that require extra-linguistic context give these programs difficulty. Consider the example question:
The large ball crashed right through the table because it was made of Styrofoam. What was made of Styrofoam?[2]
a. The large ball
b. The table
The preceding question is an example of anaphora, where the interpretation of an expression depends upon another expression in context. To answer it, you need some domain-specific knowledge about materials and how things break. Humans have no trouble with such questions because they have this knowledge stored away. Computers, however, find them very difficult because they do not know where to retrieve the information – or even what information needs to be retrieved. Let us consider another example:
a. Only a few of the children ate their ice-cream. They ate the strawberry flavor first.
b. Only a few of the children ate their ice-cream. They threw it around the room instead.
This pair illustrates anaphora involving what is called a complement set. In (a), the pronoun 'they' refers to the children who ate their ice cream, whereas in (b) it refers to the complement set: the children who did not. Using common sense, we quickly decipher who the pronoun refers to. For a computer, however, this is once again difficult because there are multiple possible referents. Crucially, to answer these questions, the computer must have a broader understanding of the state of the world.
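To make the point concrete, here is a deliberately naive sketch of how a program might resolve the Styrofoam pronoun. All the names and the tiny 'knowledge base' are my own invention; the real difficulty is that a general system would need an open-ended store of such world knowledge, not a hand-written rule:

```python
# Toy world knowledge: materials that break easily. A real resolver would
# need vastly more facts than this hypothetical three-entry set.
FRAGILE_MATERIALS = {"styrofoam", "glass", "paper"}

def resolve_pronoun(crasher, broken_thing, material):
    """For sentences shaped like 'X crashed through Y because it was made of M':
    if M is fragile, 'it' must be the thing that gave way (Y);
    otherwise 'it' is the heavy thing doing the crashing (X)."""
    if material.lower() in FRAGILE_MATERIALS:
        return broken_thing
    return crasher

# "The large ball crashed through the table because it was made of Styrofoam."
print(resolve_pronoun("the large ball", "the table", "Styrofoam"))  # the table
# Swap the material and the referent flips:
print(resolve_pronoun("the large ball", "the table", "steel"))      # the large ball
```

Note how a one-word change in the sentence flips the answer, which is precisely why these questions probe world knowledge rather than pattern matching.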
By designing questions like these, which probe a computer's internal structure, we can better determine whether a machine is intelligent. Such linguistic tricks, then, should be used in the Turing Test. But are such questions sufficient to diagnose intelligence? Is this kind of semantic knowledge synonymous with understanding? Is this version of the test still too close to a human standard of intelligence? Overall, I hope I have shown that Turing Test questions should try to peer into a program's internal structure by asking semantically challenging questions and observing its output.
[1] Oppy, Graham. "The Turing Test." Stanford Encyclopedia of Philosophy. Stanford University, 9 Apr. 2003. Web. 25 Nov. 2014. <http://plato.stanford.edu/entries/turing-test/>.
[2] Lohr, Steve. "Looking to the Future of Data Science." Bits (blog). New York Times, 27 Aug. 2014. Web. 10 Nov. 2014. <http://bits.blogs.nytimes.com/2014/08/27/looking-to-the-future-of-data-science/?_r=0>.
This is an incredible post, addressing important questions in artificial intelligence, computer science, philosophy, and linguistics. Even though we discussed your thoughts in section, I thought it would be beneficial to mention a few things we covered. One major topic was John Searle's Chinese Room thought experiment. It asks us to imagine a native English speaker who knows no Chinese sitting in a "black box" of a room with a rulebook and all the tools necessary to manipulate Chinese symbols. People outside the room send in strings of Chinese symbols, which, unknown to the person in the room, are questions in Chinese (the input). By following the instructions in the rulebook, the man is able to pass out Chinese symbols that are correct answers to the questions (the output). The rulebook thus enables the person in the room to pass the Turing Test for understanding Chinese even though he does not understand a word of Chinese. Searle argues that a computer does exactly the same thing, so when it comes to language, a computer possesses nothing more than the man in the room does.