Talk: If not Turing's test, then what?

Paul Cohen, USC ISI

This was a nice review of the pros/cons of the Turing Test as well as current grand challenges in AI, with lots of Daniel Dennett quotes.

Note Turing made to self on one of his programs: "How did this happen?"


Premise: Good problems help to produce good science

  • Turing test review
  • alternatives
  • review other "grand challenges"
  • good attributes of grand challenges

Turing's test

Turing's paper in MIND: the question "Can machines think?" is muddied by the definitions of "machine" and "think." He proposes an alternative, the "imitation game."

Three kinds of arguments about Turing's test

  • irrelevance: 1950-1966: source of inspiration to all in AI; 1966-1973: a distraction from more promising avenues of AI research; 1973-1990: distraction to philosophers; 1990: consigned to history
  • philosophical: "...a source of distraction mainly to philosophers, rather than AI workers." Dennett: some people have used the test as a launching pad for the sort of definitional debates Turing was trying to squelch.
  • methodological: good tests of intentional capabilities of machines can lead to more capable machines. What makes them good tests?

Won't go away, serves as grand challenge for the field.

Leave philosophy aside, as speaker doubts that anyone in AI would stop work if the notion that machines could think, philosophically, wasn't an issue.

Good and bad attributes of Turing's test

Proxy function

  • "Nothing could possibly pass the Turing test by winning the imitation game without being able to perform indefinitely many other intelligent actions... [Turing's] test was so severe, he thought, that nothing that could pass it fair and square would disappoint us in other quarters."
  • Proxy for many other tests.
  • Standing for "more or less" of human intelligence.
  • Broad competence.
  • Problem: puts the cookie jar so high that no one can reach it.

Test is unspecific

  • It assesses several aspects of intelligence in a single session; we could instead have several tests of different aspects of intelligence.
  • Gardner's multiple intelligences theory.
  • Robert French: "the Test provides a guarantee not of intelligence but of culturally-oriented human intelligence."
  • Not physical/perceptual/musical/chess/hive/neonatal intelligence

Test cannot be passed today

  • Speaker makes fun of the Loebner prize
  • Many AI technologies would not bring us closer to passing the test. As a vision for AI, it is something we're not working on.

Turing's test is not diagnostic

  • Failing it doesn't tell us what to do to pass it next time
  • Whitby: "If the Turing test is read as something like an operational definition of intelligence, then two very important defects of such a test must be considered. First, it is all or nothing: it gives no..."
  • Dennett: "Remember, failure on the Turing test does not predict failure on ... others, but success would surely predict success."

The Turing Test is a goal, not a test

  • A good test provides: (1) diagnosticity: reasons for success and failure; (2) specificity: intelligent behaviors not individuated; (3) a proxy function: passing a test guarantees that a system can perform other tasks in a class. The Turing Test is a proxy for a big class, but humans do much more.

Because of its proxy function, Turing's test is still a great goal

  • video of two 5-8 year-old kids answering Loebner prize questions
  • checklist: common sense knowledge, facility with language, learning new facts, inference, problem solving & planning, good manners, sense of humor, metacognitive knowledge, desire to make a point, construct verbal arguments, listen and understand, fill in missing bits, ontology/classification, memory and attention.

New Challenges

  • Handy Andy Report Writing
  • Robot Soccer
  • Cognitive Decathlon - The Virtual Third-grader
  • Learn to read, read to learn
  • Robot Baby

Good tests of intentional capabilities of machines can lead to more capable machines. What makes a good test?

Challenge: Handy Andy

Description

  • Produce a term paper or five-page report on any topic
  • Use the Web
  • Produce... not necessarily write (for now)

Graded challenges

  1. Weak comprehension of the query; the report is collated text
  2. Stronger comprehension of the query, sufficient to excerpt relevant material from relevant web pages
  3. Strong comprehension: generate follow-up questions for the user in a formal language, extract non-redundant material from web pages
  4. Level 3 plus organizing the material
  5. Level 4 plus English dialog with the user; write the report de novo so that it contains no sentence from the source Web pages
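Because each level builds on the one before it, one way to read the graded challenges is that a system's grade is the highest level it reaches without failing any lower level. A minimal Python sketch of that reading follows; the function name `graded_level` and the pass/fail dictionary are hypothetical illustrations, not something presented in the talk.

```python
def graded_level(passed: dict[int, bool], max_level: int = 5) -> int:
    """Return the highest level n such that levels 1..n were all passed.

    `passed` maps a level number to whether the system met that level's
    criteria (a hypothetical input format). Returns 0 if level 1 failed.
    """
    level = 0
    for n in range(1, max_level + 1):
        if not passed.get(n, False):
            break  # levels are cumulative, so stop at the first failure
        level = n
    return level


# A system that excerpts relevant material (level 2) but cannot yet
# generate follow-up questions (level 3) still earns partial credit.
result = graded_level({1: True, 2: True, 3: False, 4: True})
```

Here `result` is 2: the pass at level 4 does not count, because level 4 is defined as level 3 plus more.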

Good attributes

  • graded challenges: not all or nothing.
  • universality: relying on Web as a near-universal resource
  • Come-as-you-are: don't delay, don't wait for better NL or ontologies or language generation
  • Comprehension: performance depends on understanding the topic and material on the Web
  • Ample rope: five pages is ample space to hang oneself

Challenge: Robot Soccer

By the year 2050 develop a team of fully autonomous humanoid robots that can win against the human world soccer championship team

  • clear 50-year goal
  • a technical committee is elected and steers toward the goal via rule changes, new leagues and competitions
  • the first competitions were open to all and intentionally easy
  • simple success criteria
  • no end of good research problems in sight
  • scientific progress is encouraged by awards for Symposium papers
  • transparency: people publish their methods
  • competition motivates students
  • Junior League brings kids into the field (in Lisbon, 200 teams)
  • Public loves it (150K at Japan Open)

Challenge: Cognitive Decathlon or the Virtual Third-Grader

Dave Gunning

  • "Qualifying trials" for Turing's test
  • Individuates cognitive skills
  • Scope of tests can be increased
  • pass the standardized tests administered to third-graders
  • objective scoring
  • common classroom or homework tasks
  • ample rope
  • subjective scoring

Third-grade skills

  • Understand and follow instructions (have a conversation)
  • Learn and exercise procedures (e.g. long division)
  • Learn by being told (e.g. life was hard for the pioneers; common sense inference that few people wanted to be pioneers)
  • Understand math story problems and solve them correctly
  • Prioritize (choose one book over another, decide to do these problems instead of others on a test)
  • Make a convincing argument (why recess should be longer)

Testing the Virtual Third Grader

  • creative writing challenge: points for creativity, humor, topicality, worrying the teacher
  • convincing letter challenge: points for taking a clear position and explaining it, evidence that the other party holds a different position, examples to help illustrate the case, rhetorical skill
  • learning procedures challenge
  • California STAR challenge
  • change of representation challenge: points for representing the same situation in two ways, identifying correspondences between components of the situations

Challenge: Learn to read, read to learn

Most of the world's knowledge is represented in text

By 2020 read and comprehend any book up to third-grade level

Demonstrate that comprehension depends on previously-read texts

Good attributes

  • comprehension tests are easy to construct
  • "never-ending," monotonic, not "throwaway" from one year to the next
  • graduated series of challenges
  • Like RoboCup, technical committees for KR, ontologies, parsing, semantic representations, book selection, international, metrics and evaluation
  • Even a "kid's book Turing test" based on answers to comprehension questions
  • some serious scientific hypotheses

Scientific claims

  • Hypothesis (sufficiency of core semantics): we have done enough work in linguistics and ontological engineering to represent the meanings of a useful subset of possible sentences
  • Hypothesis (bootstrapping): learning by reading interaction with the environment provides sufficient conditions for the machine to extend its core semantics and understand a wider range of linguistic input. Speaker doesn't believe this hypothesis (motivation for Robot Baby challenge)
  • Hypothesis (nonlinear learning rate)

Challenge: Robot baby

Neonates act and perceive and little else

What is the minimum innate endowment

Good attributes

  • Reading hypothesis reworded: physical interaction with the environment, rather than learning by reading, provides sufficient conditions for the machine to extend its core semantics and understand a wider range of linguistic input
  • integrates 3 major areas of AI:
      • sensing, perception and action
      • learning
      • concepts, ontologies, representation, knowledge

Attributes of good tests

Group 1:

  • transparency
  • frequent tests
  • 50-year goals
  • organizations to chart the way

Group 2:

  • ample rope
  • test important cognitive functions, particularly comprehension
  • Automated scoring/continuously available test suites
  • specificity
  • diagnosticity
  • simple success criteria

Group 3:

  • graduated series of challenges each just slightly out of reach
  • monotonic, no "throwaways"
  • come-as-you-are
  • low cost of admission
  • a popular problem/competition

Group 4:

  • developmental approach, not divide and conquer
  • integrate AI technologies into more-or-less complete agents
  • In any given challenge, accept poor performance ...

Divide and conquer?

Some challenges and Grand Challenges for Computational Intelligence

Edward Feigenbaum JACM Paper, Feigenbaum Challenge

"In each round of the game the behavior of the two players, [National Academy member] and computer, is judged by another Academy member in that particular domain of discourse... The judge poses problems, asks questions, asks for explanations, theories, and so on -- as one might do with a colleague. Can the human judge choose, at better than chance level, which is his National Academy colleague and which is the computer?"
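"Better than chance level" can be made concrete with a one-sided exact binomial test: if the judge were only guessing, how likely is a score at least this good? The sketch below is one possible reading, not part of Feigenbaum's proposal; the function name, the independence of rounds, and the 0.5 chance rate are assumptions made for illustration.

```python
from math import comb


def p_value_better_than_chance(correct: int, rounds: int, chance: float = 0.5) -> float:
    """One-sided exact binomial p-value: P(X >= correct) under pure guessing.

    Assumes independent rounds and a fixed guessing rate `chance`
    (both are assumptions of this sketch).
    """
    if not 0 <= correct <= rounds:
        raise ValueError("correct must be between 0 and rounds")
    return sum(
        comb(rounds, k) * chance**k * (1 - chance) ** (rounds - k)
        for k in range(correct, rounds + 1)
    )


# Hypothetical example: the judge identifies the computer in 15 of 20 rounds.
p = p_value_better_than_chance(15, 20)
```

For 15 correct out of 20, `p` is 21700/2^20, about 0.021, so guessing alone would rarely do this well. A judge who gets 10 of 20 gives a p-value above 0.5, which is indistinguishable from chance.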

Don't know if it's easier than Turing Test, proposed in spirit of divide and conquer. Partial set of capabilities.

"Divide and Conquer" vs. "Walk before you run"

Growing capability instead of divide and conquer.

A developmental strategy is to build more-or-less complete agents, each with several cognitive functions. At first they aren't very capable. By setting new problems we gradually make them more capable.

Make the problems more challenging over time to grow capabilities.

"Perhaps 'divide and conquer' is not an inevitable strategy and we can do better than build partial intelligences."

For example, in RoboCup, robots are complete agents, just not very good at it.

This page contains a single entry from kwc blog posted on July 28, 2004 2:12 PM.
