In the Pixar movie Wall-E, our eponymous hero meets the love of his life, a sleek robot named Eve, after she spots his houseplant and steals it to help save the human race. If the film were set in 2015, Eve probably would have grabbed the bubble wrap by mistake.
Computers are a long way from being capable of the flawless image recognition you see in the movies. To illustrate that point, we put four of today's cutting-edge commercial artificial-intelligence systems through their paces to see how well they could identify what's in a photograph. The systems came from AI startups Clarifai, Orbeus, and MetaMind, along with IBM's celebrity Jeopardy! contestant, Watson. We fed six images into each one and recorded the classifications.
Within the past half-decade, AI research and development has been supercharged, thanks partly to academics at Stanford University, New York University, and the University of Toronto, and to researchers at Google, IBM, and various startups. They've accomplished things in computer vision that were unimaginable just a few years ago. But the results of our computer eye exam show that, although machines are getting very good at some things, they still come up with strange or nonsensical answers every now and again.
Where these systems fail tells us a lot about why computers won't be replacing us for general image recognition tasks anytime soon. One failure comes from data: if a system has never encountered a type of picture before, it may give a perplexing answer. The other comes from focus. Show a person a picture of a hamster in a wheel in a cage and ask what it shows, and they'll probably say a hamster first. A computer might think the wheel is the most important thing. For visual recognition systems to get better, they're going to have to deal with these types of ambiguous situations, and that will require further development of fundamental artificial-intelligence systems.
Take a look for yourself. (Note: Some of these companies have AI systems for specific categories, like food. When available, we tested both and separated the general answer from the specific one with a slash.)
Best answer: Fat (Clarifai)
Worst answer: Slug/Churros (MetaMind, demo version)
Best answer: Cloth/Zuckerberg (Orbeus)
Worst answer: Cardigan (MetaMind, demo version)
Best answer: Cat—everyone got this. After all, the Internet is made of cats.
Best answer: Mountain (Orbeus)
Worst answer: Vehicle/Scene (IBM Watson Visual Recognition, beta version)
Best answer: Technology (Clarifai)
Worst answer: Photo/Object (IBM Watson Visual Recognition, beta version)
Best answer: Missile (MetaMind, demo version)
Worst answer: Blue/Plane (Orbeus)