The new kind of test pits machine-learning models against humans who do their best to fool them.

The explosive successes of AI in the last decade or so are typically chalked up to lots of data and lots of computing power. But benchmarks also play a crucial role in driving progress—tests that researchers can pit their AI against to see how advanced it is. For example, ImageNet, a public data set of 14 million images, sets a target for image recognition. MNIST did the same for handwriting recognition and GLUE (General Language Understanding Evaluation) for natural-language processing, leading to breakthrough language models like GPT-3.

A fixed target soon gets overtaken. ImageNet is being updated and GLUE has been replaced by SuperGLUE, a set of harder linguistic tasks. Still, sooner or later researchers will report that their AI has reached superhuman levels, outperforming people in this or that challenge. And that’s a problem if we want benchmarks to keep driving progress.

So Facebook is releasing a new kind of test that pits AIs against humans who do their best to trip them up. Called Dynabench, the test will be as hard as people choose to make it.

Benchmarks can be very misleading, says Douwe Kiela at Facebook AI Research, who led the team behind the tool. Focusing too much on benchmarks can mean losing sight of wider goals: the test can become the task.

“You end up with a system that is better at the test than humans are but not better at the overall task,” he says. “It’s very deceiving because it makes it look like we’re much further than we actually are.”

Kiela thinks that’s a particular problem with NLP right now. A language model like GPT-3 appears intelligent because it is so good at mimicking language. But it is hard to say how much these systems actually understand.

Think about trying to measure human intelligence, he says. You can give people IQ tests, but that doesn't tell you whether they really grasp a subject. To find that out, you need to talk to them and ask them questions.

Dynabench does something similar, using people to interrogate AIs. Released online today, it invites people to go to the website and quiz the models behind it. For example, you could give a language model a Wikipedia page and then ask it questions, scoring its answers.

In some ways, the idea is similar to the way people are playing with GPT-3 already, testing its limits, or the way chatbots are evaluated for the Loebner Prize, a contest where bots try to pass as human. But with Dynabench, failures that surface during testing will automatically be fed back into future models, making them better all the time.

For now, Dynabench will focus on language models because they are one of the easiest kinds of AI for humans to interact with. “Everybody speaks a language,” says Kiela. “You don’t need any real knowledge of how to break these models.”

But the approach should work for other types of neural networks too, such as speech or image recognition systems. You'd just need a way for people to upload their own images, or have them draw things, to test the models, says Kiela: "The long-term vision for this is to open it up so that anyone can spin up their own model and start collecting their own data."

“We want to convince the AI community that there’s a better way to measure progress,” he adds. “Hopefully, it will result in faster progress and a better understanding of why machine-learning models still fail.”

Source: https://www.technologyreview.com/2020/09/24/1008882/facebook-ai-test-benchmark-people-break-adversarial/

Shared on Intech Analytica (https://intechanalytica.com) by Gemechu Taye under the title "Facebook wants to make AI better by asking people to break it".