Many real-world problems require complex coordination between multiple agents, whether people or algorithms. A machine learning technique called multi-agent reinforcement learning (MARL) has shown success here, primarily in two-team games such as Go, Dota 2, StarCraft, hide-and-seek, and capture the flag. But the human world is far messier than games. People face numerous social dilemmas, from the interpersonal to the international, and they must decide not only how to cooperate but when to cooperate.

To address this challenge, researchers at OpenAI propose training AI agents with what they call randomized uncertain social preferences (RUSP), an augmentation that expands the distribution of environments in which reinforcement learning agents train. During training, agents share varying amounts of reward with one another; however, each agent has an independent degree of uncertainty about its relationships, creating an “asymmetry” that the researchers believe pressures agents to learn socially reactive behaviors.
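The reward-sharing idea described above can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the row-normalized sharing matrix, the Gaussian noise model, and the function names `sample_sharing_matrix`, `shared_rewards`, and `noisy_view` are all assumptions made for illustration.

```python
import random
from typing import List

Matrix = List[List[float]]


def sample_sharing_matrix(n: int, rng: random.Random) -> Matrix:
    """Sample a random n x n matrix whose rows each sum to 1.

    Entry [i][j] is the (assumed) weight agent i places on agent j's reward.
    """
    rows = []
    for _ in range(n):
        # Small offset keeps the row total strictly positive.
        weights = [rng.random() + 1e-9 for _ in range(n)]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def shared_rewards(matrix: Matrix, rewards: List[float]) -> List[float]:
    """Mix the raw per-agent rewards according to the sharing matrix."""
    return [sum(w * r for w, r in zip(row, rewards)) for row in matrix]


def noisy_view(matrix: Matrix, noise_std: float, rng: random.Random) -> Matrix:
    """Return one agent's uncertain observation of the sharing matrix.

    Each agent would get its own noise_std, so agents disagree about how
    strongly they are tied to one another (the asymmetry in the text).
    """
    return [[w + rng.gauss(0.0, noise_std) for w in row] for row in matrix]


rng = random.Random(0)
true_matrix = sample_sharing_matrix(3, rng)
mixed = shared_rewards(true_matrix, [1.0, 0.0, -1.0])
views = [noisy_view(true_matrix, std, rng) for std in (0.0, 0.5, 2.0)]
```

In this sketch an agent with a large `noise_std` sees a view of the relationships that is nearly uninformative, while an agent with `noise_std` of 0 sees them exactly.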

To illustrate the potential of RUSP, the coauthors had agents play Prisoner’s Buddy, a grid-based game in which agents earn a reward for “finding a buddy.” At each timestep, an agent acts either by choosing another agent or by choosing no one and sitting out the round. If two agents choose each other, they each receive a reward of +2. If Alice chooses Bob but the choice is not reciprocated, Alice receives -2 and Bob receives +1. Agents that choose no one receive 0.
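The payoff rules above can be written as a short Python sketch. The function name `prisoners_buddy_rewards` and the list-of-choices encoding are hypothetical, and the sketch assumes that payoffs simply add up when an agent is both a chooser and chosen by others, which the description does not specify.

```python
from typing import List, Optional


def prisoners_buddy_rewards(choices: List[Optional[int]]) -> List[float]:
    """Compute one round of Prisoner's Buddy payoffs.

    choices[i] is the index of the agent that agent i chooses,
    or None if agent i sits out the round.
    """
    n = len(choices)
    rewards = [0.0] * n
    for i, target in enumerate(choices):
        if target is None:
            continue  # sitting out earns 0
        if target == i or not 0 <= target < n:
            raise ValueError(f"agent {i} made an invalid choice: {target}")
        if choices[target] == i:
            rewards[i] += 2.0  # mutual choice: each partner gets +2
        else:
            rewards[i] -= 2.0  # unreciprocated chooser gets -2
            rewards[target] += 1.0  # the chosen agent gets +1
    return rewards


# Agents 0 and 1 pair up; agent 2 sits out.
print(prisoners_buddy_rewards([1, 0, None]))  # [2.0, 2.0, 0.0]
```

With three agents, at most one pair can form in a round, so one agent is always left either sitting out or making an unreciprocated choice.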

The coauthors also explored preliminary team dynamics in a far more complex environment called Oasis. It is physics-based and tasks agents with survival: they receive a reward of +1 for each timestep they stay alive and a large negative reward when they die. Their health declines with each step, but they can regain health by eating food pellets, and they can attack other agents to reduce their health. When an agent’s health drops below 0, it dies and respawns at the edge of the play area after 100 timesteps.
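A rough Python sketch of the per-agent survival bookkeeping described above follows. Only the +1 alive reward and the 100-timestep respawn delay come from the description; the death penalty, maximum health, decay rate, and food gain are placeholder values, and `Agent` and `step_agent` are hypothetical names.

```python
from dataclasses import dataclass

ALIVE_REWARD = 1.0      # from the description: +1 per timestep alive
RESPAWN_DELAY = 100     # from the description: respawn after 100 timesteps
DEATH_PENALTY = -100.0  # assumed: the source only says "large negative reward"
MAX_HEALTH = 20.0       # assumed
HEALTH_DECAY = 1.0      # assumed health lost per timestep
FOOD_GAIN = 5.0         # assumed health regained per food pellet


@dataclass
class Agent:
    health: float = MAX_HEALTH
    respawn_timer: int = 0  # timesteps left before a dead agent respawns

    @property
    def alive(self) -> bool:
        return self.respawn_timer == 0


def step_agent(agent: Agent, ate_food: bool = False, damage_taken: float = 0.0) -> float:
    """Advance one agent by one timestep and return its reward."""
    if not agent.alive:
        agent.respawn_timer -= 1
        if agent.respawn_timer == 0:
            agent.health = MAX_HEALTH  # respawn with full health (assumed)
        return 0.0
    agent.health -= HEALTH_DECAY + damage_taken
    if ate_food:
        agent.health = min(MAX_HEALTH, agent.health + FOOD_GAIN)
    if agent.health < 0:
        agent.respawn_timer = RESPAWN_DELAY
        return DEATH_PENALTY
    return ALIVE_REWARD
```

Under these assumed numbers, an agent that never eats dies after a few dozen steps, which is what makes control of the food source decisive.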

In Oasis, there is only enough food to sustain two of the three agents, which creates a social dilemma. To secure the food source and stay alive, agents must break symmetry and gang up on the third.

The researchers report that RUSP agents in Oasis performed much better than a “selfish” baseline: they achieved higher rewards and died less frequently. (For agents trained with high levels of uncertainty, up to 90% of the deaths in an episode were attributable to a single agent, indicating that two agents learned to form a coalition and mostly exclude the third from the food source.) And in Prisoner’s Buddy, RUSP agents successfully divided into teams that stayed cohesive throughout an episode.
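One way to compute a statistic like the 90% figure above is sketched below in Python. This is an assumed reading of the metric (the largest share of an episode's deaths suffered by any one agent), and `max_death_share` and the death-log format are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def max_death_share(death_log: Sequence[Hashable]) -> float:
    """Return the largest fraction of deaths attributable to one agent.

    death_log holds one agent identifier per death in the episode.
    """
    if not death_log:
        return 0.0  # no deaths in the episode
    counts = Counter(death_log)
    return max(counts.values()) / len(death_log)


# Nine of the ten deaths in this made-up episode belong to agent "c".
print(max_death_share(["c"] * 9 + ["a"]))  # 0.9
```

A value near 1/3 would suggest deaths are spread evenly across three agents, while a value near 1 suggests one agent is being consistently excluded.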

The researchers note that RUSP is inefficient: with the training setup in Oasis, 1,000 iterations corresponded to approximately 3.8 million episodes of experience. That being the case, they argue that RUSP and techniques like it warrant further exploration. “Reciprocity and team forming are hallmark behaviors of sustained cooperation in both animals and humans,” they wrote in a paper submitted to the 2020 NeurIPS conference. “The foundations of all of our social systems are rooted in these essential practices and are often clearly written into them; reciprocal punishment was at the heart of the code of laws of Hammurabi almost 4,000 years ago. It seems a wise first step to understand how simple types of reciprocity will evolve in artificial agents, if we are to see the emergence of more complex social structures and norms.”
