Multi-agent reinforcement learning investigation of the mechanistic role of empathy in pro-environmental behavior

Evann Rabeau^1, Jérôme Iste, Ismael Freire, Mehdi Khamassi

1: ISIR

The ecological crisis raises questions about the determinants of common resource depletion, commonly conceptualized as the Tragedy of the Commons (Hardin, 1968). This dilemma can be resolved through governance; however, the acceptability and application of governance depend on group- and individual-level factors (Ostrom et al., 2009). At the individual level, empathy has been identified as a crucial factor, but its causal mechanisms remain difficult to test empirically at scale (Lynne et al., 2016).

We address this gap with a multi-agent reinforcement learning study of how the components of empathy (cognitive, affective, motivational) influence behavior in common-pool resource dilemmas. Prior work showed that modifying agents' reward structures or observations can induce cooperation, conventions, and sustainability (Perolat et al., 2017; Wang et al., 2019; Zhu & Kirley, 2019). We modeled empathy as a modulator of agents' learning, defined as the ability to (1) observe emotional signals derived from others' consumption outcomes (cognitive and affective empathy; Bošnjaković & Radionov, 2018) and (2) integrate others' well-being into one's own reward function (empathic concern, or the motivational component; Decety, 2015).
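The two empathic abilities can be pictured as two independent switches acting on an agent's inputs and on its reward. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each agent emits a scalar emotional signal, that the cognitive/affective component appends the other agents' signals to the agent's own observation, and that the motivational component blends the agent's own reward with the mean of the others' signals through a hypothetical `weight` parameter (with `weight=1.0`, the reward depends only on the others' average signal). All function names are hypothetical.

```python
import numpy as np


def others_signals(emotions, agent_id):
    """Emotional signals of every agent except `agent_id` (hypothetical helper)."""
    return np.delete(np.asarray(emotions, dtype=float), agent_id)


def empathic_observation(own_obs, emotions, agent_id, observe_others=True):
    """Cognitive/affective component: optionally append others' emotional signals."""
    own_obs = np.asarray(own_obs, dtype=float)
    if not observe_others:
        return own_obs
    return np.concatenate([own_obs, others_signals(emotions, agent_id)])


def empathic_reward(own_reward, emotions, agent_id, concern=True, weight=1.0):
    """Motivational component (empathic concern), sketched as a weighted blend.

    `weight` is a hypothetical parameter: 1.0 means the reward depends only on
    the others' average signal, 0.0 recovers the non-empathetic (own) reward.
    """
    if not concern:
        return float(own_reward)
    social = float(others_signals(emotions, agent_id).mean())
    return (1.0 - weight) * float(own_reward) + weight * social
```

Turning off one switch while keeping the other gives the two single-component variants that the conclusion proposes to simulate.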

We expect (TH1) that the conjunction of the abilities to observe, and to be motivated by, others' emotions reduces overconsumption. Second, we expect (TH2) empathy to contribute to sustainability by fostering fairness in resource consumption (Moreno-Casas & Bagus, 2021), thereby increasing the acceptability of governance strategies (DeCaro et al., 2021; Ostrom, 2009).

We designed an environment with a single renewable resource pool shared among agents. At each time step, each agent performs one of two actions, one of which can yield a unit of resource depending on the remaining stock. Following the properties of common-pool resource dilemmas, agents' use of the resource reduces its availability to others, resource levels evolve dynamically through a regeneration process (Ostrom et al., 1999), and environmental parameters are chosen such that agents are likely to deplete the resource. An episode ends when the resource is fully depleted or when a fixed number of steps is reached. At the beginning of each episode, the resource stock is reset, while agents retain their learned policies.
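A minimal environment with these properties might look as follows. The abstract does not specify the harvest or regeneration rules, so the ones below are assumptions chosen for illustration: a harvest attempt succeeds with probability equal to the stock divided by capacity, the stock regrows logistically, the pool counts as depleted when less than one unit remains, and the class name `CommonPoolEnv` and all parameter values are hypothetical.

```python
import numpy as np


class CommonPoolEnv:
    """Single shared renewable resource (sketch with hypothetical rules).

    Action 0 = abstain, action 1 = try to harvest one unit.
    """

    def __init__(self, n_agents=4, capacity=100.0, growth_rate=0.05,
                 max_steps=200, seed=0):
        self.n_agents = n_agents
        self.capacity = capacity
        self.growth_rate = growth_rate
        self.max_steps = max_steps
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # Only the resource is reset; agents keep whatever they have learned.
        self.stock = self.capacity
        self.t = 0
        return self._obs()

    def _obs(self):
        return np.array([self.stock / self.capacity, self.t / self.max_steps])

    def step(self, actions):
        rewards = np.zeros(self.n_agents)
        # Harvest attempts are resolved in a random order; each success removes
        # one unit, so one agent's use reduces availability for the others.
        for i in self.rng.permutation(self.n_agents):
            if actions[i] == 1 and self.stock >= 1.0:
                if self.rng.random() < self.stock / self.capacity:
                    self.stock -= 1.0
                    rewards[i] = 1.0
        # Assumed logistic regeneration of the remaining stock.
        self.stock += self.growth_rate * self.stock * (1.0 - self.stock / self.capacity)
        self.t += 1
        done = self.stock < 1.0 or self.t >= self.max_steps
        return self._obs(), rewards, done
```

Under logistic regrowth, a nearly full or nearly empty pool regenerates slowly, so sustained over-harvesting tends to push the stock toward depletion, which is the regime the abstract says the parameters are chosen to produce.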
Agents use Deep Q-Networks (Gronauer & Diepold, 2022; Mnih et al., 2015) to learn action-selection policies by associating environmental states with long-term expected rewards. They follow an ε-greedy exploration strategy, initially favoring exploration and progressively shifting toward exploitation, and all agents share identical network architectures and hyperparameters. We compared two experimental conditions: (1) non-empathetic agents, which cannot observe others' emotions and whose rewards depend exclusively on their own consumption outcomes, and (2) empathetic agents, which observe others' emotional signals and whose rewards depend on the average of these signals, each signal being computed from an agent's recent consumption rate.
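The learning rule itself is standard DQN (Mnih et al., 2015). As a sketch, the ε-greedy action selection and the regression targets computed with a target network are shown below. The linear annealing schedule and its values (`eps_start`, `eps_end`, `decay_steps`, `gamma`) are hypothetical, since the abstract only states that exploration decreases over training; the function names are also illustrative.

```python
import numpy as np


def epsilon_schedule(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly anneal epsilon from eps_start to eps_end (hypothetical values)."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def epsilon_greedy(q_values, epsilon, rng):
    """Random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """DQN targets y = r + gamma * (1 - done) * max_a' Q_target(s', a').

    `next_q_target` has shape (batch, n_actions) and comes from the target
    network; terminal transitions (done = 1) do not bootstrap.
    """
    rewards = np.asarray(rewards, dtype=float)
    dones = np.asarray(dones, dtype=float)
    return rewards + gamma * (1.0 - dones) * np.max(next_q_target, axis=1)
```

The online network is then trained to reduce the squared (or Huber) error between its Q(s, a) for the actions taken and these targets. In the empathetic condition, only the reward fed into `dqn_targets` changes; the update rule stays the same.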

Results show that empathetic agents differed significantly from non-empathetic agents on all evaluated metrics. Empathetic agents showed significantly lower average resource utility than non-empathetic agents (Student's t(58) = 143.47, p < .001, d = 37.04), indicating reduced overconsumption. Resource depletion was significantly reduced in the empathetic condition, both in terms of the proportion of resources remaining at the end of episodes (Welch's t(58) = 51.31, p < .001, d = 13.25) and the number of steps before depletion (Welch's t(58) = 139.68, p < .001, d = 36.07). However, contrary to theoretical expectations, equality in resource distribution was lower in the empathetic condition, as reflected by a higher Gini coefficient (Welch's t(58) = −113.18, p < .001, d = −29.22).
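The metrics above rely on a few standard formulas. The sketch below computes the Gini coefficient of agents' consumption, Welch's t statistic with its Welch-Satterthwaite degrees of freedom, and Cohen's d with a pooled standard deviation; function names are illustrative. For equal group sizes, d = t·√(2/n); assuming n = 30 runs per condition (our inference from df = 58, not stated in the abstract), 143.47·√(2/30) ≈ 37.04, which matches the reported effect size.

```python
import numpy as np


def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(np.sum((2 * ranks - n - 1) * x) / (n * total))


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return float(t), float(df)


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = a.size, b.size
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))
```

For example, `gini([0, 0, 0, 1])` returns 0.75, the maximum for four agents, when a single agent consumes everything.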

In line with TH1, empathetic agents reached overconsumption later in each episode, suggesting that empathy reduces the risk of overconsumption. Observing others' emotions, which are linked to their utility, may guide behavior and facilitate the learning of adapted consumption levels.
Contrary to TH2, fairness was lower in the empathetic condition. This result nonetheless aligns with findings that empathic concern reduces the perception of unfairness (He et al., 2022), which could make an unequal distribution acceptable to the population.

These results come with several limitations. We selected simulation parameters and numerical settings through brute-force exploration. There are also threats to the transferability of our results to humans. Our current method for computing emotions uses a linear mapping (Gannouni et al., 2020; Joffily & Coricelli, 2013); alternatives, such as a sigmoid transformation or bio-inspired models of emotions, could increase predictive power while yielding similar effects. Furthermore, previous research indicates that empathetic valuation and behavioral investment related to others' needs decay over time (Coleman & DeSteno, 2024), whereas here agents weigh future rewards nearly as heavily as immediate ones (i.e., their discount factor is close to 1). Testing stronger temporal discounting could improve validity by better mimicking human temporal preferences. Finally, to control for confounding factors, we plan to compare these simulations with controls using random emotional signals and random social rewards.
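To make two of these modeling choices concrete, the sketch below contrasts a linear and a sigmoid mapping from a recent consumption rate to an emotional signal, and shows how the discount factor γ sets the weight γ^k of a reward received k steps ahead. The sliding-window definition of the rate, the [-1, 1] emotion range, the `steepness` value, and all names are hypothetical choices for illustration, not the mapping used in the study.

```python
from collections import deque

import numpy as np


class RecentConsumption:
    """Consumption rate over a sliding window of the last `window` steps (hypothetical)."""

    def __init__(self, window=10):
        self.history = deque(maxlen=window)

    def update(self, consumed):
        self.history.append(1.0 if consumed else 0.0)
        return sum(self.history) / len(self.history)


def linear_emotion(rate):
    """Linear map from a rate in [0, 1] to an emotion in [-1, 1]."""
    return 2.0 * rate - 1.0


def sigmoid_emotion(rate, steepness=10.0):
    """Sigmoid alternative on the same range, centered on rate = 0.5."""
    return 2.0 / (1.0 + np.exp(-steepness * (rate - 0.5))) - 1.0


def discount_weight(k, gamma):
    """Weight of a reward received k steps in the future under discount gamma."""
    return gamma ** k
```

The sigmoid saturates near the extremes, so it amplifies differences around the midpoint and compresses them near full or zero consumption. With γ = 0.99, a reward ten steps ahead keeps about 90% of its weight (0.99^10 ≈ 0.904), versus about 35% with γ = 0.9 (0.9^10 ≈ 0.349); lowering γ is one way to test the stronger decays discussed above.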

To conclude, this work highlights the value of computational, AI-based modeling approaches for investigating cognitive abilities. While our results do not disentangle the respective mechanistic roles of the components of empathy, future simulations with social rewards but without observation of others' states, and others with such observations but without social rewards, would allow us to identify the key components for psychosocial interventions relying on empathy.

Type: Poster

Topics: Other