Volume 59 Issue 04 May 2026
Research

Stackelberg Mean Field Games: A Framework for Policy Design in Complex Systems

Designing the “best” governmental policies is a highly challenging task, as policymakers must account for how individuals or institutions respond to policy interventions. In domains ranging from public health to finance, designing policies that remain optimal once the public has responded to them is a difficult problem. For example, the availability of a vaccine can change how and when people socialize, which in turn influences the spread of an infection. Similarly, a new capital requirement for banks may alter how banks manage risk and interact with their peers, while a carbon tax may accelerate the transition to renewable energy, making supply more variable and affecting electricity prices. In each case, the agents (i.e., the individuals, banks, energy market participants, or other decision-makers) adapt to the policy environment. These emergent behaviors pose difficult challenges for mathematical modelers and policymakers and may produce outcomes that differ from a policy’s intended effects.

For this reason, establishing optimal policies has become a central interest for operations researchers, economists, and applied mathematicians. On the theoretical side, standard optimal control techniques typically assume that a policymaker steers a dynamical system whose evolution is fixed once a control is chosen. However, in many social, economic, and engineering settings, these dynamics are not fixed in advance; they depend on how agents interact with one another and react to the implemented policies. On the computational side, agent-based modeling and simulation (ABM) can help policymakers visualize how agents might respond to policies according to heuristic rules. While ABM is invaluable for comparing different scenarios, it often faces a significant tradeoff between realism and tractability: complex agent-based models can be difficult to calibrate, analyze, and optimize. Furthermore, because behavior in ABM is prescribed by fixed rules, it can be challenging to attribute specific causes to outcomes or to quantify exactly how a policy influences behavior.

To identify effective policies that anticipate emergent population behaviors in a tractable manner, we can employ the Stackelberg equilibrium, a fundamental concept in mathematical game theory. In a classical Stackelberg game, the leader moves first by announcing an incentive, policy, or constraint. The follower then responds with their best strategy, namely, the behavior that optimizes their own objective given the leader’s incentive. Anticipating this response, the leader chooses the incentive that optimizes their own objective, treating the follower’s optimal response as a function of that incentive. This setting has drawn particular interest in the contract theory literature, where it has been studied in continuous time via backward stochastic differential equations [12].
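The leader-follower logic can be illustrated with a toy principal-agent example of our own (it is not taken from [12]). We assume the leader offers a hypothetical piece rate `x` per unit of effort and the follower chooses effort `y` to minimize `0.5*y**2 - x*y`, while the leader earns `y - x*y`. This minimal sketch, using plain grid search, first solves the follower's problem for each candidate `x` and then lets the leader optimize over `x` while anticipating that response.

```python
import numpy as np

# Toy principal-agent game (hypothetical, for illustration only):
# the leader pays a piece rate x per unit of effort, the follower picks effort y.
VALUE_PER_EFFORT = 1.0                # hypothetical value of one unit of effort to the leader
Y_GRID = np.linspace(0.0, 2.0, 2001)  # follower's candidate efforts
X_GRID = np.linspace(0.0, 2.0, 201)   # leader's candidate piece rates


def follower_cost(x, y):
    """Quadratic effort cost minus the payment received."""
    return 0.5 * y**2 - x * y


def leader_payoff(x, y):
    """Value created by the follower's effort minus the payment made."""
    return VALUE_PER_EFFORT * y - x * y


def best_response(x):
    """Follower's optimal effort for a given incentive x (grid search)."""
    return Y_GRID[np.argmin(follower_cost(x, Y_GRID))]


def solve_stackelberg():
    """Leader maximizes its payoff, anticipating the follower's best response y*(x)."""
    payoffs = [leader_payoff(x, best_response(x)) for x in X_GRID]
    k = int(np.argmax(payoffs))
    return X_GRID[k], best_response(X_GRID[k])


x_star, y_star = solve_stackelberg()
print(f"leader incentive x* = {x_star:.3f}, follower effort y* = {y_star:.3f}")
```

Analytically, the follower's best response is y*(x) = x, so the leader maximizes x - x², giving x* = 1/2; the nested grid search recovers this value.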

When there is more than one follower, we must also account for interactions among the followers. In this setting, we can assume that the followers respond to the leader’s policy by playing a Nash equilibrium, where each follower chooses their best response (i.e., a strategy that optimizes their own objective) given both the leader’s policy and the behavior of the other followers. Conditional on the leader’s policy, no follower then has an incentive to unilaterally deviate from their strategy. In large populations, computing a Nash equilibrium becomes challenging as the number of interactions grows, so approximations such as mean field games (MFGs) are used instead; in an MFG, each agent interacts with the distribution of the population rather than with every other agent individually. The full framework involving a leader and a large population of followers is typically referred to as a Stackelberg MFG [4, 8, 11].
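To make the equilibrium condition concrete, the sketch below computes a mean field equilibrium for a fixed leader policy in a static toy model of our own. We assume agent i picks an action a to minimize 0.5*(a - b_i)**2 + c*a*m + p*a, where b_i is a hypothetical preferred level, m is the population's mean action, c is a congestion parameter, and p is the leader's policy; all names and values are illustrative assumptions. Damped fixed-point iteration on m is one common way to find such an equilibrium: at convergence, every agent is best-responding to the mean that their own actions generate.

```python
import numpy as np

# Static toy mean field game (hypothetical parameters, for illustration only).
rng = np.random.default_rng(0)
N_AGENTS = 10_000
preferred = rng.uniform(0.5, 1.5, N_AGENTS)  # b_i: each agent's preferred action level
CONGESTION = 0.5                             # c: how strongly the crowd discourages the action


def best_response(mean_action, policy):
    """argmin_a 0.5*(a - b_i)**2 + c*a*m + p*a, i.e. a_i = b_i - c*m - p."""
    return preferred - CONGESTION * mean_action - policy


def mean_field_equilibrium(policy, damping=0.5, tol=1e-12, max_iter=1000):
    """Damped fixed-point iteration on the population mean m."""
    m = 0.0
    for _ in range(max_iter):
        m_new = (1 - damping) * m + damping * best_response(m, policy).mean()
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    raise RuntimeError("fixed-point iteration did not converge")


policy = 0.2  # leader's policy p, held fixed here
m_eq = mean_field_equilibrium(policy)
closed_form = (preferred.mean() - policy) / (1 + CONGESTION)
print(f"iterated m = {m_eq:.6f}, closed form m = {closed_form:.6f}")
```

Because each best response is linear in m, the fixed point has the closed form m = (mean(b) - p)/(1 + c), which the script prints for comparison.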

Informally, the leader’s problem is to choose a policy that optimizes their own objective, subject to the constraint that the followers respond by playing an equilibrium among themselves. This “subject to” constraint is the crucial element of the framework: it turns a standard single-level optimization problem into a nested (bilevel) one, in which the leader’s decision must explicitly account for the followers’ own optimization. This hierarchy provides a natural abstraction for many policymaking scenarios, since the resulting interventions are designed with the population’s strategic adaptations already built in.
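Schematically, and in notation of our own rather than the article's, the nested problem can be written as follows, where θ ∈ Θ is the leader's policy, α is a representative follower's control with state process X^α on a horizon [0, T], μ = (μ_t) is the flow of population distributions, and J^L and J^F are the leader's and the follower's costs:

```latex
\begin{aligned}
\min_{\theta \in \Theta} \; & J^{L}\big(\theta, \hat{\mu}^{\theta}\big) \\
\text{subject to} \; & \hat{\alpha}^{\theta} \in \operatorname*{arg\,min}_{\alpha} \, J^{F}\big(\alpha; \theta, \hat{\mu}^{\theta}\big), \\
& \hat{\mu}^{\theta}_{t} = \mathcal{L}\big(X^{\hat{\alpha}^{\theta}}_{t}\big) \quad \text{for all } t \in [0, T].
\end{aligned}
```

The two constraint lines together define the followers' MFG equilibrium given θ: the representative follower best-responds to the population flow, and that flow is reproduced by the law of the follower's own optimally controlled state.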

Figure 1. Disease spread under Stackelberg mean field game (MFG) policies. 1a. Comparison of the spread of disease under MFG policies as opposed to a free spread scenario. 1b. Stackelberg MFG social distancing policies for susceptible individuals (blue, dashed) and infected individuals (orange, dashed), as well as Nash equilibrium social distancing response of susceptible individuals (blue, solid) and infected individuals (red, solid). Infected individuals are given stricter distancing guidelines than susceptible individuals, and susceptible individuals restrict socialization below the guidelines to protect themselves further. Figure adapted from [2].

One prominent application of this framework is the mitigation of epidemics [2]. In an MFG approach to disease control, the population is modeled as a large collection of agents whose health states (e.g., susceptible, exposed, infected, or recovered) change over time. Each individual influences their own transitions between health states; for instance, by controlling the intensity of their social interactions, an individual can minimize their personal risk or cost. A public health authority, acting as the leader, influences this system through non-pharmaceutical interventions or incentives, aiming to steer collective behavior toward a socially optimal outcome, such as reducing the infection peak or preserving healthcare capacity. Because individuals (followers) naturally adjust their socialization habits in response to both the spread of disease and the government’s mandates, the optimal mitigation policies can be formulated as a Stackelberg MFG problem. Figure 1 compares the spread of a disease under Stackelberg MFG policies with the free-spread case, in which individuals do not adjust their socialization levels and the public health authority issues no distancing guidelines. The Stackelberg MFG policies give stricter guidelines to infected individuals than to susceptible ones, which reduces the spread of the disease.
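A deliberately simplified SIR sketch, which is not the model of [2], illustrates why lower socialization among infected individuals flattens the curve. We assume that new infections occur at rate beta * alpha_s * alpha_i * S * I, where alpha_s and alpha_i are constant, hypothetical socialization levels of susceptible and infected individuals. In a Stackelberg MFG these levels would instead be time-dependent and chosen by the followers in response to the leader's guidelines; the parameter values below are illustrative and not calibrated to any disease.

```python
# Simplified SIR model with socialization levels (hypothetical parameters).
BETA, GAMMA = 0.3, 0.1   # assumed infection and recovery rates per day
DT, DAYS = 0.1, 300      # forward-Euler time step and horizon (days)


def simulate(alpha_s, alpha_i, s0=0.99, i0=0.01):
    """Return (peak infected share, share ever infected) for constant socialization levels."""
    s, i = s0, i0
    peak = i
    for _ in range(round(DAYS / DT)):
        new_infections = BETA * alpha_s * alpha_i * s * i * DT
        recoveries = GAMMA * i * DT
        s, i = s - new_infections, i + new_infections - recoveries
        peak = max(peak, i)
    return peak, 1.0 - s


free_peak, free_total = simulate(alpha_s=1.0, alpha_i=1.0)      # no adjustment, no guidelines
guided_peak, guided_total = simulate(alpha_s=0.8, alpha_i=0.5)  # stricter for infected individuals
print(f"free spread: peak {free_peak:.3f}, ever infected {free_total:.3f}")
print(f"distancing:  peak {guided_peak:.3f}, ever infected {guided_total:.3f}")
```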

Another compelling application lies in systemic financial risk management and the stability of banking networks [7, 9]. In this setting, a central bank acts as the leader by announcing macroprudential policies (e.g., specific borrowing and lending rates or liquidity requirements). Financial institutions then act as followers, adjusting their interbank lending strategies and risk-taking behaviors to maximize their individual profits, ultimately reaching a Nash equilibrium. The central bank’s objective is to choose a policy that prevents a cascade of failures and keeps the number of defaults below a critical threshold. Since the banks’ reactions to interest rates or capital buffers can shift the stability of the entire market, the central bank must solve a Stackelberg MFG to identify an intervention that remains effective once the banking sector has adapted strategically. This approach allows regulators to move beyond static stress tests and instead model the dynamic, reactive nature of the global financial system.
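The follower-side dynamics can be sketched with a Monte Carlo simulation loosely inspired by the interbank model of [7]. Each bank's log-reserve X^i drifts toward the interbank average at a rate `a`, which we treat as a hypothetical stand-in for the lending intensity shaped by the central bank's policy, and a bank is counted as defaulted the first time X^i falls to a threshold. The banks' strategic choices are omitted; the sketch only shows the quantity a central bank would evaluate across candidate policies, and all parameter values are assumptions.

```python
import numpy as np


def default_fraction(a, n_banks=100, n_paths=200, sigma=1.0, threshold=-0.7,
                     horizon=1.0, n_steps=100, seed=0):
    """Monte Carlo estimate of the share of banks whose log-reserves hit `threshold`.

    Toy dynamics (assumed): dX^i = a * (mean(X) - X^i) dt + sigma dW^i, X^i_0 = 0.
    A bank counts as defaulted the first time it is at or below the threshold on the
    simulation grid; for simplicity it is not removed from the system afterwards.
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    x = np.zeros((n_paths, n_banks))
    defaulted = np.zeros((n_paths, n_banks), dtype=bool)
    for _ in range(n_steps):
        x_bar = x.mean(axis=1, keepdims=True)  # interbank average, per path
        noise = rng.standard_normal(x.shape)
        x = x + a * (x_bar - x) * dt + sigma * np.sqrt(dt) * noise
        defaulted |= x <= threshold
    return defaulted.mean()


# A leader would compare candidate policies through their (hypothetical) effect on a.
for a in (0.0, 1.0, 10.0):
    print(f"coupling a = {a:>4}: default fraction {default_fraction(a):.3f}")
```

In this toy model, stronger coupling keeps each bank close to the interbank average, so the fraction of individual defaults should fall as `a` grows.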

Another application of the Stackelberg MFG inspired by real-world policy is the modeling of electricity producers and their response to carbon reduction policies such as taxation [6]. The government, as the leader, selects carbon tax levels that optimize its own objectives, such as keeping carbon emissions below a target level while ensuring that electricity demand is satisfied over the time horizon. The electricity producers in turn act as followers and decide on their investments in nonrenewable and renewable energy sources to optimize their own objectives, such as maximizing revenue while minimizing production-related costs. The producers interact with each other through the electricity price, which is determined by average supply and demand levels. Energy-market applications have been studied with MFGs and their extensions by many researchers [1, 3, 10].
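A static toy version of this interaction shows how the pieces fit together; it uses identical producers and parameters of our own choosing rather than the dynamic model of [6]. Producers choose fossil and renewable output given the price and a tax, the price clears against average supply through a hypothetical linear inverse demand curve, and the government searches for the lowest tax that keeps average fossil output (our emissions proxy) under a cap.

```python
import numpy as np

# Hypothetical parameters (illustration only, not calibrated).
A, B = 10.0, 2.0              # inverse demand: price = A - B * average supply
C_FOSSIL, C_RENEW = 1.0, 4.0  # quadratic production-cost coefficients
EMISSION_CAP = 2.0            # leader's cap on average fossil output (emissions proxy)


def producer_response(price, tax):
    """Profit-maximizing outputs of a representative producer.

    Fossil:    max_f (price - tax) * f - C_FOSSIL * f**2 / 2, f >= 0
    Renewable: max_r price * r - C_RENEW * r**2 / 2,         r >= 0
    """
    fossil = max(price - tax, 0.0) / C_FOSSIL
    renewable = max(price, 0.0) / C_RENEW
    return fossil, renewable


def market_equilibrium(tax, damping=0.2, tol=1e-12, max_iter=10_000):
    """Damped fixed point on the price: producers respond, price clears on average supply."""
    price = A
    for _ in range(max_iter):
        fossil, renewable = producer_response(price, tax)
        new_price = (1 - damping) * price + damping * (A - B * (fossil + renewable))
        if abs(new_price - price) < tol:
            fossil, renewable = producer_response(new_price, tax)
            return new_price, fossil, renewable
        price = new_price
    raise RuntimeError("price iteration did not converge")


def leader_tax(tax_grid):
    """Lowest tax on the grid whose equilibrium meets the emission cap."""
    for tax in tax_grid:
        price, fossil, renewable = market_equilibrium(tax)
        if fossil <= EMISSION_CAP + 1e-9:  # small tolerance for fixed-point error
            return tax, price, fossil, renewable
    return None


tax, price, fossil, renewable = leader_tax(np.linspace(0.0, 5.0, 501))
print(f"tax {tax:.2f}, price {price:.3f}, fossil {fossil:.3f}, renewable {renewable:.3f}")
```

With these numbers the result can be checked by hand: the equilibrium price is (10 + 2τ)/3.5 and fossil output equals the price minus τ, so the cap of 2 is first met at τ = 2, where the price is 4 and renewable output is 1.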

New computational approaches for solving these complex nested problems have emerged in recent years, moving beyond the limitations of traditional numerical methods. One generic approach reformulates the bilevel Stackelberg problem as a single-level mean field optimal control problem, which is then solved using a deep learning method [9]. Another approach [5] proposes a bilevel method that combines a finite-dimensional approximation of the principal’s (i.e., the leader’s) decision space with a deep learning method that solves a forward-backward stochastic differential equation and fits the principal’s loss function; the resulting algorithm is applied to a model of Renewable Energy Certificate markets.
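Neither [9] nor [5] is reproduced here. As a loose, hypothetical sketch of the bilevel structure alone, the leader's decision is approximated by a finite-dimensional vector (a policy that is piecewise constant over K periods), an inner solver computes the followers' equilibrium for each candidate policy, and an outer loop minimizes the leader's loss. We swap in a toy model of our own with a fixed-point inner solve, and central finite differences with plain gradient descent in place of the deep learning components; every name and parameter is an assumption.

```python
import numpy as np

# Toy dynamic model (hypothetical, for illustration only).
K = 5                          # leader's policy is piecewise constant over K periods
BASE_PREF, HABIT, CONGESTION = 1.0, 0.3, 0.5
TARGET = np.full(K, 0.4)       # leader's desired mean action in each period
POLICY_WEIGHT = 0.1            # leader's penalty on policy magnitude


def inner_equilibrium(policy, damping=0.5, tol=1e-13, max_iter=1000):
    """Followers' mean action per period, as the fixed point of
    m_k = BASE_PREF + HABIT * m_{k-1} - CONGESTION * m_k - policy[k]."""
    means = np.empty(K)
    m_prev = 0.0
    for k in range(K):
        m = m_prev
        for _ in range(max_iter):
            response = BASE_PREF + HABIT * m_prev - CONGESTION * m - policy[k]
            m_new = (1 - damping) * m + damping * response
            if abs(m_new - m) < tol:
                break
            m = m_new
        else:
            raise RuntimeError("inner fixed point did not converge")
        means[k] = m_new
        m_prev = m_new
    return means


def leader_loss(policy):
    """Outer objective: track TARGET while penalizing large interventions."""
    means = inner_equilibrium(policy)
    return np.sum((means - TARGET) ** 2) + POLICY_WEIGHT * np.sum(policy ** 2)


def fd_gradient(f, x, h=1e-5):
    """Central finite-difference gradient (stands in for automatic differentiation)."""
    grad = np.zeros_like(x)
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = h
        grad[j] = (f(x + step) - f(x - step)) / (2 * h)
    return grad


policy = np.zeros(K)
for _ in range(300):
    policy = policy - 0.5 * fd_gradient(leader_loss, policy)

print("policy:", np.round(policy, 3))
print("loss before:", round(leader_loss(np.zeros(K)), 4), "after:", round(leader_loss(policy), 4))
```

Because the inner equilibrium depends linearly on the policy in this toy model, the outer loss is a convex quadratic and gradient descent converges; in realistic models neither property is guaranteed, which is part of what motivates the learning-based methods above.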

Stackelberg MFGs offer a rigorous mathematical foundation for understanding the interplay between top-down incentives and bottom-up strategic behavior. By modeling the population’s response with game-theoretic tools, we can design interventions that anticipate how the population will adapt, making them more effective and robust.

References 
[1] Aïd, R., Basei, M., & Pham, H. (2020). A McKean–Vlasov approach to distributed electricity generation development. Math. Method. Oper. Res., 91(2), 269-310.
[2] Aurell, A., Carmona, R., Dayanıklı, G., & Laurière, M. (2022). Optimal incentives to mitigate epidemics: A Stackelberg mean field game approach. SIAM J. Control Optim., 60(2), S294-S322.
[3] Bassière, A., Dumitrescu, R., & Tankov, P. (2024). A mean-field game model of electricity market dynamics. In Quantitative Energy Finance: Recent Trends and Developments (pp. 181-219). Cham, Switzerland: Springer Nature.
[4] Bensoussan, A., Chau, M.H., & Yam, S.C.P. (2015). Mean field Stackelberg games: Aggregation of delayed instructions. SIAM J. Control Optim., 53(4), 2237-2266.
[5] Campbell, S., Chen, Y., Shrivats, A., & Jaimungal, S. (2021). Deep learning for principal-agent mean field games. Preprint, arXiv:2110.01127.
[6] Carmona, R., Dayanıklı, G., & Laurière, M. (2022). Mean field models to regulate carbon emissions in electricity production. Dyn. Games Appl., 12(3), 897-928.
[7] Carmona, R., Fouque, J.P., & Sun, L.H. (2015). Mean field games and systemic risk. Commun. Math. Sci., 13(4), 911-933.
[8] Carmona, R., & Wang, P. (2021). Finite-state contract theory with a principal and a field of agents. Manag. Sci., 67(8), 4725-4741.
[9] Dayanıklı, G., & Laurière, M. (2025). A machine learning method for Stackelberg mean field games. Math. Oper. Res., 50(4), 3055-3093.
[10] Elie, R., Hubert, E., Mastrolia, T., & Possamaï, D. (2021). Mean–field moral hazard for optimal energy demand response management. Math. Finance, 31(1), 399-473.
[11] Elie, R., Mastrolia, T., & Possamaï, D. (2019). A tale of a principal and many, many agents. Math. Oper. Res., 44(2), 440-467.
[12] Sannikov, Y. (2008). A continuous-time version of the principal-agent problem. Rev. Econ. Stud., 75(3), 957-984.

About the Authors

Mathieu Laurière

Assistant professor, New York University

Mathieu Laurière is an assistant professor at New York University (NYU) Shanghai, and is affiliated with the NYU-East China Normal University Institute of Mathematical Sciences and the Shanghai Center for Data Science. His research focuses on mean field games & control, numerical methods and machine learning.