Reinforcement Learning for Bond Allocation in Global Pension Fund Portfolios

In this project, supervised by Tredje AP-fonden (AP3), the analysts Edvin Gunnarsson, Victor Mikkelsen and Carolina Oker-Blom explore whether Reinforcement Learning (RL) can optimize global bond allocations by dynamically balancing duration and credit risk. Using the Proximal Policy Optimization (PPO) algorithm combined with PCA-based dimensionality reduction, they develop an agent designed to adapt to shifting market regimes and complex macroeconomic conditions.
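To make the dimensionality-reduction step concrete, here is a minimal sketch of how a wide set of market and macro indicators might be compressed with PCA before being fed to an RL agent as its state. The data is synthetic, and the function name `pca_reduce`, the number of indicators, and the choice of three components are illustrative assumptions, not details from the project.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Standardize the columns of `features` and project onto the top principal components."""
    X = np.asarray(features, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0              # guard against constant columns
    Z = (X - mean) / std
    # SVD of the standardized data: the rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return Z @ components.T, explained[:n_components]

# Synthetic stand-in for real inputs (assumption): 250 observations of
# 12 correlated indicators driven by 3 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(250, 3))
loadings = rng.normal(size=(3, 12))
raw = latent @ loadings + 0.1 * rng.normal(size=(250, 12))

state, explained = pca_reduce(raw, n_components=3)
print(state.shape)          # one low-dimensional state vector per observation
print(explained.round(3))   # share of variance captured by each component
```

In a live setting the standardization statistics and principal axes would typically be fitted on training data only and then reused out of sample, to avoid look-ahead bias.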

The analysis finds that the RL model materially outperforms a broad global bond benchmark across key risk-adjusted metrics. Notably, the agent learned to rotate defensively into cash during the 2022 rate-hiking cycle and to re-risk into credit once conditions stabilized. This behavior is consistent with institutional flight-to-quality strategies, yet it was discovered entirely through training. The result suggests that reinforcement learning, when integrated with classical risk metrics like duration and convexity, can effectively support semi-automated trading in fixed-income portfolios.
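For readers less familiar with the classical risk metrics mentioned above, the sketch below computes modified duration and convexity for a plain fixed-coupon bond and uses them in the standard second-order approximation of the price change for a yield shift, ΔP/P ≈ −D·Δy + ½·C·Δy². Annual compounding, the 5-year 3% bond, and all function names are assumptions chosen for illustration; none of them come from the project itself.

```python
def bond_price(cashflows, times, y):
    """Present value of cashflows at annually compounded yield y."""
    return sum(cf / (1 + y) ** t for cf, t in zip(cashflows, times))

def modified_duration(cashflows, times, y):
    """Macaulay duration divided by (1 + y)."""
    p = bond_price(cashflows, times, y)
    macaulay = sum(t * cf / (1 + y) ** t for cf, t in zip(cashflows, times)) / p
    return macaulay / (1 + y)

def convexity(cashflows, times, y):
    """Second derivative of price with respect to yield, scaled by price."""
    p = bond_price(cashflows, times, y)
    return sum(t * (t + 1) * cf / (1 + y) ** (t + 2)
               for cf, t in zip(cashflows, times)) / p

def approx_return(duration, convex, dy):
    """Second-order estimate of the relative price change for a yield shift dy."""
    return -duration * dy + 0.5 * convex * dy ** 2

# Hypothetical 5-year bond, 3% annual coupon, face value 100, priced at a 3% yield.
times = [1, 2, 3, 4, 5]
cashflows = [3, 3, 3, 3, 103]
y0, dy = 0.03, 0.01

p0 = bond_price(cashflows, times, y0)
d = modified_duration(cashflows, times, y0)
c = convexity(cashflows, times, y0)

exact = bond_price(cashflows, times, y0 + dy) / p0 - 1
duration_only = -d * dy
with_convexity = approx_return(d, c, dy)
print(exact, duration_only, with_convexity)
```

The duration-only estimate overstates the loss from a rate rise; adding the convexity term brings the estimate much closer to the exact repricing, which is why the two metrics are usually reported together.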
