RL Replicate - Maximum a Posteriori Policy Optimisation

Published:

In this project, we replicated the results of Maximum a Posteriori Policy Optimisation (MPO), a reinforcement learning method proposed by DeepMind. We also prepared a slide deck that explains our implementation and the outcomes of the replication.

Slides Link | Report Link