Robust Reinforcement Learning

Research Project by Calvin Seward

Caption: Camel in the desert

When I think of robustness, I think of camels¹. Camels thrive in unpredictable, varying environments. Be it hot or cold, sunny or windy, dry or rainy, camels are prepared with their humps full of fatty deposits, their thick fur, and their distinctive noses and eyelashes. It would be nice if reinforcement learning methods were like camels: robust and adaptable.

Yet many reinforcement learning methods are quite fragile, and the resulting policies are akin to the Devils Hole pupfish, a cute yet sensitive fish whose only habitat is an isolated, water-filled cavern in Nevada. As with the pupfish, when the world these policies live in changes, their performance quickly deteriorates. For reinforcement learning to inform key functions at Zalando such as pricing, search and recommendations, it must be robust to changes in the world such as seasonal differences, missing data and shifts in user behavior. It must be robust like a camel, not fragile like the pupfish.

To this end, we are researching provably robust policies: policies that perform well even in the face of unexpected conditions, allowing us to show more relevant and inspiring products to our customers. So next time you're browsing the website and see a recommendation you like, don't forget to thank the robust camel who made it happen.

Photo source: Pixabay

¹ Here, camel refers to both the Bactrian camel (Camelus bactrianus) and the dromedary (Camelus dromedarius).