How Ray makes continuous learning accessible and easy to scale
The O’Reilly Data Show Podcast: Robert Nishihara and Philipp Moritz on a new framework for reinforcement learning and AI applications.
In this episode of the Data Show, I spoke with Robert Nishihara and Philipp Moritz, graduate students at UC Berkeley and members of RISELab. I wanted to get an update on Ray, an open source distributed execution framework that makes it easy for machine learning engineers and data scientists to scale reinforcement learning and other related continuous learning algorithms. Many AI applications involve an agent (for example, a robot or a self-driving car) interacting with an environment. In such a scenario, the agent needs to continuously learn the right course of action to take for a given state of the environment.
What do you need in order to build large-scale continuous learning applications? You need a framework that offers low-latency response times, can run massive numbers of simulations quickly (agents need to be able to explore states within an environment), and supports heterogeneous computation graphs. Ray is a new execution framework written in C++ that contains these key ingredients. In addition, Ray is accessible via Python (and Jupyter Notebooks), and it comes with many of the standard reinforcement learning and related continuous learning algorithms that users can easily call.
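To make this concrete, here is a minimal sketch of Ray's task API from Python. The `simulate` function body is a hypothetical stand-in for a real environment rollout, not an example from the episode:

```python
import random

import ray

ray.init()  # start Ray on the local machine

@ray.remote
def simulate(seed):
    # Hypothetical stand-in for one environment rollout; a real application
    # would step an agent through an environment and return its total reward.
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(1000))

# Each .remote() call returns a future immediately, so hundreds of
# simulations can run in parallel across the available cores or nodes.
futures = [simulate.remote(i) for i in range(200)]

# Block until all the results are ready.
rewards = ray.get(futures)
print(max(rewards))
```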
As Nishihara and Moritz point out, frameworks like Ray are also useful for common applications such as dialogue systems, text mining, and machine translation. Here are some highlights from our conversation:
Tools for reinforcement learning
Ray is something we’ve been building that’s motivated by our own research in machine learning and reinforcement learning. If you look at what researchers who are interested in reinforcement learning are doing, they’re largely ignoring the existing systems out there and building their own custom frameworks or custom systems for every new application that they work on.
… For reinforcement learning, you need to be able to share data very efficiently, without copying it between multiple processes on the same machine; you need to be able to avoid expensive serialization and deserialization; and you need to be able to create a task and get the result back in milliseconds instead of hundreds of milliseconds. So, there are a lot of little details that come up.
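(These concerns map directly onto Ray's shared-memory object store. The following sketch is editorial, not part of the conversation, and the eight-worker setup is just illustrative:)

```python
import numpy as np
import ray

ray.init()

# Store a large array once in Ray's shared-memory object store.
weights_id = ray.put(np.zeros(10_000_000))

@ray.remote
def evaluate(weights):
    # On the same machine, `weights` arrives as a zero-copy, read-only view
    # of the object-store buffer -- no per-task copy or deserialization of
    # the underlying data.
    return float(weights.sum())

# All eight tasks share the one stored copy instead of each receiving its own.
results = ray.get([evaluate.remote(weights_id) for _ in range(8)])
```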
… In fact, people often use MPI along with lower-level multi-processing libraries to build the communication infrastructure for their reinforcement learning applications.
Scaling machine learning in dynamic environments
I think right now when we think of machine learning, we often think of supervised learning. But a lot of machine learning applications are changing from making just one prediction to making sequences of decisions and taking sequences of actions in dynamic environments.
The thing that’s special about reinforcement learning is that it’s not just the different algorithms being used, but rather the different problem domain it’s being applied to: interactive, dynamic, real-time settings bring up a lot of new challenges.
… The set of algorithms actually goes even a little bit further. Some of these techniques are even useful in, for example, text summarization and translation. You can use these techniques that have been developed in the context of reinforcement learning to better tackle some of these more classical problems [where you have some objective function that may not be easily differentiable].
… Some of the classic applications that we have in mind when we think about reinforcement learning are things like dialogue systems, where the agent is one participant in the conversation. Or robotic control, where the agent is the robot itself and it’s trying to learn how to control its motion.
… For example, we implemented the evolution strategies algorithm described in a recent OpenAI paper in Ray. It was very easy to port to Ray, and writing it took only a couple of hours. We then had a distributed implementation that scaled very well, and we ran it on up to 15 nodes.
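The conversation doesn't include code, but here is a simplified sketch, not the authors' implementation, of the distributed evolution strategies pattern from that OpenAI paper expressed with Ray tasks. `objective` is a hypothetical stand-in for an environment's episode return, and the problem sizes are toy values:

```python
import numpy as np
import ray

ray.init()

DIM, POP, SIGMA, LR = 10, 40, 0.1, 0.02  # toy dimensions and hyperparameters

def objective(theta):
    # Hypothetical black-box reward (higher is better); evolution strategies
    # only ever needs its value, never its gradient.
    return -float(np.sum(theta ** 2))

@ray.remote
def evaluate(theta, seed):
    # Each worker perturbs the shared parameters with its own seeded noise
    # and reports the reward of the perturbed candidate.
    noise = np.random.RandomState(seed).randn(DIM)
    return objective(theta + SIGMA * noise)

theta = np.ones(DIM)
for step in range(100):
    seeds = list(range(step * POP, (step + 1) * POP))
    rewards = np.array(ray.get([evaluate.remote(theta, s) for s in seeds]))
    # Reconstruct each worker's noise from its seed (the trick the OpenAI
    # paper uses to avoid shipping noise vectors over the network).
    noises = np.stack([np.random.RandomState(s).randn(DIM) for s in seeds])
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Standard ES update: move theta along the reward-weighted noise.
    theta += LR / (POP * SIGMA) * (noises.T @ rewards)
```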
Related resources:
- Why continuous learning is key to AI: A look ahead at the tools and methods for learning from sparse feedback
- Ray: A distributed execution framework for emerging AI applications: A Strata Data keynote by Michael Jordan
- Deep reinforcement learning for robotics: A 2016 Artificial Intelligence Conference presentation by Pieter Abbeel
- Cars that coordinate with people: A 2017 Artificial Intelligence Conference keynote by Anca Dragan
- Introduction to reinforcement learning and OpenAI Gym
- Reinforcement learning explained