RISELab, the successor to the U.C. Berkeley group that created Apache Spark, is hatching a project that could replace Spark—or at least displace it for key applications.
Ray is a distributed framework designed for low-latency real-time processing, such as machine learning. Created by two doctoral students at RISELab, Philipp Moritz and Robert Nishihara, it works with Python to run jobs either on a single machine or distributed across a cluster, using C++ for components that need speed.
The main aim for Ray, according to an article at Datanami, is to create a framework that delivers better performance than Spark. Spark was itself intended to be faster than the system it largely replaced, MapReduce, but its internal synchronization mechanisms still make it difficult to write applications with “complex task dependencies.”
Ray, on the other hand, keeps as little state as possible across the cluster. Where a record of the system’s state is needed, it’s held in a central Redis server. That server tracks metadata, such as which jobs run on which machines, but it doesn’t contain any of the data associated with a given job. The framework also uses an immutable object model: any object that can be made immutable never needs to be synchronized across the cluster, which saves yet more time.
Python scripts submit and execute jobs, and Ray uses Python’s syntax features to express how objects and jobs operate. Decorating a function with @ray.remote indicates that it can be executed asynchronously across the cluster. When the function is invoked, it immediately returns an object ID, which can be looked up later to obtain whatever result the function eventually produces. Ray’s documentation shows how Python list comprehensions can be combined with remote functions to launch a series of tasks and gather their results automatically.
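This submit-now, fetch-later model resembles futures in Python’s standard library. A minimal sketch of the pattern using only concurrent.futures follows; the function here is invented for illustration, and in Ray the equivalents would be @ray.remote, calling f.remote(x) to get an object ID, and ray.get() to retrieve results:

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    # Stand-in for a function you would decorate with @ray.remote.
    return x * x

with ThreadPoolExecutor(max_workers=4) as pool:
    # Each submit() returns immediately with a future, much as a Ray
    # remote call returns an object ID without blocking.
    futures = [pool.submit(square, i) for i in range(4)]
    # Block only when the results are actually needed (Ray: ray.get).
    results = [f.result() for f in futures]

print(results)  # [0, 1, 4, 9]
```

The list comprehension launching all four tasks before any result is collected mirrors the idiom Ray’s documentation demonstrates.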
Though Ray is in a pre-alpha state, it’s clearly built with machine learning as a primary workload. Key examples in the documentation include hyperparameter optimization (a common task when tuning machine learning models) and training an AI agent to play Pong. There are also details on using Ray with TensorFlow, including tips on making the most of Ray’s remote object model with the deep learning framework.
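Hyperparameter optimization maps naturally onto this execution model: each trial is an independent task, so all trials can be launched in parallel and the best score gathered at the end. A stdlib-only sketch of the idea, with a toy objective and parameter grid invented purely for illustration (Ray’s tutorial would use its remote functions instead):

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(lr):
    # Toy stand-in for training a model and returning a validation
    # score; here the "best" learning rate is the one closest to 0.1.
    return -abs(lr - 0.1)

grid = [0.001, 0.01, 0.1, 1.0]

with ThreadPoolExecutor() as pool:
    # Launch every trial at once; each result is fetched only when
    # needed, mirroring Ray's object-ID model.
    scores = list(pool.map(evaluate, grid))

best_lr = grid[scores.index(max(scores))]
print(best_lr)  # 0.1
```

Because the trials share no state, this kind of search benefits directly from Ray’s stateless, immutable-object design.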