Hi all, I'm one of the authors of Ray. Thanks for all the comments and discussion! To add to it, I'll mention a few conceptual things that have changed since we wrote the paper.<p>*Emphasis on the library ecosystem*<p>A lot of our focus is on building an ecosystem of libraries on top of Ray (much, but not all, of the focus is on machine learning libraries).
Some of these libraries are built natively on top of Ray, such as Ray Tune for scaling hyperparameter search (<a href="http://tune.io" rel="nofollow">http://tune.io</a>), RLlib for scaling reinforcement learning (<a href="http://rllib.io" rel="nofollow">http://rllib.io</a>), Ray Serve for scaling model serving (<a href="http://rayserve.org/" rel="nofollow">http://rayserve.org/</a>), and RaySGD for scaling training (<a href="https://docs.ray.io/en/master/raysgd/raysgd.html" rel="nofollow">https://docs.ray.io/en/master/raysgd/raysgd.html</a>).<p>Others are popular libraries in their own right that now integrate with Ray, such as Horovod (<a href="https://eng.uber.com/horovod-ray/" rel="nofollow">https://eng.uber.com/horovod-ray/</a>), XGBoost (<a href="https://xgboost.readthedocs.io/en/latest/tutorials/ray.html" rel="nofollow">https://xgboost.readthedocs.io/en/latest/tutorials/ray.html</a>), and Dask for dataframes (<a href="https://docs.ray.io/en/master/dask-on-ray.html" rel="nofollow">https://docs.ray.io/en/master/dask-on-ray.html</a>). While Dask itself has similarities to Ray (especially the task part of the Ray API), Dask also has libraries for scaling dataframes and arrays, which can be used as part of the Ray ecosystem (more details at <a href="https://www.anyscale.com/blog/analyzing-memory-management-and-performance-in-dask-on-ray" rel="nofollow">https://www.anyscale.com/blog/analyzing-memory-management-an...</a>).<p>Many Ray users start using Ray for one of the libraries (e.g., to scale training or hyperparameter search) as opposed to just for the core system.<p>*Emphasis on serverless*<p>Our goal with Ray is to make distributed computing as easy as possible. To do that, we think the serverless direction, which allows people to just focus on their code and not on infrastructure, is very important.
Here, I don't mean serverless purely in the sense of functions as a service, but something that would allow people to run a wide variety of applications (training, data processing, inference, etc.) elastically in the cloud without configuring or thinking about infrastructure. There's a lot of ongoing work here (e.g., to improve autoscaling up and down with heterogeneous resource types). More details on the topic are at <a href="https://www.anyscale.com/blog/the-ideal-foundation-for-a-general-purpose-serverless-platform" rel="nofollow">https://www.anyscale.com/blog/the-ideal-foundation-for-a-gen...</a>.<p>If you're interested in this kind of stuff, consider joining us at Anyscale: <a href="https://jobs.lever.co/anyscale" rel="nofollow">https://jobs.lever.co/anyscale</a>.