TorchArrow looks pretty cool:<p>TorchArrow is a machine learning preprocessing library over batch data, providing a performant, Pandas-style, easy-to-use API for model development. Currently it provides a Python DataFrame that allows extensible UDFs with Velox, with the following features:<p>- Seamless handoff with PyTorch or other model authoring, such as Tensor collation and easily plugging into PyTorch DataLoader and DataPipes
- Zero copy for external readers via Arrow in-memory columnar format
- Support for multiple execution runtimes
- High-performance C++ UDF support with vectorization
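<p>For anyone unsure what "zero copy" over a columnar buffer buys you, here is a rough stdlib-only sketch of the idea. To be clear, this is not TorchArrow's or Arrow's API: `array` and `memoryview` are just stand-ins I picked to show one contiguous column being shared between a producer and a consumer without duplicating the bytes, and the `prices` column is made up.

```python
from array import array

# A hypothetical "column": one contiguous buffer of float64 values,
# loosely mimicking a columnar in-memory layout.
prices = array("d", [1.5, 2.0, 3.25, 4.0])

# memoryview exposes the same memory without copying it.
view = memoryview(prices)

# Slicing a memoryview is also zero-copy: `window` still points
# into the buffer owned by `prices`.
window = view[1:3]

# A write through the view is visible in the original column,
# which shows both names refer to the same bytes.
window[0] = 20.0

# By contrast, tolist() materializes a separate Python list,
# so changing the list leaves the column untouched.
copied = prices.tolist()
copied[2] = -1.0
```

<p>The real libraries do this across process and language boundaries with a standardized layout, but the payoff is the same: the reader and the consumer look at one buffer instead of each holding a copy.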