Lots of confusion derives from the best-of-breed parquet readers for Python and R residing in the Arrow packages, mostly because Arrow is (and does) a lot of things.<p>There's:<p>* arrow, an in-memory format for dataframe-shaped things supporting fast computation, zero-copy sharing, etc.<p>* arrow Feather V1, an on-disk format for dataframe-shaped things<p>* arrow IPC, a (de)serialization format for arrow buffers and a protocol for sending/receiving them to/from other processes<p>* arrow Feather V2, an on-disk format that's basically the IPC serialization written to file[1]<p>* arrow Flight, a protocol for requesting/sending/receiving data to/from remote machines that's basically gRPC layered over the IPC format<p>* arrow DataFusion/Ballista, nascent system(s) for local/distributed query execution over arrow buffers<p>* other subprojects I'm surely forgetting<p>* a (very good) C++ Parquet reader [2]/[3] developed under the auspices of the project<p>* libraries in / bindings to many languages, including R and Python, supporting interaction with (subsets of) the above.<p>It's only the last piece that's exposed to most data-science-y users, and thus identified with the 'arrow' name. Since those libraries are also very good, and hide their abstractions well, those users are free to use the functionality relevant to them, be it dealing with parquet, feather, etc., without needing to understand how it works.<p>Not that this is a criticism of the project, or those users! Arrow encompasses lots of different functionality, which enables it to provide different things to different people. As a result, though, 'Arrow' connotes lots of different things (and different _sorts_ of things) to different users, which can cause some confusion if terms aren't fully specified, or are even a bit misunderstood.<p>[1] <a href="https://stackoverflow.com/a/67911190/881025" rel="nofollow">https://stackoverflow.com/a/67911190/881025</a>
[2] <a href="https://github.com/apache/parquet-cpp" rel="nofollow">https://github.com/apache/parquet-cpp</a>
[3] <a href="https://github.com/apache/arrow/tree/master/cpp" rel="nofollow">https://github.com/apache/arrow/tree/master/cpp</a>