Hey all,
If you haven't seen the Oxen project yet, we have been building an open-source version control tool for unstructured data.

We were inspired by the idea of making large machine learning datasets living, breathing assets that people can collaborate on, rather than the static datasets of the past. Lately we have been working hard on optimizing the underlying Merkle trees and data structures within Oxen.ai. We just released v0.19.4, which brings a bunch of performance upgrades and more stable internal APIs.

To put it all to the test, we benchmarked the tool on the 1 million+ images in the classic ImageNet dataset.

The TL;DR: in our benchmark, Oxen.ai is faster than raw uploads to S3, 13x faster than git-lfs, and 5x faster than DVC. The full breakdown is here:

https://docs.oxen.ai/features/performance

If you are in the ML/AI community or a Rust aficionado, we would love your feedback on both the tool and the codebase. We would also welcome community contributions, especially new storage backends and integrations with other data tools.