For the last year I’ve been developing Hyperparam, a collection of small, fast, dependency-free open-source libraries designed for data scientists and ML engineers to actually look at their data.

- Hyparquet: read any Parquet file in the browser or Node.js (see the sketch below)
- Icebird: explore Iceberg tables without needing Spark or Presto
- HighTable: virtual scrolling of millions of rows
- Hyparquet-Writer: export Parquet easily from JS
- Hyllama: read llama.cpp .gguf LLM metadata efficiently

CLI for viewing local files: npx hyperparam dataset.parquet

Example dataset on a Hugging Face Space: https://huggingface.co/spaces/hyperparam/hyperparam?url=https%3A%2F%2Fhuggingface.co%2Fdatasets%2Fglaiveai%2Freasoning-v1-20m%2Fblob%2Frefs%2Fconvert%2Fparquet%2Fdefault%2Ftrain%2F0000.parquet

No cloud uploads. No backend servers. A better way to build frontend data applications.

GitHub: https://github.com/hyparam
Feedback and PRs welcome!
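
For a quick taste of the API, here is a minimal Node.js sketch of reading a local Parquet file with Hyparquet. The parquetRead({ file, onComplete }) shape follows the project README, but treat the exact option names as assumptions and check the repo for the current API.

    // ESM script (Node 18+, "type": "module" in package.json)
    import { readFile } from 'node:fs/promises'
    import { parquetRead } from 'hyparquet'

    // Load the file into an ArrayBuffer. Hyparquet also accepts an
    // AsyncBuffer for ranged reads of large or remote files.
    const buf = await readFile('dataset.parquet')
    const arrayBuffer = buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength)

    await parquetRead({
      file: arrayBuffer,
      onComplete: rows => console.log(rows), // each row is an array of column values
    })

The same shape works in the browser: fetch the Parquet file into an ArrayBuffer (or hand Hyparquet an AsyncBuffer backed by HTTP range requests), which is what enables the no-uploads, no-backend setup described above.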
Though these tools might be interesting, I wish they had called this something else. This isn't at all related to the concept of hyperparameters, which people commonly refer to as hyperparams. And in their copy, the only reference to hyperparameters seems to misuse the term.

> This stems from an industry-wide realization that model performance is ultimately bounded by data quality, not just model architecture or hyperparameters.

Generally we think of model architecture + weights (parameters) as making up the model itself, while hyperparam(s|eters) are more relevant to how one arrives at those weights, and for this reason they bear more on the efficacy of training than on the performance of the resultant model.
That's a lot of names for a bunch of tools that each do a single task.

What I would really benefit from is a hypothetical LLM chat app focused on data migration or processing pipelines.