TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Show HN: Efficient Data Formats for GPT

6 点作者 nikaspran大约 2 年前

2 条评论

nikaspran大约 2 年前
I ran a small test comparing different data serialization formats for use with GPT models (and possibly other LLMs). This is obviously very limited but it was striking how much of a difference switching from JSON to something like YAML could be.<p>I wonder if we might also see LLM specific data serialisation formats in the future, to make use of tokenization in the most efficient manner and enhance the generative capability of the models.
评论 #35468597 未加载
emrah大约 2 年前
Why does the serialization need to be text based? Why not binary formats? Or use sqlite or other db for storage and retrieval? That might also help with not having to read all the data into memory at once (although it would be slower to run)
评论 #35464027 未加载