Ask HN: Why is parsing still inconvenient in 2023?

2 points by substation13, about 2 years ago
When storing and sharing structured data, engineers typically encode it in an intermediate structure like JSON, YAML, XML, TOML, etc.

Often, these are not a good fit for the problem at hand.

However, no one seems to be writing parsers for their own custom formats. If you did, it would certainly get a few strange looks in code review!

But why is this? Why is working with JSON etc. still so much easier than writing quick parsers?

Why haven't common parsing techniques been streamlined to the point where this is the easiest path?

3 comments

DemocracyFTW2, about 2 years ago
I don't believe in the premise of your idea. Most of the time stuff can be done using standard data structures: things like numbers, strings, lists and maps of those, and so on. Intermediate structures like JSON are only needed for storage and transmission. Insofar as the need for storage and transmission in turn necessitates the same recurrent task of data serialization, it totally makes sense to reuse tools that have been around for a long time and been optimized for most use cases. So why re-invent databases, JSON parsers and entire wire protocols when more or less optimized tools of the sort are already there? Google wrote protobufs as a replacement for JSON, and while it does offer things beyond JSON like data typing, the appraisal is not unanimous. It is also hard to replace a mature DBMS like Postgres or SQLite with a from-scratch, purpose-made solution without failing on every single useful metric like throughput, feature completeness and reliability.

Personally I too think parsing should be easier in this day and age; I believe Raku (Perl 6) has made meaningful strides in that direction. Other than that, I feel parsing is somewhat overrated and lexing somewhat underrated, if anything. In my experience lexing is really the step you want most to get data out of a byte sequence, and I agree that that should and could be much easier. FWIW JavaScript's RegExes recently obtained the 'sticky flag' which is ultra-beneficial for lexing. Not sure why that bit took so long.
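To make the sticky-flag point concrete, here is a minimal sketch of a lexer built on it, written in TypeScript. The token kinds, the rule table, and the `lex` function are hypothetical names chosen for illustration, not anything from the comment above. The only thing it relies on is that a regular expression with the `y` flag matches exactly at `lastIndex` and nowhere else.

```typescript
// Hypothetical token shape for this sketch.
type Token = { kind: string; text: string; pos: number };

// Rebuild a regex with the sticky flag `y`, so that exec() only matches
// exactly at `lastIndex` instead of scanning ahead for a later match.
const sticky = (re: RegExp): RegExp => new RegExp(re.source, "y");

// Illustrative rules, tried in order at each offset.
const rules: Array<[string, RegExp]> = [
  ["space", sticky(/\s+/)],
  ["number", sticky(/\d+(?:\.\d+)?/)],
  ["ident", sticky(/[A-Za-z_]\w*/)],
  ["punct", sticky(/[=,;(){}]/)],
];

function lex(input: string): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  outer: while (pos < input.length) {
    for (const [kind, re] of rules) {
      re.lastIndex = pos; // anchor the attempt at the current offset
      const m = re.exec(input);
      if (m !== null) {
        // Whitespace is consumed but not emitted as a token.
        if (kind !== "space") tokens.push({ kind, text: m[0], pos });
        pos = re.lastIndex; // exec() advanced lastIndex past the match
        continue outer;
      }
    }
    throw new SyntaxError(
      `Unexpected character ${JSON.stringify(input[pos])} at offset ${pos}`
    );
  }
  return tokens;
}
```

Calling `lex("x = 42;")` yields four tokens (an identifier, two punctuation marks and a number), each tagged with its offset. Without the sticky flag the usual workarounds are to slice the input and anchor with `^`, or to use the `g` flag and check `m.index` after each match.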
surprisetalk, about 2 years ago
Most of the time, parsing isn't the problem you want to be solving.

People use JSON because they need to send information over a wire, and JSON serializers are abundant. I agree that this is problematic for a bunch of different reasons:

[1] https://taylor.town/json-considered-harmful

But note that this problem has been solved many, many times:

[2] https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats

Formats like MessagePack and Cap'n Proto have a lot of nice properties.

Writing parsers is not easy. And it's especially not easy when you have a custom format that different people want to do different things with.

---

Btw, I've tried out pretty much every parsing library in Rust, TypeScript, and Haskell.

Elm's parsing library is the only one I enjoy using:

[3] https://package.elm-lang.org/packages/elm/parser/latest

I think nearley.js has a cool interface but poor execution:

[4] https://nearley.js.org
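For readers who have not used a combinator-style parsing library, the following is a rough TypeScript sketch of the general idea: small parsers are plain functions, and larger parsers are built by combining them. This is not the Elm library's API or that of any library named above; every name here (`Parser`, `token`, `map`, `sepBy1`, `intList`) is hypothetical and chosen only for illustration.

```typescript
// A parser takes the input and an offset and returns a value plus the next
// offset, or null on failure. Hypothetical types for this sketch.
type Result<T> = { value: T; next: number } | null;
type Parser<T> = (input: string, pos: number) => Result<T>;

// Match a regular expression exactly at the current offset (sticky flag).
const token = (re: RegExp): Parser<string> => {
  const anchored = new RegExp(re.source, "y");
  return (input, pos) => {
    anchored.lastIndex = pos;
    const m = anchored.exec(input);
    return m === null ? null : { value: m[0], next: anchored.lastIndex };
  };
};

// Transform the value a parser produces, leaving failure untouched.
const map = <A, B>(p: Parser<A>, f: (a: A) => B): Parser<B> =>
  (input, pos) => {
    const r = p(input, pos);
    return r === null ? null : { value: f(r.value), next: r.next };
  };

// One or more items separated by `sep`; separator values are discarded.
const sepBy1 = <A>(item: Parser<A>, sep: Parser<unknown>): Parser<A[]> =>
  (input, pos) => {
    const first = item(input, pos);
    if (first === null) return null;
    const values = [first.value];
    let next = first.next;
    for (;;) {
      const s = sep(input, next);
      if (s === null) break;
      const r = item(input, s.next);
      if (r === null) break; // a trailing separator is left unconsumed
      values.push(r.value);
      next = r.next;
    }
    return { value: values, next };
  };

// A tiny "custom format": a comma-separated list of integers.
const int = map(token(/\d+/), Number);
const comma = token(/\s*,\s*/);
const intList = sepBy1(int, comma);
```

With these definitions, `intList("1, 2,3", 0)` consumes the whole input and produces the numbers 1, 2 and 3. Even at this size the sketch has no error messages, no backtracking control and no handling of leftover input, which is some of the work that makes writing a parser harder than calling a JSON serializer.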
pestatije, about 2 years ago
Parser generators have been around for a very long time... I think the difficulty is in defining the format, at which point it is easier to go the standard way.