Porting Libyaml to Safe Rust: Some Thoughts

127 points by agluszak, over 1 year ago

8 comments

jvanderbot, over 1 year ago
> The interesting takeaway is this: For all of the different roles that yaml_string_t takes on, it turns out there is an abstraction in the Rust standard library that matches each use case precisely. At all points in the libyaml source code where strings are used, the use case maps directly to one of the several many ways of dealing with strings in Rust. Phew

Reminder! Our forefathers knew and loved ownership concepts, but lacked the tools to do anything but mentally map them. We stand on the shoulders of giants.

vlovich123, over 1 year ago
Great writeup. This part stuck out to me:

> The justification for "unsafe" C and C++ libraries is often that some performance is left on the table with safe Rust, primarily due to bounds checking. On modern hardware this justification has always seemed a bit suspicious to me, and I think this is another piece of (circumstantial) evidence that it often just doesn't matter in practice

That's too strong a statement even though the sentiment is correct. Most software will not see a bottleneck from bounds checking because it's not in the fast path. All we can say in this case is that the work libyaml was doing is not impacted by bounds checking. That's not because of the hardware but because of software inefficiencies, or because the problem being solved isn't sensitive to bounds checking. Most likely libyaml is not running at full speed on the underlying hardware, so any bounds checking is just noise because there's so much slack already.

However, I have definitely encountered very specific situations where bounds checking is a bottleneck when you're actually trying to run at hardware speeds, and unsafe is required to recover the performance. Of course, I benchmarked very, very carefully, down to the specific unsafe in question. I'll also give a shoutout to the assume crate, which I've found to be a much nicer mechanism for getting the compiler to elide bounds checks in many cases in optimized builds, while keeping them in debug builds and validating that your assertion is correct. The annoying bit is that you can't ask the Rust compiler to elide all bounds checks for a benchmark to measure their impact.

Remember: on modern hardware your CPU is capable of executing ~4-6 billion operations/s, processing main memory at ~100 GB/s, and accessing disk millions of times per second at throughputs of ~1-2 GB/s. This is all consumer-grade hardware. Typical software applications aren't trying to run at full utilization; they try to offer reasonable performance with a more straightforward solution that can be maintained as cheaply as possible, because the bottleneck cost is typically the programmer's time rather than how fast the software can be made to run.
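
To make the bounds-checking discussion above concrete, here is a minimal Rust sketch using only the standard library. The function names are hypothetical and chosen for illustration; this is not code from libyaml, the port, or the assume crate. It shows four ways of summing a prefix of a slice: plain indexing, indexing after an up-front assertion (which may let the optimizer drop the per-element checks), slicing once and iterating, and an `unsafe` variant with `get_unchecked`. Whether any of them is measurably faster depends on the workload and has to be benchmarked.

```rust
/// Plain indexing: every `data[i]` is bounds-checked unless the optimizer
/// can prove `i < data.len()`. Panics if `n > data.len()`.
pub fn sum_indexed(data: &[u64], n: usize) -> u64 {
    let mut total = 0u64;
    for i in 0..n {
        total = total.wrapping_add(data[i]);
    }
    total
}

/// Same loop, but a single up-front assertion states the invariant.
/// The check stays in the program; it just moves out of the loop, and the
/// optimizer may then be able to remove the per-element checks.
pub fn sum_asserted(data: &[u64], n: usize) -> u64 {
    assert!(n <= data.len(), "n exceeds slice length");
    let mut total = 0u64;
    for i in 0..n {
        total = total.wrapping_add(data[i]);
    }
    total
}

/// Slice once (one bounds check), then iterate without indexing.
pub fn sum_iter(data: &[u64], n: usize) -> u64 {
    data[..n].iter().fold(0u64, |acc, &x| acc.wrapping_add(x))
}

/// Unsafe variant: no per-element check at all.
pub fn sum_unchecked(data: &[u64], n: usize) -> u64 {
    assert!(n <= data.len(), "n exceeds slice length");
    let mut total = 0u64;
    for i in 0..n {
        // SAFETY: i < n and n <= data.len() by the assertion above.
        total = total.wrapping_add(unsafe { *data.get_unchecked(i) });
    }
    total
}
```

All four return the same result for valid input, so the safe versions can serve as a reference when validating an `unsafe` one, and the `unsafe` variant is only worth keeping if a careful benchmark shows a difference.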

AceJohnny2, over 1 year ago
Tangential C style survey... For code like this:

    if (some_function(some_memory) != ERROR) {
        // ...
    } else {
        goto cleanup;
    }

This approaches what I dub "Happy Path Indentation Syndrome" (HPIS), where the "normal"/"happy path" functionality of the code happens under the `// ...` inside the passing scope.

If you have several such functions that can each fail, this approach means your "happy path" follows the deepening scope / indentation.

Instead, I much prefer styling it like this (assuming, obviously, that the "happy path" happens always):

    if (some_function(some_memory) == ERROR) {
        goto cleanup;
    }
    // ...

How do people approach this kind of situation?
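
For comparison, since the thread is about a Rust port, here is a small sketch of how the second, flat style tends to look in Rust. Everything in it (`ParseError`, `read_field`, `parse_number`, `parse_and_double`) is hypothetical and unrelated to libyaml. Each fallible step returns a `Result`, and the `?` operator returns early on error, so the happy path stays at one indentation level. There is no `cleanup` label because owned values are dropped automatically on every exit path.

```rust
#[derive(Debug, PartialEq)]
pub enum ParseError {
    Empty,
    NotANumber,
}

/// First fallible step: reject blank input.
fn read_field(input: &str) -> Result<&str, ParseError> {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return Err(ParseError::Empty); // guard clause, early exit
    }
    Ok(trimmed)
}

/// Second fallible step: parse the field as an integer.
fn parse_number(field: &str) -> Result<i64, ParseError> {
    field.parse::<i64>().map_err(|_| ParseError::NotANumber)
}

/// Happy path reads top to bottom; each `?` is an early return on error.
pub fn parse_and_double(input: &str) -> Result<i64, ParseError> {
    let field = read_field(input)?;
    let n = parse_number(field)?;
    Ok(n * 2)
}
```

Calling `parse_and_double(" 21 ")` yields `Ok(42)`, while blank or non-numeric input yields the corresponding `Err` without any nesting in the caller.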

viraptor, over 1 year ago
This makes me happy. I've just done a review of yaml parsers available from Ruby recently to improve the error reporting. Unfortunately libyaml is just bad there: many indentation issues at the end of the document will report errors in line 2 with a completely lost context. Unfortunately x2, there's no real alternative.

I ended up parsing with libyaml and, when something fails, parsing again using Node and displaying errors through https://github.com/eemeli/yaml. It's terrible but it works better. I hope to be able to use libyaml-safer in the future if it does improve on errors.

Also, having found an asserting case for libyaml in the past, I'm not trusting it *that* much.

wyldfire, over 1 year ago
> In order to compare the performance of unsafe-libyaml and the safe Rust port, I found a relatively large YAML document (~700 KiB) and built a very simple benchmark using Criterion.

Is Criterion environment-aware enough to be able to purge/drop the filesystem cache? It might be interesting to see how the performance looks with a cold cache.

> What these numbers do indicate is that the port to safe, idiomatic Rust does not produce meaningfully slower code.

I think that's reasonable and I would even suggest that Rust's value proposition is somewhat predicated on that. A c2rust port like the baseline you started with is probably best as a working crutch from which someone can rewrite a bit here / a bit there.

_benj, over 1 year ago
> Mercifully, C does not have exceptions

This made me smile because, while that's true, C just segfaults. I think there's a way to handle a segmentation fault? Or is it just the OS that kills the process when it tries to access an invalid memory address?

wredue, over 1 year ago
I find that you can usually safely disregard the opinion of people who state "<thing> probably doesn't matter in practice."

Measure it yourselves. Maybe it does matter. Maybe it doesn't. That doesn't change the fact that this "it probably doesn't matter in practice" sentiment is the primary reason a typical web request today is on the order of 20-30x slower than it should be.

For that matter, if you're using libyaml to parse one yaml file, you're probably fine. If you're AWS and use libyaml, you're definitely going to feel this change.

rurban, over 1 year ago
You cannot call it safer libyaml, when it still allows all unsafe yaml features to happen, creating and destroying objects at will, or calling custom code. A safe yaml, such as my YAML::Safe library, has a white-list of allowed classes, and disables all unsafe features by default.

The safe variant should use libsyck btw, which didn't implement all the new unsafe yaml features.
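
As a rough illustration of the allow-list idea described above, here is a minimal Rust sketch. It is not the API of YAML::Safe, libyaml, or libyaml-safer; `TagAllowList` and its methods are hypothetical, and the tag strings are only examples. The assumption is that a loader would consult such a check before constructing any object for a tagged node, and reject everything not explicitly allowed.

```rust
use std::collections::HashSet;

/// Hypothetical policy object: only tags listed here may be instantiated.
pub struct TagAllowList {
    allowed: HashSet<String>,
}

impl TagAllowList {
    /// Build the policy from an explicit list; anything else is denied.
    pub fn new(tags: &[&str]) -> Self {
        TagAllowList {
            allowed: tags.iter().map(|t| t.to_string()).collect(),
        }
    }

    /// Returns Ok(()) for an allowed tag, or an error message otherwise.
    pub fn check(&self, tag: &str) -> Result<(), String> {
        if self.allowed.contains(tag) {
            Ok(())
        } else {
            Err(format!("tag `{}` is not on the allow-list", tag))
        }
    }
}
```

The design choice worth noting is deny-by-default: an unknown tag is an error rather than something to be resolved dynamically, which is the property the comment argues a "safe" YAML loader should have.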