科技回声

8 comments

jvanderbot, over 1 year ago

> The interesting takeaway is this: For all of the different roles that yaml_string_t takes on, it turns out there is an abstraction in the Rust standard library that matches each use case precisely. At all points in the libyaml source code where strings are used, the use case maps directly to one of the several many ways of dealing with strings in Rust. Phew

Reminder! Our forefathers knew and loved ownership concepts, but lacked the tools to do anything but mentally map them. We stand on the shoulders of giants.
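To make the quoted point concrete, here is a minimal sketch of how different string roles map onto different standard-library types. The function names are hypothetical, and the mapping is an assumption for illustration: it is not taken from libyaml or from the port.

```rust
/// Borrowed, read-only view: the caller keeps ownership of the text.
pub fn first_line(input: &str) -> &str {
    input.lines().next().unwrap_or("")
}

/// Owned, growable buffer that is built here and handed to the caller.
pub fn quoted(value: &str) -> String {
    let mut out = String::with_capacity(value.len() + 2);
    out.push('"');
    out.push_str(value);
    out.push('"');
    out
}

/// Caller-owned output buffer, mutably borrowed only for this call.
pub fn append_key(out: &mut String, key: &str) {
    out.push_str(key);
    out.push_str(": ");
}

/// Cursor over borrowed input: an iterator instead of a raw pointer pair.
pub fn leading_spaces(input: &str) -> usize {
    input.chars().take_while(|c| *c == ' ').count()
}
```

In C all four roles could plausibly share one pointer-based struct, with the ownership rules kept in the programmer's head. Here the signature of each function states them.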

vlovich123, over 1 year ago

Great writeup. This part stuck out to me:

> The justification for "unsafe" C and C++ libraries is often that some performance is left on the table with safe Rust, primarily due to bounds checking. On modern hardware this justification has always seemed a bit suspicious to me, and I think this is another piece of (circumstantial) evidence that it often just doesn't matter in practice

That's too strong a statement even though the sentiment is correct. Most software will not see a bottleneck because of bounds checking because it's not in the fast path. All we can say in this case is that the work that libyaml was doing is not impacted by bounds checking. It's not because of hardware but because of software inefficiencies / the problem being solved isn't impacted by bounds checking - most likely libyaml is not running at full speed on the underlying hardware and thus any bounds checking is just noise because there's so much slack already.

However, I have definitely encountered situations in very specific circumstances where bounds checking is a bottleneck when you're trying to actually run at hardware speeds and unsafe is required to recover performance - of course I benchmarked very very carefully to the specific unsafe that's the problem. I'll also give a shoutout to the assume crate, which I've found a much nicer mechanism to cause the compiler to elide bounds checking in many cases when building optimized, while keeping it for debug builds & validating your assertion is correct. The annoying bit is that you can't ask the Rust compiler to elide all bounds checks for a benchmark to validate the impact of the bounds checking.

Remember - on modern hardware your CPU is capable of executing ~4-6 billion operations / s, processing main memory at ~100GB/s, and accessing disk millions of times / s at throughputs of ~1-2 GB/s. This is all consumer grade hardware. Typical software applications are not typically trying to run at full utilization but instead try to offer reasonable performance with a more straightforward solution that can be supported as cheaply as possible maintenance-wise, because the bottleneck cost is typically the programmer's time rather than how fast the software can be made to run.
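The trade-off described in the comment above can be sketched with four versions of the same loop. This uses only the standard library, not the assume crate mentioned there, and the function names are made up for illustration. Whether the optimizer actually drops a given check depends on the compiler version and the surrounding code, so treat the comments as expectations to benchmark, not guarantees.

```rust
/// Indexed loop: each `a[i]` and `b[i]` is bounds-checked unless the
/// optimizer can prove that `i` is always in range.
pub fn dot_indexed(a: &[f32], b: &[f32], n: usize) -> f32 {
    let mut acc = 0.0;
    for i in 0..n {
        acc += a[i] * b[i];
    }
    acc
}

/// Re-slicing to `n` first performs the check once, up front. Inside the
/// loop the compiler can often see that `i < n == len` and skip the checks.
pub fn dot_resliced(a: &[f32], b: &[f32], n: usize) -> f32 {
    let (a, b) = (&a[..n], &b[..n]); // panics here if `n` is too large
    let mut acc = 0.0;
    for i in 0..n {
        acc += a[i] * b[i];
    }
    acc
}

/// Iterators avoid indexing altogether, so there is nothing to check.
/// `zip` stops at the shorter of the two slices.
pub fn dot_iter(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Last resort: unchecked access, guarded by a single explicit assertion.
pub fn dot_unchecked(a: &[f32], b: &[f32], n: usize) -> f32 {
    assert!(n <= a.len() && n <= b.len());
    let mut acc = 0.0;
    for i in 0..n {
        // SAFETY: `i < n`, and `n` is at most both lengths (asserted above).
        acc += unsafe { *a.get_unchecked(i) * *b.get_unchecked(i) };
    }
    acc
}
```

All four return the same result for valid inputs. Only the last one needs `unsafe`, which is why it is worth measuring the safe variants first.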


AceJohnny2, over 1 year ago

Tangential C style survey... For code like this:

<pre><code>
if (some_function(some_memory) != ERROR) {
    // ...
} else {
    goto cleanup;
}
</code></pre>

This approaches what I dub "Happy Path Indentation Syndrome" (HPIS), where the "normal"/"happy path" functionality of the code happens under the `// ...` inside the passing scope.

If you have several such functions that can each fail, this approach means your "happy path" follows the deepening scope / indentation.

Instead, I much prefer styling it like this (assuming, obviously, that the "happy path" happens always):

<pre><code>
if (some_function(some_memory) == ERROR) {
    goto cleanup;
}
// ...
</code></pre>

How do people approach this kind of situation?
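For comparison, the same contrast can be sketched in Rust, the language of the port under discussion. The helper functions and error type below are hypothetical and exist only to give the two shapes something to call. Rust has no `goto cleanup`; the `?` operator returns early on error, and cleanup is left to destructors.

```rust
#[derive(Debug, PartialEq)]
pub enum ParseError {
    Empty,
    NotANumber,
    OutOfRange,
}

// Three small fallible steps (hypothetical, for illustration only).
fn non_empty(s: &str) -> Result<&str, ParseError> {
    let t = s.trim();
    if t.is_empty() {
        return Err(ParseError::Empty);
    }
    Ok(t)
}

fn to_number(s: &str) -> Result<i64, ParseError> {
    s.parse().map_err(|_| ParseError::NotANumber)
}

fn in_range(n: i64) -> Result<u8, ParseError> {
    if (0..=255).contains(&n) {
        Ok(n as u8)
    } else {
        Err(ParseError::OutOfRange)
    }
}

/// Nested style: the happy path drifts right with every fallible step.
pub fn parse_nested(s: &str) -> Result<u8, ParseError> {
    match non_empty(s) {
        Ok(t) => match to_number(t) {
            Ok(n) => match in_range(n) {
                Ok(b) => Ok(b),
                Err(e) => Err(e),
            },
            Err(e) => Err(e),
        },
        Err(e) => Err(e),
    }
}

/// Flat style: each `?` bails out on error, so the happy path stays at
/// one indentation level.
pub fn parse_flat(s: &str) -> Result<u8, ParseError> {
    let t = non_empty(s)?;
    let n = to_number(t)?;
    let b = in_range(n)?;
    Ok(b)
}
```

Both functions behave identically. The flat version is the analogue of the early `goto cleanup` style preferred in the comment above.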


viraptor, over 1 year ago

This makes me happy. I've just done a review of yaml parsers available from Ruby recently to improve the error reporting. Unfortunately libyaml is just bad there - many indentation issues at the end of the document will report errors in line 2 with a completely lost context. Unfortunately x2, there's no real alternative.

I ended up parsing with libyaml and when something fails, parsing again using node and displaying errors through https://github.com/eemeli/yaml It's terrible but it works better. I hope to be able to use libyaml-safer in the future if it does improve on errors.

Also, having found an asserting case for libyaml in the past, I'm not trusting it that much.

wyldfire, over 1 year ago

> In order to compare the performance of unsafe-libyaml and the safe Rust port, I found a relatively large YAML document (~700 KiB) and built a very simple benchmark using Criterion.

Is criterion environment-aware enough to be able to purge/drop the filesystem cache? It might be interesting to see how the performance looks with a cold cache.

> What these numbers do indicate is that the port to safe, idiomatic Rust does not produce meaningfully slower code.

I think that's reasonable and I would even suggest that Rust's value proposition is somewhat predicated on that. A c2rust port like the baseline you started with is probably best as a working crutch from which someone can rewrite a bit here / a bit there.


_benj, over 1 year ago

> Mercifully, C does not have exceptions

This made me smile, because while yes, C just segfaults. I think there's a way to handle a segmentation fault? Or is it just the OS that kills the process when it tries to access an invalid memory address?


wredue, over 1 year ago

I find that you can usually safely disregard the opinion of people who state "<thing> probably doesn't matter in practice."

Measure yourselves. Maybe it does. Maybe it doesn't matter. It doesn't change that this "it probably doesn't matter in practice" sentiment is the primary reason a typical web request today is on the order of 20-30x slower than it should be.

For this matter, if you're using Libyaml to parse one yaml file, you're probably fine. If you're AWS and use libyaml, you're definitely going to feel this change.


rurban, over 1 year ago

You cannot call it safer libyaml when it still allows all unsafe yaml features to happen, creating and destroying objects at will, or calling custom code. A safe yaml, such as my YAML::Safe library, has a white-list of allowed classes, and disables all unsafe features by default.

The safe variant should use libsyck btw, which didn't implement all the new unsafe yaml features.
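The allow-list idea in the comment above can be sketched in a few lines of Rust. Everything here is hypothetical: `TagPolicy` and its methods are made up for illustration and are not the API of YAML::Safe, libyaml, or the port. The sketch also assumes that untagged nodes are plain data and always pass.

```rust
use std::collections::HashSet;

/// Hypothetical allow-list of YAML tags; not the API of any real library.
pub struct TagPolicy {
    allowed: HashSet<String>,
}

impl TagPolicy {
    /// Start with nothing allowed, mirroring a safe-by-default stance.
    pub fn deny_all() -> Self {
        TagPolicy {
            allowed: HashSet::new(),
        }
    }

    /// Explicitly opt in to a single tag.
    pub fn allow(mut self, tag: &str) -> Self {
        self.allowed.insert(tag.to_string());
        self
    }

    /// Untagged nodes pass (assumption); tagged nodes must be on the list.
    pub fn check(&self, tag: Option<&str>) -> Result<(), String> {
        match tag {
            None => Ok(()),
            Some(t) if self.allowed.contains(t) => Ok(()),
            Some(t) => Err(format!("tag {} is not on the allow-list", t)),
        }
    }
}
```

A loader built this way would call `check` before constructing any object for a tagged node. The caller has to name each accepted tag, for example `TagPolicy::deny_all().allow("!point")`.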

Porting Libyaml to Safe Rust: Some Thoughts
