TechEcho

Fixing the Loading in Myst IV: Revelation

264 points by davikr 5 months ago

20 comments

mananaysiempre 5 months ago
I’ve just read both parts of the article and I still feel like I’m left with more questions than answers.
The game is bottlenecked on memcpy so hard it takes two seconds to load each time? On a modern machine with double-digit GB/s RAM bandwidth and single-digit GB/s SSD bandwidth, when the game was released on two DVDs and thus can’t have more than a couple dozen GB of assets total[1]. How? OK, they’re doing a memcpy per image row, that’s not nice and can probably cost you an order of magnitude or so, and the assets are JPEG-compressed so it’s another order of magnitude to copy around uncompressed pixels, but still, *how?*
Furthermore, if it really is bottlenecked on memcpy, why does running on a modern machine not improve things? I almost want to think there’s a fixed amount of per-frame work hardcoded somewhere, and loading DDS is just accounted for incorrectly.
[1] In fact, a screenshot in part 1 shows data.m4b taking up 1.4GB, and the rest of the files shown are either video, sound, or small.
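
To make the "memcpy per image row" point concrete, here is a minimal Python sketch of the two copy patterns. It is not the game's code: `copy_rows`, `copy_bulk`, and their parameters are hypothetical names chosen for illustration, and Python slice copies stand in for memcpy. A per-row copy is needed when each source row carries padding (the pitch is larger than the row's pixel bytes); when rows are tightly packed, a single bulk copy produces the same result.

```python
def copy_rows(src, src_pitch, row_bytes, rows):
    """Copy an image one row at a time, dropping any per-row padding.

    src_pitch is the distance in bytes between the starts of consecutive
    source rows; row_bytes is the number of pixel bytes actually used per row.
    """
    dst = bytearray(row_bytes * rows)
    src_view = memoryview(src)
    for y in range(rows):
        start = y * src_pitch
        # One small copy per row, analogous to one memcpy call per row.
        dst[y * row_bytes:(y + 1) * row_bytes] = src_view[start:start + row_bytes]
    return dst


def copy_bulk(src, row_bytes, rows):
    """Copy the whole image in one operation.

    Only valid when the source rows are tightly packed (pitch == row_bytes).
    """
    return bytearray(memoryview(src)[:row_bytes * rows])
```

With tightly packed rows, `copy_rows(src, row_bytes, row_bytes, rows)` and `copy_bulk(src, row_bytes, rows)` return the same bytes; the difference is one copy call per row versus one call for the whole image.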
SideQuark 5 months ago
Unfortunately the author and the paper he links apply alpha premultiply to the gamma compressed image. To be correct, this should be done in a linear colorspace. His solution will make some color edge combos get halos.
Basically, alpha in all formats I’ve seen is stored linear, but colors are gamma compressed (sRGB, HDR stuff, etc.). If you apply alpha premultiply, then linearize, you’ve misapplied alpha. If you ignore linearizing (as even this author shows), you get immediate black halos since your blend is effectively multiplying colors, not adding them.
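
A small numeric sketch of the ordering issue described above. It assumes the standard sRGB transfer function as the gamma curve (the comment does not name a specific one), and the function names are hypothetical. It compares linearizing first and then premultiplying against premultiplying the gamma-encoded value and linearizing afterwards.

```python
def srgb_to_linear(c):
    """Decode one sRGB-encoded channel in [0, 1] to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def premultiply_in_linear(srgb, alpha):
    """Linearize the color first, then scale by (linear) alpha."""
    return srgb_to_linear(srgb) * alpha


def premultiply_then_linearize(srgb, alpha):
    """Scale the gamma-encoded value by alpha, then linearize.

    This is the ordering the comment warns about: alpha ends up being
    pushed through the gamma curve together with the color.
    """
    return srgb_to_linear(srgb * alpha)


correct = premultiply_in_linear(1.0, 0.5)        # white at 50% coverage
misapplied = premultiply_then_linearize(1.0, 0.5)
```

For white at 50% alpha, the linear-first path yields 0.5 in linear light, while the gamma-first path yields roughly 0.214, so the edge contribution comes out darker than intended.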
shiomiru 5 months ago
> As any good programmer knows, division is slow, it’s a serializing instruction, and we want to avoid it as much as possible. The favourite programmer tricks to avoid division are to use bit shifts (for division by multiples of two) or flip it into a multiplication — for example, to multiply by 0.333 instead of dividing by 3. In this case though, we are dividing by the alpha, so we can’t know what number we will be dividing by in advance.
> However, because the channels are 8-bit, we will only ever be dividing numbers from 1 to 255 (yes, some values will be zero — but we sure won’t be dividing by them then!) That means there are only about 65K possible combinations, so we can use another classic solution: a lookup table! This is a perfect place to use constexpr to bake the array directly into the compiled result.
Interestingly, when I benchmarked this same problem, three integer divisions would easily beat the LUT on my computer. Maybe because it's easier on the cache? (Or I did something wrong.)
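
The lookup-table-versus-division trade-off in the comment above can be sketched as follows. This is a Python illustration, not the article's constexpr C++ table: the rounding rule, the clamp to 255, the handling of alpha 0, and all names are assumptions. It builds the 256 x 256 table of un-premultiplied channel values and lets a lookup be compared against the direct integer division.

```python
def unpremultiply_div(value, alpha):
    """Un-premultiply one 8-bit channel with a rounded integer division."""
    if alpha == 0:
        return 0  # assumption: fully transparent pixels map to 0
    return min(255, (value * 255 + alpha // 2) // alpha)


def build_unpremultiply_table():
    """Precompute table[alpha][value] for every 8-bit alpha/value pair."""
    return [[unpremultiply_div(value, alpha) for value in range(256)]
            for alpha in range(256)]


# 256 * 256 = 65,536 entries, the "about 65K combinations" from the quote.
TABLE = build_unpremultiply_table()


def unpremultiply_lut(value, alpha):
    """Same result as unpremultiply_div, via a table lookup."""
    return TABLE[alpha][value]
```

Both functions return identical results for every input, so the choice between them is purely a performance question. Which one wins depends on the machine and on how well the table stays in cache, so it has to be measured rather than assumed.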
feintruled 5 months ago
I think any software engineer can identify with the feeling you get the moment you first run the solution you have implemented, the one you are 100% sure has to fix it, only to find nothing has changed.
throwaway284534 5 months ago
I really enjoyed the author's technical deep-dive and approach to debugging performance issues. Mild spoilers for anyone who hasn't played Riven, but the method for fixing Gehn's faulty linking books is a perfect analogy for the author's more counterintuitive performance optimizations.
While I don’t have a write-up as detailed as this one, I spent a month on a similar journey optimizing an animated ASCII art rasterizer. What started as an excuse to learn more about browser performance became a deep dive into image processing, WebGL, and the intricacies of the Canvas API. I’m proud of the results but I’ve annotated the source for a greater mind to squeeze another 5 or 10 FPS out of the browser.
Maybe it’s time to brush up on those WebGL docs again…
- [1] https://asciify.sister.software/
- [2] https://github.com/sister-software/asciify/blob/main/Asciify.mts
EDEdDNEdDYFaN 5 months ago
Very good read! Love the detailed explanations of the "bad" original code and the steps taken toward improving it. A lot of it comes down to personal preference, and the author did a good job of respecting what might have been an intentional design decision by making all of the optimizations configurable.
iforgotpassword 5 months ago
Great writeup. A typical "this shouldn't be too hard" story with yet another surprise around every corner. Seems familiar... :)
One thing I wondered: with that optimized loader library, is it even still necessary to do the DXT conversion at all? Sounds like mango and pixman could be fast enough already....
Cthulhu_ 5 months ago
Love seeing how the optimization parameters were different back then, that is, size constraints were more important than loading speeds, even though both drives and CPUs were much slower back then.
Ideally companies like this that make games keep all the original assets and make things like image format a build switch, for when the parameters change in the future. That said, back then they released on a DVD (I'm reading it would've taken 12 CDs otherwise), I don't believe any higher capacity storage devices were in the pipeline yet at that point. That said, hard drives back then were around the 100 GB mark, so a multi-DVD release would've been doable.
Ironically nowadays, some games (like the FFVII Remakes) are on two disks again, an install and a run disk, despite them having a 50 or 100 GB capacity nowadays.
tomovo 5 months ago
STB image didn't get used in the end because some other library was faster but I think the author missed the possibility of #defining their own allocator using STBI_MALLOC (which could just return a pointer to an existing memory block).
tomcam 5 months ago
> the author explains they used a tool called Luke Stackwalker to profile the game
Can anyone confirm my memory that Microsoft had a tool called Luke Heapwalker in the mid-1980s, and that Lucasfilm demanded they change the name?
nitwit005 5 months ago
> In this profile, we can see that approximately 50% of the time is spent on WaitForSingleObject, but we know that is a part of the game’s normal rendering loop so we can dismiss it as background noise.
That's not an entirely safe assumption. Even a single threaded game could wait on different handles at different points in its logic.
lbj 5 months ago
My hat is off to this, I really appreciate how he documented every step he took. It's lengthy but definitely worth the read.
account42 5 months ago
> So we know that WaitForSingleObject is where the majority of CPU time should be spent during normal operation, and we can dismiss anything that appears in this first list as not the source of the problem.
This heuristic might have worked this time but I don't think it's great in general. System functions can be used for many different purposes and even the same use might be fine in one place and a bug in another. For example the game could have been unintentionally vsyncing many times during the loading process, i.e. to update a progress bar. And no, that's not a purely hypothetical scenario.
zetafunction 5 months ago
Great read! Though there is an unnecessary double map lookup in part 2: https://github.com/tomysshadow/M4Revolution/blob/094764c87aa4e3d5b822d18a23e6729b04b14236/M4Revolution/Ubi.cpp#L1341
rkagerer 5 months ago
This is awesome (and very impressive)!
Two questions:
1. What tool was used to generate that "Ange Albertini-inspired file format diagram"?
2. Is there an emulator that would make this easy to play under Android?
kubb 5 months ago
My manager takes one look at this and asks: so, in the end the effort was unsuccessful? No impact? That’s OK, let’s get you refocused on something productive :)
withinrafael 5 months ago
Tangentially related, previous releases of the game were also hit by a DirectInput device enumeration regression that Microsoft (behind the scenes) refused to fix. (I haven't checked the latest re-release.)
https://github.com/riverar/IndirectInput
jobbr 5 months ago
This guy Mysts.
brcmthrowaway 5 months ago
What is this weird game?
ada1981 5 months ago
This was very impressive to read and while I don’t have the technical knowledge to do this, it reminded me of “fixing” my mental health when Stanford psychiatrists diagnosed me and said I’d be on pills the rest of my life, incurable.
Years later, after rebuilding my psyche from scratch, happy to report they were wrong.
But striking similarities, where the “professionals” just didn’t bother to solve a solvable problem.