TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Let's make the Emacs GC safe and iterative

208 pointsby nochover 7 years ago

8 comments

kazinatorover 7 years ago
It alrady handles long lists without recursion:<p><pre><code> case Lisp_Cons: { register struct Lisp_Cons *ptr = XCONS (obj); if (CONS_MARKED_P (ptr)) break; CHECK_ALLOCATED_AND_LIVE (live_cons_p); CONS_MARK (ptr); &#x2F;* If the cdr is nil, avoid recursion for the car. *&#x2F; if (EQ (ptr-&gt;u.s.u.cdr, Qnil)) { obj = ptr-&gt;u.s.car; cdr_count = 0; goto loop; } mark_object (ptr-&gt;u.s.car); obj = ptr-&gt;u.s.u.cdr; cdr_count++; if (cdr_count == mark_object_loop_halt) emacs_abort (); goto loop; } </code></pre> The &quot;goto loop&quot; chases the <i>cdr</i> pointer iteratively. So with regard to list structure, this will only blow the stack if you have deep <i>car</i> recursion. When conses are used to represent nested lists, you don&#x27;t get that much depth on the CAR side.<p>I&#x27;m just wondering what is the exact scenario, involving what structure.<p>Of course there can be other kinds of linked objects. E.g. a huge graph structure, where the GC happens to traverse a very long path through the graph before bottoming out somewhere.<p>BTW, &quot;[i]f the cdr is nil, avoid recursion for the car.&quot; Initial reaction: good frickin&#x27; idea and I&#x27;m stealing it immediately. Though, wait, not sure what it buys you in practical terms; who the heck builds a deep structure of conses where the linkage is through <i>car</i> and the <i>cdr</i>-s are nil.<p>A better test might be for the <i>cdr</i> being any atom at all. Though obviously some kinds of atoms are structures with pointers that can have depth behind them, many kinds of atoms are shallow. If a <i>cons</i> has any atom whatsoever as its <i>cdr</i>, then maybe we should recurse on the <i>cdr</i> and tail loop on the <i>car</i>.<p>Analysis required.
评论 #16508428 未加载
评论 #16507693 未加载
mav3r1ckover 7 years ago
Interesting. So that&#x27;s why emacs whigs out sometimes when trying to open very large files: because it has a recursive GC that has trouble when it runs out of stack space. Assuming I read the first statement correctly.<p>Seems like a welcome change. I&#x27;m not too well versed with the internals, but does 4.6MB seem reasonable for a memory footprint? I mean, compared to Slack, not bad.<p>&gt; <i>The naive version of this scheme needs about 4.6MB of overhead on my current 20MB Emacs heap, but it should be possible to reduce the overhead significantly by taking advantage of the block allocation we do for conses and other types --- we can put whole blocks on the queue instead of pointers to individual block parts, so we can get away with a much smaller queue.</i>
评论 #16506500 未加载
评论 #16505311 未加载
评论 #16505359 未加载
评论 #16506517 未加载
kazinatorover 7 years ago
&gt; <i>We need to fix GC being deeply recursive once and for all. Tweaking stack sizes on various platforms and trying to spot-fix GC for the occasional deeply recursive structure is annoying. Here&#x27;s my proposal:</i><p>There is still a default stack size limit of 8 megabytes on Linux, even though consumer machines nowadays have RAM measured in double-digit gigabytes.<p>On an Ubuntu 16.x machine:<p><pre><code> $ ulimit -a | grep stack stack size (kbytes, -s) 8192 $ free -h total used free shared buff&#x2F;cache available Mem: 15G 4.1G 2.8G 600M 8.7G 10G Swap: 14G 626M 13G </code></pre> This 8192 kbytes has not changed in at least 15 years, unlike that 15G. Slight disconnect there.<p>This 8192 is just the soft limit; applications can raise it up to the root-imposed hard limit.<p>Unfortunately, that tends to be only 4X larger than the 8192:<p><pre><code> $ ulimit -s 32769 bash: ulimit: stack size: cannot modify limit: Operation not permitted $ ulimit -s 32768 $ # OK </code></pre> Clearly, the hard limit needs revision too.<p>Threads are another issue; if you have a &quot;can run in any thread&quot; garbage collector, it has to live within a thread stack. You can&#x27;t have huge stacks for each of a large number of threads. A thread&#x27;s stack cannot be temporarily extended; there may be something in the address space preventing it, since it&#x27;s just another mmap. (Don&#x27;t think Emacs has this problem. :)
评论 #16506795 未加载
cwzwarichover 7 years ago
Why don&#x27;t they just use the Deutsch-Schorr-Waite algorithm to store the mark stack in the heap itself with pointer-reversal tricks? This would remove the additional overhead.
评论 #16507385 未加载
评论 #16506105 未加载
评论 #16505247 未加载
评论 #16509812 未加载
评论 #16508830 未加载
评论 #16505419 未加载
评论 #16505377 未加载
dawgover 7 years ago
Did sth. similar a while ago for our not so great D GC. <a href="https:&#x2F;&#x2F;github.com&#x2F;dlang&#x2F;druntime&#x2F;pull&#x2F;1073" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;dlang&#x2F;druntime&#x2F;pull&#x2F;1073</a>
kazinatorover 7 years ago
I wonder if a hybrid approach is possible. Run GC recursively. Arrange for the segfault to run on an alternative stack (sigaltstack). If a segfault occurs due to insufficient stack space, then do the rest of the GC using the &quot;queue draining loop&quot; approach, within the small amount of alternative stack memory. Then longmp to some point in the original GC to bail out the original stack.
评论 #16506645 未加载
评论 #16517795 未加载
评论 #16506813 未加载
olliejover 7 years ago
Super dumb question: why not just use a queue, and visit the objects iteratively?<p>Eg each visited object appends&#x2F;prepends its children to the queue of objects to visit, and you just repeatedly drain until you have an empty list.<p>That’s what I had to do many years ago for the GC I was working on.<p>Then any special cases (like the cons case referenced elsewhere) just becomes performance optimization rather than “don’t crash” correctness
gnufxover 7 years ago
This talks about possible generational GC. Incremeantal&#x2F;generational was available many years ago with a port to the Boehm collector. I never understood rejecting it and generally spending effort on a bespoke collector. (I don&#x27;t remember whether the Ravenbrook system was available at the time, or if it was unsuitable for some reason.)