My First Kernel Module: A Debugging Nightmare

133 points by ksml over 4 years ago

16 comments

Taniwha over 4 years ago
So a story: I've been a kernel hack since Unix V6 and have made a living doing it one way or another for over half my life. Learning to think about concurrency, time, interrupts, race conditions, etc. is hard, very hard. I got pretty good at it, but then my career took a diversion: I designed chips for a decade or so, where everything is concurrency at the lowest levels. After a while I came back to doing kernel stuff and found that, with this new background, all that hard stuff was trivial and obvious.

Mostly you just have to steep your brain in it for long enough.
cesarb over 4 years ago
> However, printk can block (while allocating memory)

No, printk() is magic. It can be called even in NMI context, which is a worse place. Quoting https://lwn.net/Articles/800946/: "[...] kernel code must be able to call printk() from any context. Calls from atomic context prevent it from blocking; calls from non-maskable interrupts (NMIs) can even rule out the use of spinlocks. [...]"
lallysingh over 4 years ago
eBPF is honestly the first thing to try *before* writing a module.

I'm glad to see you used a VM. That's the first step in the right direction. Others have mentioned that you should've used printk(), which is true.

I'll mention that you can also run the kernel in a debugger: https://www.kernel.org/doc/html/latest/dev-tools/gdb-kernel-debugging.html
megous over 4 years ago
Linux has some debug options that could have probably helped here. It's a good idea to enable them when developing new code.

https://megous.com/dl/tmp/b6e8f550de4539a8.png
ksml over 4 years ago
Hi HN, this was my first attempt at writing any sort of kernel code. I would love to hear your thoughts on this experience and on the fixes I applied, especially from anyone with more Linux experience than me :)
noncoml over 4 years ago
I see the word "nightmare" used a lot in this article.

I wonder if I am the only one who loves debugging difficult/weird problems. It's something like trying to solve a puzzle. Knowing that the system will never deceive me (it will not be the system's fault if I get deceived), and that a perfectly reasonable explanation exists for what I observe, helps me not give up.
sweettea over 4 years ago
You probably already did this, but for the audience: one of the best ways to make sure you're using a function reasonably is to use elixir.bootlin.com to look at other uses and make sure you're using the function similarly. For instance, check out https://elixir.bootlin.com/linux/latest/A/ident/for_each_process.
wyldfire over 4 years ago
My knee-jerk reaction on reading this article and seeing a kernel module near 'nodejs' was to grumble and say "wtf, they clearly didn't need a kernel module for this". But upon reading deeper I see that accessing the kernel is kinda appropriate.

Regardless of whether you end up using eBPF or a .ko like you already have, you may have a yet simpler option. By leveraging the loader you can do an interposition trick with LD_PRELOAD to hook C library accesses. Maybe this is all you need in order to "help students understand system calls such as open, close, dup2, fork, pipe, and others."

Just a suggestion. Carry on, good show.
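A minimal sketch of the LD_PRELOAD interposition trick described above, assuming glibc on Linux. It wraps only dup2(); the counter, the dup2_call_count() helper, and the log format are hypothetical names chosen for illustration, not anything from the article's module.

```c
#define _GNU_SOURCE             /* needed for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical bookkeeping: how many dup2() calls were intercepted. */
static int dup2_calls;

int dup2_call_count(void)
{
    return dup2_calls;
}

/*
 * Same signature as the C library's dup2(). When this object is listed in
 * LD_PRELOAD, the dynamic linker resolves dup2 here before it reaches libc.
 * The lazy lookup below is not thread-safe; this is a sketch only.
 */
int dup2(int oldfd, int newfd)
{
    static int (*real_dup2)(int, int);
    int ret;

    if (real_dup2 == NULL) {
        /* RTLD_NEXT: find the next definition after this object (libc's). */
        real_dup2 = (int (*)(int, int))dlsym(RTLD_NEXT, "dup2");
        if (real_dup2 == NULL)
            return -1;
    }

    ret = real_dup2(oldfd, newfd);
    dup2_calls++;
    fprintf(stderr, "[trace] dup2(%d, %d) = %d\n", oldfd, newfd, ret);
    return ret;
}
```

Built as a shared object (for example with cc -shared -fPIC -o libtrace.so trace.c -ldl, where the file names are placeholders) and run with LD_PRELOAD=./libtrace.so in front of the target program, this logs each dup2() call made through the C library. It will not see statically linked programs or code that issues the system call directly, which is one reason the kernel-side approaches in this thread still have a place.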
egberts1 over 4 years ago
Takes me back to the days of ATM device driver debugging. I’ve written 9 kernel drivers. All in all, a dedicated standalone terminal attached to the serial port of the target is still your best friend.
lhoursquentin over 4 years ago
Great post, also love what you are trying to do with C playground, this is awesome!

I've recently been trying to build something similar, visualizing forks/execve/read/write, but using the strace output of a binary, which is much less challenging.
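As a rough sketch of the strace-based approach mentioned above, the function below pulls the system call name and return value out of one line of strace output. It assumes the plain single-process format (no pid prefix from -f, no timestamps), and parse_strace_line is a hypothetical helper name, not part of any tool named in this thread.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a line such as:
 *     openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
 * On success, copy the syscall name into `name`, store the return value in
 * `*ret`, and return 0. Return -1 for lines that are not completed calls
 * (signals, "+++ exited with 0 +++", "= ?" results, and so on).
 */
int parse_strace_line(const char *line, char *name, size_t name_len, long *ret)
{
    const char *paren = strchr(line, '(');
    const char *eq = NULL;
    const char *p;
    char *end;
    size_t len;

    if (paren == NULL || paren == line)
        return -1;

    /* Everything before '(' must look like an identifier. */
    for (p = line; p < paren; p++) {
        if (!isalnum((unsigned char)*p) && *p != '_')
            return -1;
    }

    len = (size_t)(paren - line);
    if (len >= name_len)
        return -1;
    memcpy(name, line, len);
    name[len] = '\0';

    /* Take the last " = " so that one inside a string argument is skipped. */
    for (p = strstr(paren, " = "); p != NULL; p = strstr(p + 1, " = "))
        eq = p;
    if (eq == NULL)
        return -1;

    /* Base 0 accepts decimal results as well as hex ones such as mmap's. */
    *ret = strtol(eq + 3, &end, 0);
    if (end == eq + 3)
        return -1;
    return 0;
}
```

A visualizer would feed each line of the trace through a parser like this and keep only the calls it cares about (fork, execve, read, write, and so on); handling pid prefixes and unfinished/resumed calls from multi-process traces is left out here.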
nosefrog over 4 years ago
Great story! I've had a lot of debugging nightmares, but thankfully never anything as bad as that.

One thing that looks fishy is this branch:

    if (container_tasks_len == max_container_tasks) {
        printk("cplayground: ERROR: container_tasks list hit capacity! We "
               "may be missing processes from the procfile output.\n");
        break;
    }

Since you said printk can block, why isn't calling it in the rcu critical section a bug? Is it because you immediately break afterwards and don't try to reference the next task?
secondcoming over 4 years ago
Great article! Reminds me of when I was working on a bug in a phone kernel and adding its equivalent of printk() made the bug disappear! Lauterbach time!
pjmlp over 4 years ago
Back in the Windows NT/2000 days, IIS executed as part of the kernel, and debugging ISAPI extensions was an exercise in patience every time a programming error crashed the kernel and a reboot was in order.
known over 4 years ago
Free book: https://www.tldp.org/LDP/lkmpg/2.6/html/lkmpg.html
foxhlchen over 4 years ago
Nice article, but I think OP should use debugfs instead of /proc. debugfs is designed for this purpose.
devit over 4 years ago
You can do most or all of that by reading /proc/<pid>/fdinfo/<fd> and /proc/<pid>/fd/<fd> or by making system calls on the affected fds (which you can do e.g. by injecting code with LD_PRELOAD or ptrace or with nsenter with fd namespace or equivalent C code).

Even if you write a kernel driver, iterating over all tasks in the system is a terrible design (there may be millions), not to mention "determining if a task belongs to a C playground program" in the kernel (obviously the kernel should have no knowledge about such specifics).

Of course, if a developer cannot even produce a reasonable overall design, it's not surprising that they aren't capable of writing correct code.