Can someone explain how such a HUGE codebase is maintained and how new people get acquainted with it? I understand the distributed nature of development, but this still seems like LOTS of code, especially for someone new who wants to start contributing.
So freaking what?<p>EDIT<p>It's not a valid metric. 15 million can be a lot or a little. It's all relative. This is 15 million LoC of plain C code. This is code that includes thousands of device-specific routines, but even if it didn't, what are you comparing it to?<p>Let's say it is too long even compared to other C kernels w/ device drivers. What price are you putting on well-written code? Do you know how many of those lines are hints to the compiler? Those don't even get compiled.
Has anyone out there done refactoring experiments? Perhaps as an academic exercise? I don't know the kernel well enough to say for sure, but I do recall a fair bit of duplicated code in, e.g., the drivers. It would be interesting to see what could be done there.
15 MLOC?<p>That seems like a lot; this must include a lot of things that can optionally be compiled into the kernel?<p>Last time I browsed the kernel source tree it looked like a few hundred KLOC to me.
Like the article says, something like 75% of all of that is driver-specific or related to filesystems. You can still compile a tiny kernel for embedding that will be a fraction of the size of the full kernel. And there's still quite a lot of old cruft in there that <i>could</i> be left out.<p>Even so, if true, it does seem to be getting quite bloated. Still, it has to cover a lot of ground nowadays in terms of hardware.
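If anyone wants to check the drivers/filesystems share for themselves, here is a rough sketch in Python. It is only an approximation: it counts raw lines (comments and blanks included) in `.c`, `.h` and `.S` files, grouped by top-level directory. The function names `count_lines_by_top_dir` and `share` are made up for this example, and the choice of file extensions is my own assumption about what counts as kernel source.

```python
import os
from collections import Counter

# Assumption: only these extensions count as kernel source for this estimate.
SOURCE_EXTS = {".c", ".h", ".S"}


def count_lines_by_top_dir(root):
    """Map each top-level directory under root to its raw line count.

    Raw lines include comments and blank lines, so this overstates
    "real" code; it is only meant for comparing directories.
    """
    totals = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Files directly in root are grouped under "."
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            if os.path.splitext(name)[1] not in SOURCE_EXTS:
                continue
            # Read as bytes so odd encodings cannot raise decode errors.
            with open(os.path.join(dirpath, name), "rb") as fh:
                totals[top] += sum(1 for _ in fh)
    return totals


def share(totals, dirs):
    """Fraction of all counted lines that live under the given directories."""
    total = sum(totals.values())
    return sum(totals[d] for d in dirs) / total if total else 0.0
```

With a kernel checkout at a hypothetical path `linux`, `share(count_lines_by_top_dir("linux"), ["drivers", "fs"])` gives the combined fraction for those two directories, which you can compare against the article's figure.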