I am the one who originally added this sleep call to WebKit, when we first imported TCMalloc to use as our custom allocator. It was indeed there for a reason, but that reason is not applicable to WebKit's allocation patterns. TCMalloc was designed for a server workload; over time we have adapted it more to the unique needs of a browser engine. This change may help other operations, but probably not as much as the GC benchmark in question.
True story...<p>Client gives me a Help Desk Ticket. User claims batch job used to run in 5 minutes but now runs for hours. The logs confirm this. The commit logs show that an offshore programmer forgot to remove a 10-second sleep command from inside an iteration (for debugging, I presume) before promoting to production. I removed it and got a 100X improvement in throughput.<p>My client said that now his user loves him; what did I do to fix it so fast?<p>When I told him that I removed the Sleep, he said, "No! No! No! Change it to a 5-second Sleep so I have something to give him the next time he complains!"
It's interesting how spinlocks are making a comeback in user space code. They are widely used in kernel code, but their use has previously been discouraged in application code. As this example illustrates, it's pretty easy to get them wrong...
It was "only" on one particular benchmark; still, it shows that profiling often gives you surprising results that nobody familiar with the code anticipated.
Why did they roll their own spinlock implementation in the first place? They could encapsulate the lock primitives provided by, and optimized for, each platform: CRITICAL_SECTIONs on Windows, futexes on Linux, and pthread_spin_locks elsewhere. iOS and OS X have some Mach-specific spinlocks (OSSpinLockLock?), but I don't have any experience using them.
So if I understand this correctly, the 3.7x speedup is a speedup in garbage collection, and it will have a positive effect on the speed of the browser, which is nice.<p>But what sort of effect will this have on page rendering? Anyone in the know care to comment on the specifics?<p>edit: improvements to my shoddy wording.
To be fair, it's a sleep(0), which I assume is just the easiest way to get the thread to yield and force a context switch. I assume removing it is a speed improvement only when locks become ready sooner than a context switch would take.<p>The real fix would be to make context switches faster; this is just a data-driven tweak rather than a general solution.
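For anyone who wants to see the trade-off being discussed, here is a minimal sketch in Python. It is not WebKit's or TCMalloc's actual code: the class name `SpinThenYieldLock`, the `spins_before_yield` parameter, and the `run_counter` helper are all made up for illustration. The lock retries a non-blocking acquire a fixed number of times and only then calls `time.sleep(0)` to give up the rest of its time slice. A large spin count approximates pure spinning, and a spin count of 1 approximates yielding on every failed attempt.

```python
import threading
import time


class SpinThenYieldLock:
    """Hypothetical illustration: spin briefly, then yield with sleep(0)."""

    def __init__(self, spins_before_yield=100):
        # threading.Lock is used here only as an atomic test-and-set flag.
        self._flag = threading.Lock()
        self._spins_before_yield = spins_before_yield

    def acquire(self):
        spins = 0
        # Non-blocking attempt: returns False immediately if the lock is held.
        while not self._flag.acquire(blocking=False):
            spins += 1
            if spins >= self._spins_before_yield:
                time.sleep(0)  # yield so another thread can run
                spins = 0

    def release(self):
        self._flag.release()


def run_counter(num_threads=4, increments=1000, spins_before_yield=100):
    """Increment a shared counter from several threads under the lock."""
    lock = SpinThenYieldLock(spins_before_yield)
    total = 0

    def worker():
        nonlocal total
        for _ in range(increments):
            lock.acquire()
            try:
                total += 1
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Which spin count wins depends on how long the lock is typically held compared with the cost of a context switch, so it is something to measure rather than guess.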
Reminds me of "the speedup loop": <a href="http://thedailywtf.com/Articles/The-Speedup-Loop.aspx" rel="nofollow">http://thedailywtf.com/Articles/The-Speedup-Loop.aspx</a>