Fairly trivial base introduction to the subject.<p>In my experience teaching undergrads, they mostly get this stuff already. Their CompArch class has taught them the basics of branch prediction, cache coherence, and instruction caches: the trivial elements of performance.<p>I'm somewhat surprised the piece doesn't deal at all with a classic performance killer, false sharing, although it seems mostly concerned with single-threaded latency. The total lack of "free" optimization tricks like fat LTO, PGO, or even the standardized hinting attributes ([[likely]], [[unlikely]]) for optimizing icache layout was also surprising.<p>Neither this piece nor my undergraduates deal with the more nitty-gritty elements of performance. These mostly get into the usage specifics of particular IO APIs, synchronization primitives, IPC mechanisms, and some of the more esoteric compiler builtins.<p>Besides all that, what the nascent low-latency programmer almost always lacks, and the hardest thing to instill in them, is a certain paranoia. A genuine fear, hatred, and anger toward unnecessary allocations, copies, and other performance killers. A creeping feeling that causes them to compulsively run the benchmarks through callgrind, looking for calls into the object cache that miss and go to an allocator in the middle of the hot loop.<p>I think a formative moment for me was when I was writing a low-latency server and realized that constructing a vectored I/O operation ended up being slower overall than just copying the small objects I was dealing with into a contiguous buffer and performing a single write. There's no such thing as a free copy, and that includes fat pointers.
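To make that last anecdote concrete, here's a rough sketch of the two shapes (struct, sizes, and function names are made up for illustration, not from the server in question): gathering many small objects with writev() versus memcpy'ing them into one preallocated staging buffer and issuing a single write(). In my case the single write won, but measure on your own workload.

    #include <cstring>
    #include <sys/uio.h>
    #include <unistd.h>
    #include <vector>

    struct Msg { char payload[48]; };   // hypothetical small object

    // Vectored I/O: one iovec per object, one syscall, kernel walks the list.
    ssize_t send_vectored(int fd, const std::vector<Msg>& msgs) {
        std::vector<iovec> iov(msgs.size());
        for (std::size_t i = 0; i < msgs.size(); ++i)
            iov[i] = { const_cast<char*>(msgs[i].payload), sizeof msgs[i].payload };
        return writev(fd, iov.data(), static_cast<int>(iov.size()));
    }

    // Copy-then-write: memcpy into a preallocated staging buffer, single write().
    ssize_t send_copied(int fd, const std::vector<Msg>& msgs, char* staging) {
        char* p = staging;
        for (const Msg& m : msgs) {
            std::memcpy(p, m.payload, sizeof m.payload);
            p += sizeof m.payload;
        }
        return write(fd, staging, static_cast<std::size_t>(p - staging));
    }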
My emphasis:<p>> The output of this test is a test statistic (t-statistic) and an associated p-value. The t-statistic, also known as the score, is the result of the unit-root test on the residuals. A more negative t-statistic suggests that the residuals are more likely to be stationary. The p-value provides a measure of the probability that the null hypothesis of the test (no cointegration) is true. <i>The results of your test</i> yielded a p-value of approximately 0.0149 and a t-statistic of -3.7684.<p>I think they used an LLM to write this bit.<p>It's also a really weird example. They look at correlation of once-a-day close prices over five years, and then write code to calculate the spread with 65 microsecond latency. That doesn't actually make any sense as something to do. And you wouldn't be calculating statistics on the spread in your inner loop. And 65 microseconds is far too slow for an inner loop. I suppose the point is just to exercise some optimisation techniques - but this is a rather unrepresentative thing to optimise!
I've got an implementation of a stock exchange that uses the LMAX disruptor pattern in C++:
<a href="https://github.com/sneilan/stock-exchange">https://github.com/sneilan/stock-exchange</a><p>And a basic implementation of the LMAX disruptor as a couple C++ files
<a href="https://github.com/sneilan/lmax-disruptor-tutorial">https://github.com/sneilan/lmax-disruptor-tutorial</a><p>I've been looking to rebuild this in rust however. I reached the point where I implemented my own websocket protocol, authentication system, SSL etc. Then I realized that memory management and dependencies are a lot easier in rust. Especially for a one man software project.
Reminds me of <a href="https://github.com/CppCon/CppCon2017/blob/master/Presentations/When%20a%20Microsecond%20Is%20an%20Eternity/When%20a%20Microsecond%20Is%20an%20Eternity%20-%20Carl%20Cook%20-%20CppCon%202017.pdf">https://github.com/CppCon/CppCon2017/blob/master/Presentatio...</a>
I made a C++ logging library [1] that has many similarities to the LMAX disruptor. It appears to have found some use among the HFT community.<p>The original intent was to enable highly detailed logging without performance degradation for "post-mortem" debugging in production environments. I had coworkers who would refuse to include logging of certain important information for troubleshooting, because they were scared that it would impact performance. This put an end to that argument.<p>[1] <a href="https://github.com/mattiasflodin/reckless">https://github.com/mattiasflodin/reckless</a>
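The trick that makes this kind of logger cheap on the hot path, as I understand it, is to defer all formatting and I/O to a background thread: the logging call just copies raw arguments into a small record and enqueues it. A deliberately simplified sketch of the idea (this is not reckless's actual API, and it uses a mutex-guarded deque where a real implementation would use a lock-free ring buffer):

    #include <condition_variable>
    #include <cstdio>
    #include <deque>
    #include <mutex>
    #include <thread>

    struct LogRecord { const char* fmt; long a; long b; };  // raw args, no formatting yet

    class AsyncLog {
        std::deque<LogRecord> q_;
        std::mutex m_;
        std::condition_variable cv_;
        bool done_ = false;
        std::thread writer_{[this] { drain(); }};            // background formatter/writer

        void drain() {
            std::unique_lock<std::mutex> lk(m_);
            while (!done_ || !q_.empty()) {
                cv_.wait(lk, [this] { return done_ || !q_.empty(); });
                while (!q_.empty()) {
                    LogRecord r = q_.front(); q_.pop_front();
                    lk.unlock();
                    std::fprintf(stderr, r.fmt, r.a, r.b);   // slow work happens here, off the hot path
                    lk.lock();
                }
            }
        }
    public:
        void log(const char* fmt, long a, long b) {          // hot path: copy three words, enqueue
            { std::lock_guard<std::mutex> lk(m_); q_.push_back({fmt, a, b}); }
            cv_.notify_one();
        }
        ~AsyncLog() {
            { std::lock_guard<std::mutex> lk(m_); done_ = true; }
            cv_.notify_one();
            writer_.join();
        }
    };

So a call like log("filled %ld @ %ld\n", qty, px) returns almost immediately; the fprintf runs later on the writer thread, which is why people stop arguing about whether the log line is too expensive.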
<i>> The noted efficiency in compile-time dispatch is due to decisions about function calls being made during the compilation phase. By bypassing the decision-making overhead present in runtime dispatch, programs can execute more swiftly, thus boosting performance.</i><p>The other benefit of compile-time dispatch is that when the compiler can statically determine which function is being called, it may be able to inline the called function's code directly at the callsite. That eliminates <i>all</i> of the function call overhead and may also enable further optimizations (dead code elimination, constant propagation, etc.).
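A toy illustration of the difference (hypothetical handler names, nothing from the article): with the template version the concrete type is known at the call site, so the compiler can inline on_tick() and optimize across the boundary, whereas the virtual version generally stays an indirect call unless the optimizer manages to devirtualize it.

    #include <cstdint>

    struct IHandler {                        // runtime dispatch: vtable + indirect call
        virtual ~IHandler() = default;
        virtual std::int64_t on_tick(std::int64_t px) = 0;
    };

    struct TickHandler final : IHandler {
        std::int64_t on_tick(std::int64_t px) override { return px + 1; }
    };

    std::int64_t run_dynamic(IHandler& h, std::int64_t px) {
        return h.on_tick(px);                // call through the vtable
    }

    template <typename Handler>              // compile-time dispatch: type known statically
    std::int64_t run_static(Handler& h, std::int64_t px) {
        return h.on_tick(px);                // candidate for inlining at this call site
    }

    int main() {
        TickHandler h;
        return static_cast<int>(run_dynamic(h, 41) + run_static(h, 41));
    }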
Is there any good reason for high-frequency trading to exist? People often complain about bitcoin wasting energy, but HFT oddly gets a free pass despite being, as far as I can tell, a definite net negative to society.
In case you're a pro developer, the whole collection is worth looking at:<p><a href="https://github.com/CppCon/CppCon2017/tree/master/Presentations">https://github.com/CppCon/CppCon2017/tree/master/Presentatio...</a><p>and up
I am curious: why does (or did) this field use C++ instead of C for the logic? What benefits does C++ have over C in this domain? I am proficient in C/assembly but completely ignorant of the practices in HFT, so please go easy on the explanations!