It would be nice if .NET Core profiling were a bit easier on Linux. Microsoft has a shell script[1] for profiling, but it requires Windows-only tools.<p>They don't ship Crossgen with the Linux packages, and you have to generate the .NET runtime symbols manually.<p>I've gotten things like FlameGraphs working using BCC profile[2], but it took quite a bit of work.<p>[1]: <a href="https://raw.githubusercontent.com/dotnet/corefx-tools/master/src/performance/perfcollect/perfcollect" rel="nofollow">https://raw.githubusercontent.com/dotnet/corefx-tools/master...</a>
[2]: <a href="https://github.com/iovisor/bcc/blob/master/tools/profile.py" rel="nofollow">https://github.com/iovisor/bcc/blob/master/tools/profile.py</a>
A tip related to the throw inlining tip:
One way to get more consistent/effective inlining is to split the complex 'slow paths' out of your functions into helper functions. For example, let's say you have a cached operation with cache hit and cache miss paths:<p><pre><code> void GetValue (string key, out SomeBigType result) {
if (_cache.TryGetValue(key, out result))
return;
result = new SomeBigType(key, ...);
_cache[key] = result;
}
</code></pre>
In most scenarios this function might not get inlined, because the cache miss path makes the function bigger. If you use the aggressive inlining attribute (MethodImplOptions.AggressiveInlining) you might be able to convince the JIT to inline it, but that stops working once the function grows past a certain size.<p>However, if you pull the cache miss out:<p><pre><code> void GetValue (string key, out SomeBigType result) {
if (_cache.TryGetValue(key, out result))
return;
GetValue_Slow(key, out result);
}
void GetValue_Slow (string key, out SomeBigType result) {
result = new SomeBigType(key, ...);
_cache[key] = result;
}
</code></pre>
You will find that in most cases, GetValue is inlined and only GetValue_Slow produces a function call. This is especially true in release builds, and you can observe it in the built-in Visual Studio profiler or by looking at the method disassembly.<p>(Keep in mind that many debuggers - including VS's - will disable JIT optimization if you start an application under the debugger or attach to it. You can turn that behavior off; in Visual Studio it is the "Suppress JIT optimization on module load" debugging option.)<p>In my testing, this tip applies to both desktop .NET Framework and .NET Core (.NET Core is generally better at inlining, though). If you're writing any performance-sensitive paths in a library, I highly recommend doing this. It can make the code easier to read in some cases anyway.
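For anyone who wants something that compiles, here is a minimal sketch of the same split. The names ValueCache, GetValueSlow and MissCount are made up for illustration, and a string stands in for SomeBigType. The NoInlining attribute on the slow path is my own addition, not part of the tip above, and AggressiveInlining is only a hint: the JIT can still decline to inline.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical cache type; a string stands in for the expensive SomeBigType.
public sealed class ValueCache
{
    private readonly Dictionary<string, string> _cache = new Dictionary<string, string>();

    // Counts slow-path executions, only so the behavior can be observed.
    public int MissCount { get; private set; }

    // Hot path: a lookup plus one call, kept small so the JIT is more
    // likely (not guaranteed) to inline it into callers.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public string GetValue(string key)
    {
        if (_cache.TryGetValue(key, out var result))
            return result;
        return GetValueSlow(key);
    }

    // Cold path: NoInlining asks the JIT to keep this body out of GetValue.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private string GetValueSlow(string key)
    {
        MissCount++;
        var result = key.ToUpperInvariant(); // stand-in for expensive construction
        _cache[key] = result;
        return result;
    }
}
```

Calling GetValue twice with the same key should run the slow path only once. Whether GetValue actually gets inlined is something to confirm in the disassembly of a release build rather than assume.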
One of the tips is to avoid LINQ, which many .NET developers are hesitant to do. I made a library that lets you use LINQ-style convenience functions without a performance hit in many cases:<p><a href="https://github.com/jackmott/LinqFaster" rel="nofollow">https://github.com/jackmott/LinqFaster</a>
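To make the trade-off concrete, here is a small sketch comparing a standard LINQ pipeline with a hand-written loop over an array. This is plain System.Linq and a for loop, not LinqFaster's API, and SumExamples and its method names are hypothetical. How much the loop helps depends on the workload, so measure before rewriting.

```csharp
using System;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class SumExamples
{
    // LINQ version: goes through an iterator and calls a delegate per element.
    public static int SumOfEvensLinq(int[] values)
    {
        return values.Where(v => v % 2 == 0).Sum();
    }

    // Loop version: indexes the array directly, with no iterator or delegate.
    // Note: Enumerable.Sum throws on overflow, while this loop wraps around
    // unless the project is compiled with checked arithmetic.
    public static int SumOfEvensLoop(int[] values)
    {
        var total = 0;
        for (var i = 0; i < values.Length; i++)
        {
            if (values[i] % 2 == 0)
                total += values[i];
        }
        return total;
    }
}
```

Both methods return the same result for inputs that do not overflow an int.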
> Reduce branching & branch misprediction<p>I wrote a parser for a "formalized" URI (it looked somewhat like OData). This parser was being invoked millions of times and was adding minutes to an operation - it dominated the profile at something like 30% CPU time. It started off something like this:<p><pre><code> int state = State_Start;
for (var i = 0; i < str.Length; i++)
{
var c = str[i];
switch (state)
{
case State_Start:
/* Handle c for this state. */
/* Update state if a new state is reached. */
}
}
</code></pre>
Hardly rocket science, a clear-as-day miniature state machine. VTune was screaming about the switch, so I changed it to this:<p><pre><code> for (var i = 0; i < str.Length; i++)
{
for (; i < str.Length; i++)
{
var c = str[i];
/* Handle c for this state. */
/* Break if a new state is reached. */
}
for (; i < str.Length; i++)
{
var c = str[i];
/* Handle c for this state. */
/* Break if a new state is reached. */
}
}
</code></pre>
The new profile put the function at < 0.1% of CPU time. This is something that the "premature optimization crowd" (who tend to partially quote Knuth concerning optimization) get wrong: death by a thousand cuts. A <i>single</i> branch in the source (it ends up being more in machine code) was costing 30% performance.
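A runnable sketch of the same transformation, using a toy grammar rather than the original URI parser: the input is "key=value", with the value ending at '&' or end of input. KeyValueParser, its method names and the grammar are all assumptions made for illustration, and it needs C# 7 for tuples. It shows the shape of the change only; it makes no claim about the speedup you would see.

```csharp
using System;

// Hypothetical toy parser; not the parser described above.
public static class KeyValueParser
{
    private const int StateKey = 0;
    private const int StateValue = 1;
    private const int StateDone = 2;

    // Switch version: the state variable is re-tested for every character.
    public static (string Key, string Value) ParseWithSwitch(string str)
    {
        int state = StateKey;
        int split = -1;        // index of '=', or -1 if none was seen
        int end = str.Length;  // index where the value stops
        for (var i = 0; i < str.Length; i++)
        {
            var c = str[i];
            switch (state)
            {
                case StateKey:
                    if (c == '=') { split = i; state = StateValue; }
                    break;
                case StateValue:
                    if (c == '&') { end = i; state = StateDone; }
                    break;
                case StateDone:
                    break; // ignore anything after the value
            }
        }
        if (split < 0) return (str, string.Empty);
        return (str.Substring(0, split), str.Substring(split + 1, end - split - 1));
    }

    // Loop-per-state version: each state owns a loop, so the current state
    // is implied by where execution is instead of by a variable.
    public static (string Key, string Value) ParseWithLoops(string str)
    {
        var i = 0;

        // State: key. Scan until '=' or end of input.
        for (; i < str.Length; i++)
        {
            if (str[i] == '=') break;
        }
        if (i == str.Length) return (str, string.Empty);
        var split = i;

        // State: value. Skip the '=' and scan until '&' or end of input.
        for (i++; i < str.Length; i++)
        {
            if (str[i] == '&') break;
        }
        return (str.Substring(0, split), str.Substring(split + 1, i - split - 1));
    }
}
```

Both methods should return the same pair for any input, which makes it easy to check the rewrite against the original before profiling either one.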
> Mark classes as sealed by default<p>Please, no! This shouldn't be the <i>default</i> - it's a constant bugbear of mine where I want to extend a class from a library, and I can't because it's been sealed for no good reason.