If you're curious how Git knows the syntax of different languages in order to support this kind of feature, take a look at <a href="https://github.com/git/git/blob/master/userdiff.c">https://github.com/git/git/blob/master/userdiff.c</a><p>Here's how support for Python and Ruby is defined:<p><pre><code> PATTERNS("python",
"^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$",
/* -- */
"[a-zA-Z_][a-zA-Z0-9_]*"
"|[-+0-9.e]+[jJlL]?|0[xX]?[0-9a-fA-F]+[lL]?"
"|[-+*/<>%&^|=!]=|//=?|<<=?|>>=?|\\*\\*=?"),
/* -- */
PATTERNS("ruby",
"^[ \t]*((class|module|def)[ \t].*)$",
/* -- */
"(@|@@|\\$)?[a-zA-Z_][a-zA-Z0-9_]*"
"|[-+0-9.e]+|0[xXbB]?[0-9a-fA-F]+|\\?(\\\\C-)?(\\\\M-)?."
"|//=?|[-+*/<>%&^|=!]=|<<=?|>>=?|===|\\.{1,3}|::|[!=]~"),</code></pre>
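To get a feel for what the first regex in each entry does, here is a minimal sketch that runs the Python and Ruby patterns over a few sample lines. It uses Python's re module as a stand-in for the POSIX extended regexes Git compiles. The helper context_for and the sample lines are made up for illustration, and returning capture group 1 is my reading of how Git picks the displayed text. The remaining strings in each PATTERNS entry form a separate word-level regex that is not exercised here.

```python
import re

# First regex of each PATTERNS(...) entry, copied from the snippet above.
# Git compiles these as POSIX extended regexes; Python's `re` is only a
# stand-in here, and for these two patterns the dialects behave the same.
FUNCNAME = {
    "python": re.compile(r"^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$"),
    "ruby": re.compile(r"^[ \t]*((class|module|def)[ \t].*)$"),
}


def context_for(lang, line):
    """Return capture group 1 if `line` looks like a scope header, else None.

    Hypothetical helper: it assumes the displayed context is group 1,
    i.e. the line with its leading indentation stripped.
    """
    m = FUNCNAME[lang].match(line)
    return m.group(1) if m else None


print(context_for("python", "    async def fetch(self, url):"))  # async def fetch(self, url):
print(context_for("python", "    return url"))                    # None
print(context_for("ruby", "module Billing"))                      # module Billing
print(context_for("ruby", "  def total(items)"))                  # def total(items)
```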
Oh wow, this is very cool!<p>The way this works boils down to the following: by default, Git has a heuristic for determining the "context" of a diff hunk by looking for lines that start in the first column with a letter, underscore, or dollar sign. This context is printed out after the "@@" marker in the hunk header. Within Git, this context is referred to as the "function name", but that's a bit inaccurate as the patterns will usually match other scopes like namespaces and classes.<p>Setting "diff=LANG" in .gitattributes activates a different (regular expression) pattern which is used to identify context; for example, in Python, this will look for "class" and "def" keywords. Git ships with a bunch of built-in patterns (defined in <a href="https://github.com/git/git/blob/master/userdiff.c">https://github.com/git/git/blob/master/userdiff.c</a>), and the "diff.LANG.xfuncname" config option can be used to specify a custom pattern.<p>-L can then be used to trace the history of the scope whose context line matches a certain pattern. For example, if you want to look for function "foo", you could use -L ':\bfoo\b:file.py' (note that if you don't use \b, the pattern also matches any function whose name merely <i>contains</i> the word foo, and Git takes the first matching context line in the file). Also related is the -W flag, which will show the entire function/class/scope in the diff, again based on context.<p>Note some limitations: the matching is line-by-line, so it will pick up "context" from things like string literals and comments, and you will only get the first line of the context (so multi-line signatures will be truncated). Also, since -L takes a regular expression to match against the context line, you'll want to take care to use an appropriate pattern to avoid matching unwanted functions (e.g. use \b to avoid substring matches, or even "def foo(" to ensure you only match methods and not classes or parameter names).<p>See also <a href="https://stackoverflow.com/questions/28111035/where-does-the-excerpt-in-the-git-diff-hunk-header-come-from" rel="nofollow noreferrer">https://stackoverflow.com/questions/28111035/where-does-the-...</a> for a very comprehensive overview of this feature.
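The header lookup and the -L name match described above can be modelled in a few lines. This is a simplified sketch, not Git's implementation. It assumes the Python pattern from userdiff.c, three lines of diff context, and Python's re in place of the regex library Git uses (so the paren in "def foo(" has to be escaped here). The helper names hunk_context and first_funcname and the sample file are hypothetical.

```python
import re

# Python funcname pattern copied from userdiff.c.
FUNCNAME = re.compile(r"^[ \t]*((class|(async[ \t]+)?def)[ \t].*)$")

# Made-up file contents, one item per line (0-based index in the comment).
SOURCE = [
    'class Greeter:',                                           # 0
    '    def foobar(self):',                                    # 1
    '        """',                                              # 2
    '        def foo is only mentioned in this docstring',      # 3
    '        """',                                              # 4
    '        a = 1',                                            # 5
    '        b = 2',                                            # 6
    '        c = 3',                                            # 7
    '        return a + b + c',                                 # 8
    '',                                                         # 9
    '    def foo(self, name):',                                 # 10
    '        greeting = "hello"',                               # 11
    '        punctuation = "!"',                                # 12
    '        padding = ""',                                     # 13
    '        return greeting + name + punctuation + padding',   # 14
]


def hunk_context(lines, changed, context=3):
    """Nearest funcname line strictly above the hunk containing `changed`."""
    hunk_start = max(changed - context, 0)
    for i in range(hunk_start - 1, -1, -1):
        m = FUNCNAME.match(lines[i])
        if m:
            return m.group(1)
    return None


def first_funcname(lines, name_regex):
    """First funcname line that also matches name_regex (models -L ':re:file')."""
    wanted = re.compile(name_regex)
    for line in lines:
        m = FUNCNAME.match(line)
        if m and wanted.search(line):
            return m.group(1)
    return None


print(hunk_context(SOURCE, 14))             # def foo(self, name):
print(hunk_context(SOURCE, 8))              # def foo is only mentioned in this docstring
print(first_funcname(SOURCE, r"foo"))       # def foobar(self):
print(first_funcname(SOURCE, r"\bfoo\b"))   # def foo is only mentioned in this docstring
print(first_funcname(SOURCE, r"def foo\(")) # def foo(self, name):
```

The docstring hit shows the line-by-line limitation, and the three patterns show why a tighter regex such as "def foo(" is worth the extra typing.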
I love using the `-G` flag for tracking the history of any occurrence of a given regex across all directories/files. It feels more flexible than `-L`. As an example:<p><pre><code> git log \
-G "$some_regex" \
--patch \
--stat \
--source \
--all \
--decorate=full \
--pretty=fuller \
-- . ":(exclude)*.lock"</code></pre>
Huh, TIL Git can do this for non-C-like languages out of the box. Is there a reason these custom hunk handlers are not enabled by default for the languages Git ships support for?