Reinventing escaping over and over again (which bash scripts in particular seem to encourage) is a sucker's game. It's difficult to get right, and if you're constantly redoing it you're eventually going to make a mistake. I've worked in web security and it's sad to see how often people with good intentions mess this up. I'm glad the author basically came to this conclusion.<p>The winning strategy is to use a library/framework/whatever for embedding user-provided content into HTML. Sane HTML template libraries will do this. That library has had more time to get it right. Furthermore, a well-designed API will clearly indicate what is trusted vs. untrusted data, and will properly encode all untrusted data before embedding it. See the "Security Model" section of Go's html/template docs, linked below.<p>An alternative to using the git tools that is appropriate for serious work (shell pipelines are great for prototyping) is libgit2. It has bindings for many languages. It's very easy to use (sometimes, though not always, easier than the CLI) and often much faster than big shell pipelines (operating on text gets slow pretty fast, and often you end up using xargs...)<p>An example set of tools: <a href="https://golang.org/pkg/html/template/" rel="nofollow">https://golang.org/pkg/html/template/</a> + <a href="https://github.com/libgit2/git2go" rel="nofollow">https://github.com/libgit2/git2go</a> .<p>It's not as succinct as a bash script, but it's easier to build something that's correct. Use the shell to prototype, then build it right in a saner environment.
<p><pre><code> git log --pretty=format:"%H%x00%s" | sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g; s/"/\&quot;/g; s/'"'"'/\&#39;/g; s@\(.*\)\x0\(.*\)@<tr><th>\1</th><td>\2</td></tr>@'
</code></pre>
You could do the dumb HTML entity escaping in a real language. The article's solution is a straw man, since it's promoting their personal language.<p>Why did they see \x01 and \x02 as possible sentinels but not NUL bytes? Python is fine with NULs…
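A minimal sketch of what doing the entity escaping "in a real language" might look like, assuming Python and the same NUL-separated `git log` format as the sed one-liner above. The function names `read_git_log` and `log_to_rows` are made up for illustration; `html.escape` is the standard-library escaper.

```python
import html
import subprocess


def read_git_log() -> str:
    """Run git log with a NUL byte between the hash and the subject."""
    result = subprocess.run(
        ["git", "log", "--pretty=format:%H%x00%s"],
        check=True,
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # assumption: tolerate non-UTF-8 commit subjects
    )
    return result.stdout


def log_to_rows(raw: str) -> str:
    """Turn 'hash NUL subject' lines into escaped HTML table rows."""
    rows = []
    for line in raw.split("\n"):
        if not line:
            continue
        # The NUL cannot appear in a hash or a subject, so it is a safe separator.
        commit, _, subject = line.partition("\x00")
        rows.append(
            f"<tr><th>{html.escape(commit)}</th>"
            f"<td>{html.escape(subject)}</td></tr>"
        )
    return "\n".join(rows)
```

Inside a repository, `print(log_to_rows(read_git_log()))` would emit the rows. Each field is escaped on its own before any markup is added, so no sentinel ever has to survive the escaping step.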
The underlying problem with the first, simple, approach is that the template it is using to get things from git,<p><pre><code> "<tr> <td>%H</td> <td>%s</td> <tr>"
</code></pre>
interpolates values that need to be escaped, but includes literal text that must not be escaped. (My guess is that the author meant "</tr>" for the last element, but the article says "<tr>" so I'm going with that.)<p>The author's approach to dealing with that is to mark the places in the template where escaping will be needed, and then make and use an escaping tool that recognizes those marks and escapes only the marked segments.<p>A simpler approach is to eliminate the underlying problem. For getting the data out of git, use a template where the literal text is safe to escape, such as this:<p><pre><code> "%H,%s"
</code></pre>
The escaping can then be done by a tool that escapes its entire input. That will leave the comma from the template alone, and will not introduce any new commas. The interpolation of %s might have introduced commas, but they will all be after the literal comma from the template. The interpolation of %H will not introduce commas.<p>The output from the escaper can then be transformed into the final output by replacing the first "," with "</td> <td>", prepending "<tr> <td>", and appending "</td> <tr>". All of these are simple in a shell pipeline using sed.
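The same idea can be sketched outside sed. This is a hedged illustration in Python, not the commenter's pipeline: it escapes the whole "%H,%s" line first and only then splits on the first comma. The name `line_to_row` is hypothetical, and the sketch closes the row with "</tr>", which is what the comment guesses the article intended.

```python
import html


def line_to_row(line: str) -> str:
    """Escape an entire 'hash,subject' line, then wrap it in row markup."""
    # html.escape only emits &amp; &lt; &gt; &quot; &#x27;, none of which
    # contains a comma, so escaping cannot add or remove commas.
    escaped = html.escape(line)
    # The hash has no commas, so the first comma is the one from the template.
    commit, _, subject = escaped.partition(",")
    return f"<tr> <td>{commit}</td> <td>{subject}</td> </tr>"
```

Because the only literal text in the git template is a comma, the escaper never has to tell trusted markup from untrusted data; the markup is added after escaping is finished.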
You can skip escaping any characters, or worrying about whether the content is correct, if you put an unformatted git log into a script tag, then split it into lines and set the content of each element via a JS call.<p>I just tried it, and it works beautifully, no problems with illegal characters.<p>What's wrong with this? It'd be super easy to extend if you want columns or colors or links...<p><pre><code> <script id='gitlog' type='text'>
c0c3150f5 09 - 15 dahart Color widget!, #1 improving < hsv > && things [Finishes #8736345] \m/ '",.;:%$#@*
</script>
<div id='lines'></div>
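<script>
// Assumes jQuery is loaded; .text() inserts each line as plain text, so nothing is parsed as markup.
$('#gitlog').html().split('\n').forEach(line => {
$('#lines').append($('<div class="line"/>').text(line))
})
</script></code></pre>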
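One caveat with a raw script block: a commit subject containing the text "</script>" would end the block early. A hedged sketch of one way around that, assuming the page is generated by a Python script and the log is embedded as JSON rather than raw text; `embed_log_as_json` is a made-up name, and this is a variation on the approach above, not what the comment tested.

```python
import json


def embed_log_as_json(log_text: str) -> str:
    """Embed log lines in a JSON data block that the log cannot close early."""
    lines = [line for line in log_text.split("\n") if line]
    # In JSON, "<" only occurs inside strings, where \u003c is an equivalent
    # escape, so the payload can never contain "</script>" or "<!--".
    payload = json.dumps(lines).replace("<", "\\u003c")
    return f'<script id="gitlog" type="application/json">{payload}</script>'
```

On the page, the client side would read the element's text content, pass it through `JSON.parse`, and keep setting each line with `.text(line)` as before.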
Also consider <a href="https://www.pixelbeat.org/scripts/ansi2html.sh" rel="nofollow">https://www.pixelbeat.org/scripts/ansi2html.sh</a> for the general case of converting (colored) output to HTML.
`gitweb` is a server that comes with your git install.<p>The `gitweb` web interface includes both a log and shortlog view for repositories. You can probably use those to some benefit.<p>This seems to be the source of the shortlog command:<p><a href="https://github.com/git/git/blob/master/gitweb/gitweb.perl#L5889" rel="nofollow">https://github.com/git/git/blob/master/gitweb/gitweb.perl#L5...</a>
Why do people insist upon reinventing the wheel badly:<p>git log --color=always <whatever funky coloring, options, etc you want> | aha > git_log.html<p>Side note: aha is not installed by default on macOS, but Homebrew will fix that for you. Also, it has many color and styling options.
>Some programmers might stop here and say, Let's switch to a real programming language. Do it the right way.<p>Isn't using Python switching to a real programming language?