I am skeptical of those benchmarks. This is written in Python, and, looking at the core loop, yes, it really is Python, not Python wrapped around C or some other acceleration technology. For pure Python to come out appearing to get four times the throughput of a C program is pretty dubious. That would have to be one crappy C program. GoAccess looks like it ought to be far enough along that somebody has at least taken a bit of a crack at optimization, but perhaps not. C ought to be able to smoke pure Python at this task. (Possibly, you know, <i>unsafely</i>, where a crafted referrer may get to arbitrary code execution or something, but still it ought to be <i>way faster</i>.)
The license is currently just<p><pre><code> All rights reserved.
Copyright (c) 2020 Lucian Marin
</code></pre>
It was the MIT license at the time of the initial commit, and has since been updated to this. So it's not immediately clear whether anyone else can use Logparser - care to clarify, Lucian?
Suggestion: take the time to package this up for PyPI as something people can install using "pip install" (or "pipx install").<p>This is hard the first time you do it, but worth learning because it's a really great way to distribute your Python software.<p>I'm giving a talk about how to do this at PyGotham next month, but the notes from that talk are already available and may be useful to you: <a href="https://github.com/simonw/pygotham-packaging" rel="nofollow">https://github.com/simonw/pygotham-packaging</a><p>You may also find this cookiecutter template that I use to build and package Python CLI apps helpful: <a href="https://github.com/simonw/click-app" rel="nofollow">https://github.com/simonw/click-app</a>
On a tangent, I've recently been looking into log parsing for an application I'm building.<p>If you want to support pulling info out of common logs, it's pretty simple to put together a list of regexes for the default log format in each major system. Simple example here: <a href="https://github.com/multiprocessio/datastation/blob/master/shared/text.ts#L13" rel="nofollow">https://github.com/multiprocessio/datastation/blob/master/sh...</a>.<p>I use this in the app to quickly pull info out of access logs for further analysis, à la OP's app and GoAccess, but in a GUI where you can also do further processing.<p>Demo video of this here: <a href="https://www.youtube.com/watch?v=sCx2mF2jyUQ&t=9s" rel="nofollow">https://www.youtube.com/watch?v=sCx2mF2jyUQ&t=9s</a>.
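For anyone curious what the regex approach looks like in Python, here is a minimal sketch. It assumes the Combined Log Format that Apache and nginx commonly use for access logs. The pattern, the sample line, and the `parse_line` name are all made up for illustration; none of it is taken from the linked datastation source or from Logparser.

```python
import re

# Illustrative pattern for the Combined Log Format (an assumption about the
# input; not copied from any project mentioned in this thread).
COMBINED_RE = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A size of "-" means no body was sent; treat it as zero bytes.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# Hypothetical sample line using documentation-reserved addresses.
sample = (
    '203.0.113.7 - - [10/Oct/2021:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326 '
    '"https://example.com/" "Mozilla/5.0"'
)
record = parse_line(sample)
```

Lines with a malformed request field will simply return None here, so a real tool would probably want to count and report those rather than drop them silently.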
Are you certain your benchmarks are correct? The GoAccess FAQ states that it parses over 100,000 lines/second [1]. While this figure depends on the hardware used, it is still <i>massively</i> faster than the figure quoted in the README. Benchmarking is quite technical if you want consistent results, so some more information on the benchmarking methodology used here would be much appreciated.<p>[1] <a href="https://goaccess.io/faq#performance" rel="nofollow">https://goaccess.io/faq#performance</a>
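As a rough idea of the kind of methodology detail that would help, here is a minimal sketch of a repeatable lines/second measurement in Python. The `lines_per_second` helper and the stand-in workload are hypothetical, not how either project measures anything. It only times in-process parsing of lines already in memory, so a fair comparison with GoAccess would still need both tools run end to end on the same log file and hardware.

```python
import time


def lines_per_second(parse, lines, repeats=5):
    """Best-of-N throughput of parse() over lines, in lines per second."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for line in lines:
            parse(line)
        elapsed = time.perf_counter() - start
        # Keep the fastest run to reduce noise from other processes.
        best = min(best, elapsed)
    return len(lines) / best if best > 0 else float("inf")


# Stand-in workload: a hypothetical "parser" that just splits on whitespace.
sample_lines = [
    '203.0.113.7 - - [10/Oct/2021:13:55:36 +0000] "GET / HTTP/1.1" 200 512'
] * 10000
rate = lines_per_second(str.split, sample_lines)
```

Reporting the number of repeats, the input size, and the hardware alongside the resulting rate would make the README figure much easier to check.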
I'm not sure it's an alternative yet. Functionally, it seems to be missing incremental parsing, live updates, interactive HTML and TUI interfaces, graphs, ...