Having spent some time playing with ripgrep and hyperscan (https://sr.ht/~pierrenn/ripgrep/), the benchmarking part looks really odd to me.

The T5/T6/T7/T8/T9 tests are nowhere near extensive enough to show a difference between hyperscan and ripgrep. On top of that, the benchmark includes the pattern compilation time in the hyperscan runs, which makes even less sense.

So the only upside left to this tool is usability, and from a quick test I don't really see the point. I'd rather use something written in a safe language (e.g. Rust), so I guess I'll just stick with rg.
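To make the compile-time point concrete, here is a minimal standalone sketch (not taken from the article's benchmark; the pattern, the stand-in corpus, and the timing harness are my own) showing that hs_compile is a one-time cost separate from hs_scan, so folding it into every measured search inflates hyperscan's numbers:

```cpp
// Minimal sketch: time Hyperscan pattern compilation and scanning separately.
// Build with e.g.: g++ -O2 hs_timing.cpp -lhs
#include <hs/hs.h>
#include <chrono>
#include <cstdio>
#include <string>

// Match callback: just count matches and keep scanning.
static int on_match(unsigned int, unsigned long long, unsigned long long,
                    unsigned int, void *ctx) {
    ++*static_cast<unsigned long long *>(ctx);
    return 0;
}

int main() {
    const char *pattern = "Sherlock";      // placeholder pattern
    std::string haystack(1 << 20, 'x');    // placeholder corpus; real runs read a file

    hs_database_t *db = nullptr;
    hs_compile_error_t *err = nullptr;

    auto t0 = std::chrono::steady_clock::now();
    if (hs_compile(pattern, HS_FLAG_DOTALL, HS_MODE_BLOCK, nullptr, &db, &err) != HS_SUCCESS) {
        std::fprintf(stderr, "compile failed: %s\n", err->message);
        hs_free_compile_error(err);
        return 1;
    }
    auto t1 = std::chrono::steady_clock::now();

    hs_scratch_t *scratch = nullptr;
    hs_alloc_scratch(db, &scratch);

    unsigned long long matches = 0;
    hs_scan(db, haystack.data(), static_cast<unsigned>(haystack.size()), 0,
            scratch, on_match, &matches);
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::printf("compile: %.3f ms, scan: %.3f ms, matches: %llu\n",
                ms(t1 - t0).count(), ms(t2 - t1).count(), matches);

    hs_free_scratch(scratch);
    hs_free_database(db);
}
```

On complex patterns the compile phase can easily dominate a single scan over a small corpus, which is why folding the two into one measurement says little about actual search throughput.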
I use this and ripgrep. By default I tend to use ripgrep. It skips various files, e.g. tarballs, ".git", ".svn" and so on, so it's quite quick as a result. I also like the output a lot, though it's kind of annoying that the output format changes when printing to the screen vs. when redirected (e.g. to a pipe). It would be nicer if ripgrep always piped its output through less with colours (the way git does).

Ugrep is great because it can grep through tarballs. But the default output format of ripgrep is nicer, plus ripgrep seems quicker when searching recursively through loads of files that could be skipped (tarballs, etc.). I'm guessing ugrep skips fewer files.
Disclosure: I am the author of ugrep.

Here are my 2 cents.

So what is going on with the new ugrep tool? https://github.com/Genivia/ugrep

As a small organization specializing in open source software, we needed a search tool like grep but updated to handle many compression formats, including tarballs, with filters to search PDF, DOCX, and other formats, and with the ability to narrow the archive contents down to source code file types when necessary. Why? For example, to look for differences that explain bugs, to find potential vulnerabilities in older software that is archived, and to check for open source licenses/violations.

OK. But what about performance?

At the same time, I worked on designing a new fast pattern matching method that, in simple terms, uses logic/hashing to detect possible matches quickly before performing the more CPU-expensive regex match (a generic sketch of this kind of prefilter, not my actual method, is in the P.S. below). This method was extensively tested with many parameter configurations to find the optimal parameterization on several machines. It was then compared to the best-known algorithms I could find, implemented in C (tested in memory, not on files, and not reported in the ugrep project). I will gladly share this method publicly eventually in a technical paper. For a while I contemplated filing a utility patent, but did not move forward on that because I want this technology to be freely available to everyone and not proprietary. Of course, it would be nice to receive some recognition and not get ripped off. Most grep tools just use what is already available publicly and aren't doing something new that is clever, with the possible exception of hyperscan.

Secondly, I am glad to see that ugrep is useful to many others. In my conversations with ugrep users, performance is not their top concern; what matters to them are the new features that ugrep offers and other greps lack. As long as ugrep is very fast, they are more than happy. They also suggested that ugrep should be compatible with GNU grep's options and not try to be "too clever" about skipping files and directories, at least not out of the box. None of the ugrep perf tests skip files or directories; they include all hidden files/directories, binary files, and compressed files.

Combining all these requirements and suggestions from users into ugrep wasn't trivial, but I believe we accomplished that goal reasonably well. Having said that, ugrep is relatively new and still evolving.

There are a lot of opinionated folks when it comes to performance. Many in my domain of expertise realized over a decade ago that it is folly to pursue "the best performance" when the variety of architectures is vast and hardware and software are still evolving, even if slowly. There is no set of perfect benchmarks. There are always assumptions and requirements that affect the results wildly. (I am a CS professor and have spent my entire career as a researcher, including in high-performance computing.)

I enjoy working now and then on deep and challenging coding projects, such as ugrep. It's a crazy fun project to work on when I have time.

Cheers!
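P.S. Since people ask about the general idea: below is a generic, simplified sketch of a hash-based prefilter. To be clear, this is not my actual method (that is unpublished); the 2-byte hash, the bitset size, the hard-coded literals, and the std::regex fallback are all placeholder choices, just to illustrate rejecting non-matching input cheaply before calling an expensive regex engine.

```cpp
// Generic hash-based prefilter sketch (illustration only, not ugrep's method):
// hash short windows of the haystack against a bitset built from pattern
// literals, and run the expensive regex only where a match remains possible.
#include <bitset>
#include <cstddef>
#include <cstdio>
#include <regex>
#include <string>
#include <vector>

// Hash a 2-byte window into a 65536-bit (8 KiB) bitset.
static inline std::size_t hash2(unsigned char a, unsigned char b) {
    return (static_cast<std::size_t>(a) * 31u + b) & 0xFFFFu;
}

// Build the filter from literal fragments of the pattern. (Extracting required
// literals from an arbitrary regex is the hard part in practice; here they are
// simply given.)
static std::bitset<65536> build_filter(const std::vector<std::string> &literals) {
    std::bitset<65536> f;
    for (const auto &lit : literals)
        for (std::size_t i = 0; i + 1 < lit.size(); ++i)
            f.set(hash2(lit[i], lit[i + 1]));
    return f;
}

// Count matching lines: scan cheaply with the filter, and verify with the
// regex only on lines where the filter cannot rule a match out.
static std::size_t count_matching_lines(const std::string &hay, const std::regex &re,
                                        const std::bitset<65536> &filter) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i + 1 < hay.size(); ++i) {
        if (!filter.test(hash2(hay[i], hay[i + 1])))
            continue;                                   // fast path: no match possible here
        // Slow path: verify the surrounding line with the real regex engine.
        std::size_t b = hay.rfind('\n', i);
        b = (b == std::string::npos) ? 0 : b + 1;
        std::size_t e = hay.find('\n', b);
        e = (e == std::string::npos) ? hay.size() : e;
        if (std::regex_search(hay.begin() + b, hay.begin() + e, re))
            ++hits;
        i = e;                                          // this line is decided; skip past it
    }
    return hits;
}

int main() {
    const std::string hay = "foo bar\nneedle in a haystack\nbaz\n";
    const std::regex re("needle|pin");
    const auto filter = build_filter({"needle", "pin"});
    std::printf("%zu matching line(s)\n", count_matching_lines(hay, re, filter));
}
```

In a real tool, the hard part is choosing the hashing scheme, window size, and filter layout so that the fast path rejects almost everything on typical inputs; that kind of parameter tuning is what I tested across several machines.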