Developer of the tool here :) Glad to see it posted here, I still actively use it myself. Also check out the fzf integration in the README: <a href="https://github.com/phiresky/ripgrep-all/blob/master/doc/rga-fzf.gif" rel="nofollow">https://github.com/phiresky/ripgrep-all/blob/master/doc/rga-...</a><p>Currently the main branch is undergoing a refactor to add support for having custom extractors (calling out to other tools), and more flexible chains of extractors.<p>Ripgrep itself has functionality integrated to call custom extractors with the `--pre` flag, but by adding it here we can retain the benefits of the rga wrapper (more accurate file type matchers, caching, recursion into archives, adapter chaining, no slow shell scripts in between, etc).<p>Sadly, during rewriting it to allow this, I kind of got hung up and couldn't manage to figure out how to cleanly design that in Rust. I'd be really glad if a Rust expert could help me out here:<p>In the currently stable version, the main interface of each "adapter" is `fn(Read, Write) -> ()`. To allow custom adapter chaining I have to change it to be `fn(Read) -> Read` where each chained adapter wraps the read stream and converts it while reading. But then I get issues with how to handle threading etc, as well as a random deadlock that I haven't figured out how to solve so far :/
thanks but it's way faster to have my stuff in G drive<p>that way I can open a browser tab, wait 5 seconds for it to load, locate the new screen location of the search bar, click it, wait for javascript to finish loading so I can click the search bar, click it for real this time, mistype because there's some kind of contenteditable event jank, wait 5 seconds for my results to come up, fix the typo, and just have my results waiting for me<p>I'm not going to learn a new tool when web is fine
I love that we’re seeing fast & flexible solutions for personal search.<p>I’ve recently been playing with Recoll for full-text search of content. Since it indexes content up front, search is fast. It can also easily accommodate tag metadata on files.<p>It would be interesting to consider how ripgrep-based tools can fit into generically broad “search your database of content” workflows (as opposed to remembering or walking your file system paths).
On a related note, there is one program that I absolutely miss on Linux: Everything (on Windows).<p>The closest I can find is mlocate, but it does not have a GUI and, more importantly, it does not index my Windows or NTFS drives.<p>Would appreciate any suggestions if someone knows something like Everything for Ubuntu.
Big fan of rga! I use it almost every day for the academic part of my life, when I want to know the location of some specific keywords in my lecture slides, books or papers I've been reading. Even for single ebooks, it is often more useful than the search in Acrobat Reader.
No ripgrep-all through the package manager:<p><pre><code> $ sudo dnf install -y ripgrep-all
[...]
No match for argument: ripgrep-all
Error: Unable to find a match: ripgrep-all
</code></pre>
Rust's package manager fails:<p><pre><code> $ cargo install ripgrep_all
[...]
failed to select a version for the requirement `cachedir = "^0.1.1"`
candidate versions found which didn't match: 0.2.0
location searched: crates.io index
required by package `ripgrep_all v0.9.6`
</code></pre>
A quick web search shows that other people have the same problem with the cachedir version.
The "Integration with fzf" example looks really cool:<p><a href="https://github.com/phiresky/ripgrep-all#integration-with-fzf" rel="nofollow">https://github.com/phiresky/ripgrep-all#integration-with-fzf</a>
The idea behind rga is cool.
Anyway, I tried it on a Mac, installed via Homebrew. The formula already says it depends on ripgrep (that's fine, since I have ripgrep installed and use it regularly). I was still surprised when I ran rga for the first time and got an error message that 'pdftotext' was not found. Since pdftotext has been officially discontinued, I am not sure I want to install an old version just to make rga work on my machine. I don't think it's a good idea to rely on a project that is not actively maintained.
I've always found something along the lines of<p><pre><code> pdftotext -layout file.pdf | grep -E ...
</code></pre>
useful for PDFs. Good to see a Swiss Army knife utility for all sorts of files, though!
If anyone is interested in gron [0], I have an open PR [1] to add it as an adapter to ripgrep-all. The patch is based on the most recent release, since master is currently not functional.<p>0: <a href="https://github.com/TomNomNom/gron" rel="nofollow">https://github.com/TomNomNom/gron</a><p>1: <a href="https://github.com/phiresky/ripgrep-all/pull/77" rel="nofollow">https://github.com/phiresky/ripgrep-all/pull/77</a>
I noticed that you can use Tesseract as an OCR adapter for rga. In the OP it comes with a warning that it’s slow and not enabled by default. Are there any other fast, reliable OCR libs out there? Or any Rust OCR backends?
Can it (or any tool) perform proximity searches on scanned PDFs? E.g. word1 within 20 words of word2? (I think this is non-trivial but very useful.)
For PDFs, how does it (does it?) deal with for example, when phrases get ripped apart by the layout? Like if you search for a multiple word phrase, it's often foiled by word wrap or being in a table.
Can it produce links to open the file yet? (I don't know Rust, so I can't easily add a PR.) At least gnome-terminal supports that (and normally it should also support opening a specific PDF page)!
If curious see also<p>2019 <a href="https://news.ycombinator.com/item?id=20196982" rel="nofollow">https://news.ycombinator.com/item?id=20196982</a>
This is great. I have 100+ ebooks/PDFs of programming and textbooks, from which I've been extracting the index pages. My intention was always to make some sort of search index out of them. I will definitely be trialing this (the initial few searches seem promising!)
Curious why this isn't a pull request to ripgrep? Maybe it was, and was rejected? It'd be nice to have just one tool, and this doesn't feel like a stretch to add to ripgrep.
Any advantages to this over something like Agent Ransack?<p><a href="https://www.mythicsoft.com/agentransack/" rel="nofollow">https://www.mythicsoft.com/agentransack/</a>
It would be nice to have a direct comparison with ugrep. In the case of rg the benchmarks are already enough to switch. Why should I use rga instead of ugrep?
Aww hell yeah we should definitely use this in place of ripgrep for the new ArchiveBox.io full-text search backend.<p><a href="https://github.com/ArchiveBox/ArchiveBox/pull/543" rel="nofollow">https://github.com/ArchiveBox/ArchiveBox/pull/543</a>
Sounds like a poor man's version of Recoll:<p><a href="https://www.lesbonscomptes.com/recoll/" rel="nofollow">https://www.lesbonscomptes.com/recoll/</a><p>A PDF in a zip file, in an email attachment: Recoll can index it, and do OCR if you like.
I have mixed feelings about these kinds of tools.<p>I can understand it might be nice to have a personal library of PDF books and to search in them. I can't think of a time I've ever wished I could search my bookshelf that way, but you never know.<p>Obviously I use tools like ripgrep for searching codebases and the like.<p>But the extreme flexibility of this one in particular (and others like macOS Spotlight) makes it seem more like a data recovery tool to me. If my directory structures and databases ever completely failed for some reason, I might need to search through everything to find the data again. It's good to know such tools exist, I suppose.<p>But my fear is that tools like this teach people not to worry about organisation of data and to just fill up their disks with no structure at all. I think that unless something goes terribly wrong, nobody should ever need a tool like this. Once you rely on it, you're out of luck if it ever fails you. What if you just can't remember a single searchable phrase from some document, but you just <i>know</i> it must exist somewhere?<p>It's similar to what Google has done to the web. When I was growing up, it used to be a skill to use the web. People used tools like bookmarks and followed links from one place to another. Now it's just type it into Google, and if Google doesn't know, it doesn't exist.