How to do OCR on a Mac using the CLI or just Python

357 points by gregsadetsky over 1 year ago

35 comments

zavertnik over 1 year ago

Nice post, OP! I was super impressed with Apple's Vision framework. I used it on a personal project that involved OCRing tens of thousands of spreadsheet screenshots and ingesting them into a Postgres database. I tried other CPU-based OCR methods (since macOS and Nvidia still don't play nice together) such as Tesseract, but found the output to be incorrect too often. The Vision framework not only produced the highest-quality output I had seen, it also used the least compute. It was fairly unstable, but I can chalk that up to user error in my implementation.

I used a combination of RhetTbull's vision.py (for the actual implementation) [1] and ocrmac (for experimentation) [2], and was pleasantly surprised by the performance on my i7 6700K hackintosh.

I wouldn't call myself a programmer, but I can generally troubleshoot anything if given enough time. It did cost time, though.

[1]: https://gist.github.com/RhetTbull/1c34fc07c95733642cffcd1ac587fc4c
[2]: https://github.com/straussmaximilian/ocrmac
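
For readers who want to try the ocrmac route mentioned above, here is a minimal sketch. It assumes the `ocrmac.OCR(path).recognize()` call shown in the ocrmac README, which is understood to return (text, confidence, bounding box) tuples. The helper names `join_confident_text` and `ocr_image` and the 0.5 threshold are hypothetical and not part of ocrmac.

```python
from typing import Iterable, Sequence, Tuple

# Shape assumed from the ocrmac README: (text, confidence, bounding_box)
Annotation = Tuple[str, float, Sequence[float]]


def join_confident_text(annotations: Iterable[Annotation],
                        min_confidence: float = 0.5) -> str:
    """Join recognized strings, one per line, dropping low-confidence results."""
    return "\n".join(
        text for text, confidence, _box in annotations
        if confidence >= min_confidence
    )


def ocr_image(path: str, min_confidence: float = 0.5) -> str:
    """Run Apple's Vision OCR through ocrmac (macOS only; pip install ocrmac)."""
    # Imported lazily so the pure helper above stays usable on any platform.
    from ocrmac import ocrmac

    annotations = ocrmac.OCR(path).recognize()
    return join_confident_text(annotations, min_confidence)


# Example usage on a Mac (the file name is a placeholder):
# print(ocr_image("screenshot.png"))
```

Filtering on the confidence value is one simple way to cut down on noisy results when processing a large batch of screenshots; the right threshold depends on the source images.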

BoppreH over 1 year ago

I tried doing something similar on Windows, and realized that PowerToys [1], a Microsoft project I already had installed, actually contains a very good OCR tool [2]. Just press Win+Shift+T and select the area to scan, and the text will be copied to the clipboard.

[1]: https://learn.microsoft.com/en-us/windows/powertoys/
[2]: https://learn.microsoft.com/en-us/windows/powertoys/text-extractor

melonamin over 1 year ago

I've built an open-source tool that gives you both a CLI and a nice UI. It is free.

https://trex.ameba.co

hintymad over 1 year ago

I did notice that many Mac apps, including Safari and Preview and Notes, do OCR on images automatically. It's pretty neat that I can easily select text in an image and copy and paste it somewhere else.

tough over 1 year ago

I'm a huge fan of this little OCR tool, installed through brew onto my MacBook: https://github.com/schappim/macOCR

novagameco over 1 year ago

On Windows I recommend Text Extractor from PowerToys: https://learn.microsoft.com/en-us/windows/powertoys/text-extractor

HelloImSteven over 1 year ago

I'll throw my solution into the mix: https://skaplanofficial.github.io/PyXA/tutorial/images.html#text-extraction

PyXA uses the Vision framework to extract text from one or more images at a time. It's only a small part of the package, so it might be overkill for a one-off operation, but it's an option.

andreasley over 1 year ago

macOS Ventura and newer actually have basic OCR functionality integrated into the Image Capture UI. When using an AirPrint-compatible scanner and scanning to PDF, the checkbox "OCR" is shown in the right pane.

gist over 1 year ago

To place the contents in a file (not claiming this is the most efficient way, but it works):

    OCRTHISFILE="ocr-test.jpg"
    shortcuts run ocr-text -i "${OCRTHISFILE}"
    pbpaste > "${OCRTHISFILE}.txt"

Or, to view the output and place it in a file:

    OCRTHISFILE="ocr-test.jpg"
    shortcuts run ocr-text -i "${OCRTHISFILE}"
    pbpaste | tee "${OCRTHISFILE}.txt"
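
The same flow can be sketched in Python with `subprocess`. This assumes, as in the comment above, that a shortcut named `ocr-text` exists and leaves the recognized text on the clipboard; the function names here are hypothetical.

```python
import subprocess
from pathlib import Path


def shortcut_command(image: Path, shortcut: str = "ocr-text") -> list[str]:
    """Build the `shortcuts run` invocation used in the comment above."""
    return ["shortcuts", "run", shortcut, "-i", str(image)]


def output_path(image: Path) -> Path:
    """Mirror "${OCRTHISFILE}.txt": ocr-test.jpg -> ocr-test.jpg.txt."""
    return image.with_name(image.name + ".txt")


def ocr_to_file(image: Path) -> Path:
    """Run the shortcut, read the clipboard with pbpaste, and save the text."""
    subprocess.run(shortcut_command(image), check=True)
    # Assumption: the shortcut copied the recognized text to the clipboard.
    text = subprocess.run(
        ["pbpaste"], check=True, capture_output=True, text=True
    ).stdout
    out = output_path(image)
    out.write_text(text, encoding="utf-8")
    return out


# Example usage on a Mac:
# print(ocr_to_file(Path("ocr-test.jpg")).read_text())
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.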

justinl33 over 1 year ago

Awesome! Is there a similar technique for the Apple vision ‘Copy Subject’ feature? I’ve become extremely reliant on it, but it feels very limited in access.

TimeBearingDown over 1 year ago

Very cool, and seems handy!

I’ve always had good results from Preview.app. I wonder how this engine compares with Free alternatives in the number of errors on a difficult source.

est over 1 year ago

Speaking of the need for OCR, I found a relevant and quite funny comment:

> we already have a common, portable data format for social media. It's screenshots of tweets

https://news.ycombinator.com/item?id=38841569

pugio over 1 year ago

I would really love an `ocrmypdf`-like tool that uses Apple Vision to create searchable PDFs from scanned images. I've been searching every week or so for some kind of project but so far haven't found anything. Perhaps it's time to make it myself...

dotsam over 1 year ago

I have played around with the OCR on my Mac, and have been very impressed. It has been consistently better than Tesseract for my purposes.

However, when creating a PDF from images in Preview and exporting with the ‘Embed Text’ option to OCR, I have noticed the text is worse than if you OCR the exact same images using the shortcut above or a script. Presumably Preview is using the Vision framework’s less accurate fast path when preparing the PDF. https://developer.apple.com/documentation/vision/recognizing_text_in_images
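
One way to see how much the fast and accurate paths differ on a given image is to run both and compare the transcripts. The sketch below assumes ocrmac's `recognition_level` argument (documented in its README as `'fast'` or `'accurate'`). It does not confirm what Preview does internally; it only compares the two levels. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two OCR transcripts."""
    return SequenceMatcher(None, a, b).ratio()


def recognize(path: str, level: str) -> str:
    """OCR one image with ocrmac at the given level (macOS only)."""
    # Assumption: ocrmac accepts recognition_level="fast" or "accurate".
    from ocrmac import ocrmac

    annotations = ocrmac.OCR(path, recognition_level=level).recognize()
    return "\n".join(text for text, _confidence, _box in annotations)


def compare_levels(path: str) -> float:
    """Similarity between the fast and accurate transcripts of one image."""
    return similarity(recognize(path, "fast"), recognize(path, "accurate"))


# Example usage on a Mac (the file name is a placeholder):
# print(f"fast vs accurate similarity: {compare_levels('page.png'):.2%}")
```

A ratio well below 1.0 on your own scans would suggest the recognition level matters for that material.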

srott over 1 year ago

You can use the clipboard with the pbpaste/pbcopy commands:

    ocr-text "$1" && pbpaste

eigenvalue over 1 year ago

Weird, I couldn't get it to work on a bunch of different files, even using very simple file names. Kept getting this error:

Error: The operation couldn’t be completed. (WFBackgroundShortcutRunnerErrorDomain error 1.)

stephenr over 1 year ago

The article was posted... yesterday, and the entire reason given for not using the built-in Shortcuts sharing feature is... an article from 2 years ago about a bug in the Shortcuts hosting service, which has obviously been fixed.

I get that some people will want to create it from scratch themselves or incorporate the actual meat of it into a larger shortcut... but not sharing one that does what the article says, because of a bug 2 years ago, is a bit of a weird take.

schappim over 1 year ago

If you want to do this a lot more easily, use: https://github.com/schappim/macOCR

elpakal over 1 year ago

I don't know why, but instead of pasting the copied text to make sure it worked, I made it read the text aloud:

    shortcuts run ocr-text -i <A PATH TO SOME IMAGE> | say -v Fred
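
The pipe above can also be sketched in Python. This assumes, as the comment implies, that the `ocr-text` shortcut writes the recognized text to standard output; the function names are hypothetical.

```python
import subprocess


def say_command(voice: str = "Fred") -> list[str]:
    """Build the macOS `say` invocation for a given voice."""
    return ["say", "-v", voice]


def ocr_and_speak(image_path: str, voice: str = "Fred",
                  shortcut: str = "ocr-text") -> None:
    """Run the OCR shortcut and speak its output (macOS only)."""
    ocr = subprocess.run(
        ["shortcuts", "run", shortcut, "-i", image_path],
        check=True, capture_output=True, text=True,
    )
    # With no message argument, `say` reads the text to speak from stdin.
    subprocess.run(say_command(voice), input=ocr.stdout, check=True, text=True)


# Example usage on a Mac (the file name is a placeholder):
# ocr_and_speak("ocr-test.jpg")
```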

djhn over 1 year ago

Does anyone know of a straightforward library or setup to scan newspapers and/or magazines and detect and extract images and advertisements?

mushufasa over 1 year ago

Very cool. Anyone know how this compares to AWS Textract in general? Does the Apple Vision framework support table recognition?

jmz1 over 1 year ago

Raycast (macOS only) is also nice as it's able to search images by text. It also allows you to copy text from those images. Quick official demo here: https://www.youtube.com/watch?v=c96IXGOo6E4

ggm over 1 year ago

How do you interact with the built-in OCR via the CLI? To me, "doing" OCR means choosing the OCR tooling, knowing which fonts it recognises, and handling all the associated package management and tuning, not "how I configure the GUI and UI to let me use the tool they shipped with the OS".

sigoden over 1 year ago

Use LLMs (gpt-4-vision or LLaVA) with aichat:

`aichat -f tmp/test.png -- output only text in the image`

https://github.com/sigoden/aichat

b__d over 1 year ago

The way I do this: it's built right into the macOS Screenshot app:
- Press CMD+SHIFT+4
- Draw a square on screen around the text you want to extract
- (Quickly) click on the preview image in the lower right corner
- Copy text from image

krudnicki over 1 year ago

I made a Shortcut + PHP to get the text from a screenshot, ask ChatGPT to make a task name from the text, create a new task in ClickUp, and attach the screenshot. I use it often.

rikafurude21 over 1 year ago

Are iOS and macOS shortcuts cross-compatible? I didn't know there were shortcuts for the Mac; it seems pretty powerful to be able to run them from the terminal too. Thanks, OP.

predictsoft over 1 year ago

On Windows, A9T9 does a great job of OCRing scanned JPEG files (and any JPEG file). It's also free.

I scanned about 100 A4 documents in just a couple of minutes.

minimaxir over 1 year ago

Surprisingly, the Extract Text from Image action is available on Intel Macs; normally, features like automatic image OCR are limited to Apple Silicon Macs.

systemtrigger over 1 year ago

This works great for local files. I can't seem to modify the shortcut correctly for an image hosted at a public URL.
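
One possible workaround, which leaves the shortcut unmodified, is to download the image to a temporary file first and then run the shortcut on that local copy. This is a sketch under the assumption that the shortcut is named `ocr-text` and copies its result to the clipboard; the function names and the `image.png` fallback are hypothetical.

```python
import subprocess
import tempfile
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse


def local_name(url: str) -> str:
    """Pick a file name from the URL path, ignoring any query string."""
    name = PurePosixPath(urlparse(url).path).name
    return name or "image.png"  # fallback when the URL has no file name


def ocr_from_url(url: str, shortcut: str = "ocr-text") -> str:
    """Download an image, OCR the local copy, and return the clipboard text."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / local_name(url)
        urllib.request.urlretrieve(url, str(target))
        subprocess.run(["shortcuts", "run", shortcut, "-i", str(target)], check=True)
        # Assumption: the shortcut copied the recognized text to the clipboard.
        return subprocess.run(
            ["pbpaste"], check=True, capture_output=True, text=True
        ).stdout


# Example usage on a Mac (the URL is a placeholder):
# print(ocr_from_url("https://example.com/images/scan.jpg"))
```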

loevborg over 1 year ago

CleanShot X (which is great) also allows you to OCR from your screen ("Capture Text")

gvkhna over 1 year ago

Are there any benchmarks anywhere comparing speed/compute/accuracy to Tesseract v5?

cyberax over 1 year ago

It doesn't work for Chinese characters :(

CodeNest over 1 year ago

Python is quite basic and might not be very helpful for advanced users. It seems overly detailed for such a simple task.

geniium over 1 year ago

Have you guys tried ChatGPT or another alternative?
