Nice post, OP! I was super impressed with Apple's Vision framework. I used it on a personal project that involved OCRing tens of thousands of spreadsheet screenshots and ingesting them into a Postgres database. I tried other CPU-based OCR methods (since macOS and Nvidia still don't play nice together) such as Tesseract, but found the output to be incorrect too often. The Vision framework not only produced the highest-quality output I had seen, it also used the least compute. It was fairly unstable, but I can chalk that up to user error w/ my implementation.<p>I used a combination of RhetTbull's vision.py (for the actual implementation) [1] + ocrmac (for experimentation) [2] and was pleasantly surprised by the performance on my i7 6700K hackintosh.<p>I wouldn't call myself a programmer, but I can generally troubleshoot anything given enough time, though it did cost time.<p>[1]: <a href="https://gist.github.com/RhetTbull/1c34fc07c95733642cffcd1ac587fc4c" rel="nofollow">https://gist.github.com/RhetTbull/1c34fc07c95733642cffcd1ac5...</a><p>[2]: <a href="https://github.com/straussmaximilian/ocrmac">https://github.com/straussmaximilian/ocrmac</a>
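For anyone curious what the ocrmac side looks like, here's a minimal sketch (macOS only; I'm going from memory on ocrmac's API, so treat the call signature as an assumption):

```python
def results_to_text(results):
    """Join ocrmac results, which are (text, confidence, bounding_box) tuples,
    into one plain-text blob, one recognized line per row."""
    return "\n".join(text for text, _confidence, _bbox in results)

def ocr_image(path):
    """OCR a single image with Apple's Vision framework via ocrmac (macOS only)."""
    from ocrmac import ocrmac  # pip install ocrmac
    results = ocrmac.OCR(path, recognition_level="accurate").recognize()
    return results_to_text(results)
```

From there it's a short hop to one INSERT per screenshot into Postgres.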
I tried doing something similar on Windows, and realized that PowerToys[1], a Microsoft project I already had installed, actually contains a very good OCR tool[2]. Just press Win+Shift+T and select the area to scan, and the text will be copied to the clipboard.<p>[1] <a href="https://learn.microsoft.com/en-us/windows/powertoys/" rel="nofollow">https://learn.microsoft.com/en-us/windows/powertoys/</a><p>[2] <a href="https://learn.microsoft.com/en-us/windows/powertoys/text-extractor" rel="nofollow">https://learn.microsoft.com/en-us/windows/powertoys/text-ext...</a>
I've built an open-source tool that gives you both a CLI and a nice UI. It's free.<p><a href="https://trex.ameba.co" rel="nofollow">https://trex.ameba.co</a>
I did notice that many Mac apps, including Safari, Preview, and Notes, do OCR on images automatically. It's pretty neat that I can easily select text in an image and copy and paste it somewhere else.
I'm a huge fan of this little OCR tool, installed through brew onto my MacBook: <a href="https://github.com/schappim/macOCR">https://github.com/schappim/macOCR</a>
On Windows I recommend Text Extractor from PowerToys:<p><a href="https://learn.microsoft.com/en-us/windows/powertoys/text-extractor" rel="nofollow">https://learn.microsoft.com/en-us/windows/powertoys/text-ext...</a>
I'll throw my solution into the mix: <a href="https://skaplanofficial.github.io/PyXA/tutorial/images.html#text-extraction" rel="nofollow">https://skaplanofficial.github.io/PyXA/tutorial/images.html#...</a><p>PyXA uses the Vision framework to extract text from one or more images at a time. It's only a small part of the package, so it might be overkill for a one-off operation, but it's an option.
macOS Ventura and newer actually have basic OCR functionality integrated into the Image Capture UI. When using an AirPrint-compatible scanner and scanning to PDF, the checkbox "OCR" is shown in the right pane.
To place the contents in a file (not claiming this is the most efficient way, but it works):<p>OCRTHISFILE="ocr-test.jpg"<p>shortcuts run ocr-text -i "${OCRTHISFILE}"<p>pbpaste > "${OCRTHISFILE}.txt"<p>Or, to view the output and place it in a file:<p>OCRTHISFILE="ocr-test.jpg"<p>shortcuts run ocr-text -i "${OCRTHISFILE}"<p>pbpaste | tee "${OCRTHISFILE}.txt"
Awesome! Is there a similar technique for the Apple vision ‘<i>Copy Subject</i>’ feature? I’ve become extremely reliant on it, but it feels very limited in access.
Very cool, and seems handy!<p>I’ve always had good results from Preview.app. I wonder how this engine compares, in number of errors on a difficult source, versus free alternatives.
Speaking of the need for OCR, I found a comment that is relevant and quite funny:<p>> we already have a common, portable data format for social media. It's screenshots of tweets<p><a href="https://news.ycombinator.com/item?id=38841569">https://news.ycombinator.com/item?id=38841569</a>
I would really love an `ocrmypdf`-like tool that uses Apple Vision to create searchable PDFs from scanned images. I've been searching every week or so for some kind of project, but so far haven't found anything. Perhaps it's time to make it myself...
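If someone does build it, the fiddly part is mapping Vision's normalized bounding boxes (origin at the bottom-left) onto PDF page coordinates before stamping invisible text onto each page (e.g. with PyMuPDF's invisible-text render mode). A sketch of just the coordinate math (my own helper, not from any existing project):

```python
def vision_box_to_pdf(bbox, page_width, page_height):
    """Convert a Vision-style normalized bounding box (x, y, w, h) with a
    bottom-left origin into (left, top, width, height) in PDF points with
    a top-left origin."""
    x, y, w, h = bbox
    left = x * page_width
    top = (1.0 - y - h) * page_height  # flip the vertical axis
    return (left, top, w * page_width, h * page_height)
```

With boxes in page coordinates, each recognized string can be written invisibly at its box position so the text layer lines up with the scan underneath.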
I have played around with the OCR on my Mac and have been very impressed. It has been consistently better than Tesseract for my purposes.<p>However, when creating a PDF from images in Preview and exporting with the ‘Embed Text’ option to OCR it, I have noticed the text is worse than if you OCR the exact same images using the shortcut above or a script. Presumably Preview is using the Vision framework’s less accurate fast path when preparing the PDF. <a href="https://developer.apple.com/documentation/vision/recognizing_text_in_images" rel="nofollow">https://developer.apple.com/documentation/vision/recognizing...</a>
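If anyone wants to test the fast-vs-accurate theory, ocrmac exposes the same recognition-level knob, so you can diff the two passes on one image (macOS only; the `recognition_level` parameter is my recollection of ocrmac's API, so treat it as an assumption):

```python
def ocr_lines(path, level):
    """OCR an image at the given Vision recognition level ("fast" or "accurate")."""
    from ocrmac import ocrmac  # pip install ocrmac; macOS only
    return [text for text, _confidence, _bbox in
            ocrmac.OCR(path, recognition_level=level).recognize()]

def level_diff(fast_lines, accurate_lines):
    """Return (lines only the accurate pass found, lines only the fast pass found)."""
    fast, accurate = set(fast_lines), set(accurate_lines)
    return sorted(accurate - fast), sorted(fast - accurate)
```

Running both levels over a tricky scan and comparing the diffs should show whether Preview's embedded text matches the fast pass.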
Weird, I couldn't get it to work on a bunch of different files, even using very simple file names. Kept getting this error:<p>Error: The operation couldn’t be completed. (WFBackgroundShortcutRunnerErrorDomain error 1.)
The article was posted yesterday, and the <i>entire</i> reason given for not using the built-in Shortcuts sharing feature is an article from two years ago about a bug in the shortcut-hosting service, which has obviously been fixed since.<p>I get that some people will want to create it from scratch themselves or incorporate the actual meat of it into a larger shortcut... but not sharing one that does what the article says, because of a bug from two years ago, is a bit of a weird take.
If you want to do this much more easily, use: <a href="https://github.com/schappim/macOCR">https://github.com/schappim/macOCR</a>
I don't know why, but instead of pasting the copied text to check that it worked, I made it read the text aloud:<p>shortcuts run ocr-text -i <A PATH TO SOME IMAGE> | say -v Fred
Raycast (macOS only) is also nice as it's able to search images by text. It also allows you to copy text from those images. Quick official demo here: <a href="https://www.youtube.com/watch?v=c96IXGOo6E4" rel="nofollow">https://www.youtube.com/watch?v=c96IXGOo6E4</a>
How do you interact with the built-in OCR via the CLI? To me, "doing" something means choosing the OCR tooling, knowing what fonts it recognises, and handling all the associated package management and tuning, not "how I configure the GUI and UI to let me use the tool they shipped with the OS".
Use LLMs (GPT-4 Vision or LLaVA) with aichat:<p>`aichat -f tmp/test.png -- output only text in the image`<p><a href="https://github.com/sigoden/aichat">https://github.com/sigoden/aichat</a>
The way I do this: it's built right into the macOS screenshot tool:<p>- Press Cmd+Shift+4<p>- Draw a rectangle around the area you want to extract text from<p>- (Quickly) click the preview image in the lower-right corner<p>- Copy the text from the image
I made a Shortcut + PHP script that gets the text from a screenshot, asks ChatGPT to generate a task name from the text, and creates a new task in ClickUp with the screenshot attached. I use it often.
Are iOS and macOS Shortcuts cross-compatible? I didn't know Shortcuts existed for the Mac; it seems pretty powerful to be able to run them from the terminal too. Thanks, OP!
On Windows, A9T9 does a great job of OCR'ing scanned JPEG files (and any JPEG file). It's also free.<p>I scanned about 100 A4 documents in just a couple of minutes.
Surprisingly, the Extract Text from Image action is available on Intel Macs: normally, features like automatic image OCR are limited to Apple Silicon Macs.