How to do OCR on a Mac using the CLI or just Python

357 points, posted by gregsadetsky over 1 year ago

35 comments

zavertnik, over 1 year ago
Nice post, OP! I was super impressed with Apple's Vision framework. I used it on a personal project involving the OCRing of tens of thousands of spreadsheet screenshots and ingesting them into a Postgres database. I tried other CPU-based OCR methods (since macOS and Nvidia still don't play nice together) such as Tesseract but found the output to be incorrect too often. The Vision framework not only produced the highest-quality output I had seen, but also used the least compute. It was fairly unstable, but I can chalk that up to user error w/ my implementation.

I used a combination of RhetTbull's vision.py (for the actual implementation) [1] + ocrmac (for experimentation) [2] and was pleasantly surprised by the performance on my i7 6700k hackintosh.

I wouldn't call myself a programmer but I can generally troubleshoot anything if given enough time, but it did cost time.

[1]: https://gist.github.com/RhetTbull/1c34fc07c95733642cffcd1ac587fc4c

[2]: https://github.com/straussmaximilian/ocrmac
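
For readers who want to try the ocrmac route mentioned above, here is a minimal sketch. It assumes the `ocrmac.OCR(path).recognize()` call as shown in that project's README, which returns a list of `(text, confidence, bounding_box)` tuples; the helper names `ocr_image` and `annotations_to_text` and the `min_confidence` parameter are made up for illustration. The OCR call itself only works on macOS.

```python
from typing import List, Tuple

# (text, confidence, bounding box), following the shape described in the
# ocrmac README. The bounding box layout is not relied on here.
Annotation = Tuple[str, float, Tuple[float, float, float, float]]


def ocr_image(path: str) -> List[Annotation]:
    """Run Apple's Vision OCR on one image via ocrmac (macOS only)."""
    # Imported lazily so the pure helper below can be used on any platform.
    from ocrmac import ocrmac  # assumption: installed with `pip install ocrmac`

    return ocrmac.OCR(path).recognize()


def annotations_to_text(
    annotations: List[Annotation], min_confidence: float = 0.0
) -> str:
    """Join recognized fragments, one per line, dropping low-confidence ones."""
    return "\n".join(
        text
        for text, confidence, _bbox in annotations
        if confidence >= min_confidence
    )
```

On a Mac, `print(annotations_to_text(ocr_image("screenshot.png")))` would print the recognized text; the file name is a placeholder.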
BoppreH, over 1 year ago
I tried doing something similar on Windows, and realized that PowerToys [1], a Microsoft project I already had installed, actually contains a very good OCR tool [2]. Just press Win+Shift+T and select the area to scan, and the text will be copied to the clipboard.

[1] https://learn.microsoft.com/en-us/windows/powertoys/

[2] https://learn.microsoft.com/en-us/windows/powertoys/text-extractor
melonamin, over 1 year ago
I've built an open-source tool that gives you both a CLI and a nice UI. It is free.

https://trex.ameba.co
hintymad, over 1 year ago
I did notice that many Mac apps, including Safari, Preview, and Notes, do OCR on images automatically. It's pretty neat that I can easily select text in an image and copy and paste it somewhere else.
tough, over 1 year ago
I'm a huge fan of this little OCR tool, installed through brew onto my MacBook: https://github.com/schappim/macOCR
novagameco, over 1 year ago
On Windows I recommend Text Extractor from PowerToys:

https://learn.microsoft.com/en-us/windows/powertoys/text-extractor
HelloImSteven, over 1 year ago
I'll throw my solution into the mix: https://skaplanofficial.github.io/PyXA/tutorial/images.html#text-extraction

PyXA uses the Vision framework to extract text from one or more images at a time. It's only a small part of the package, so it might be overkill for a one-off operation, but it's an option.
andreasley, over 1 year ago
macOS Ventura and newer actually have basic OCR functionality integrated into the Image Capture UI. When using an AirPrint-compatible scanner and scanning to PDF, the checkbox "OCR" is shown in the right pane.
gist, over 1 year ago
To place the contents in a file (not claiming this is the most efficient way, but it works):

OCRTHISFILE="ocr-test.jpg"
shortcuts run ocr-text -i "${OCRTHISFILE}"
pbpaste > "${OCRTHISFILE}.txt"

Or, to view the output and place it in a file:

OCRTHISFILE="ocr-test.jpg"
shortcuts run ocr-text -i "${OCRTHISFILE}"
pbpaste | tee "${OCRTHISFILE}.txt"
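
The same steps can be sketched in Python for anyone scripting this. This is only a rough equivalent: it assumes, as the commands above do, that a shortcut named `ocr-text` exists and copies its result to the clipboard. The function names are invented for illustration, and `ocr_to_file` only works on macOS, where the `shortcuts` and `pbpaste` commands are available.

```python
import subprocess
from pathlib import Path
from typing import List


def shortcut_command(image: Path, shortcut: str = "ocr-text") -> List[str]:
    """Build the `shortcuts run` invocation used in the comment above."""
    return ["shortcuts", "run", shortcut, "-i", str(image)]


def output_path(image: Path) -> Path:
    """ocr-test.jpg -> ocr-test.jpg.txt, matching the naming in the comment."""
    return image.with_name(image.name + ".txt")


def ocr_to_file(image: Path, shortcut: str = "ocr-text") -> Path:
    """Run the shortcut, read the clipboard, and save the text next to the image."""
    subprocess.run(shortcut_command(image, shortcut), check=True)
    # Assumption: the shortcut leaves the recognized text on the clipboard.
    text = subprocess.run(
        ["pbpaste"], check=True, capture_output=True, text=True
    ).stdout
    target = output_path(image)
    target.write_text(text, encoding="utf-8")
    return target
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.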
justinl33, over 1 year ago
Awesome! Is there a similar technique for the Apple Vision ‘Copy Subject’ feature? I’ve become extremely reliant on it, but it feels very limited in access.
TimeBearingDown, over 1 year ago
Very cool, and seems handy!

I’ve always had good results from Preview.app. I wonder how this engine compares with free alternatives in the number of errors on a difficult source.
est, over 1 year ago
Speaking of the need for OCR, I found a relevant and quite funny comment:

> we already have a common, portable data format for social media. It's screenshots of tweets

https://news.ycombinator.com/item?id=38841569
pugio, over 1 year ago
I would really love an `ocrmypdf`-like tool which uses Apple Vision to create searchable PDFs from scanned images. I've been searching every week or so for some kind of project but so far haven't found anything. Perhaps it's time to make it myself...
dotsam, over 1 year ago
I have played around with the OCR on my Mac, and have been very impressed. It has been consistently better than Tesseract for my purposes.

However, when creating a PDF from images using Preview and exporting with the ‘Embed Text’ option to OCR, I have noticed the text is worse than if you OCR the exact same images using the shortcut above or using a script. Presumably Preview is using the Vision framework’s less accurate fast path when preparing the PDF. https://developer.apple.com/documentation/vision/recognizing_text_in_images
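
One way to check how much the fast and accurate paths differ on a given image is to run both and compare the output. The sketch below assumes ocrmac's `OCR` class accepts a `recognition_level` argument of `"fast"` or `"accurate"`, as its README describes; the function names are made up, and the similarity score is just `difflib`'s ratio, not an accuracy measure. It says nothing about which path Preview actually uses.

```python
import difflib


def recognize(path: str, level: str) -> str:
    """OCR one image with ocrmac at the given recognition level (macOS only)."""
    # Imported lazily so `similarity` can be used on any platform.
    from ocrmac import ocrmac  # assumption: installed with `pip install ocrmac`

    # Assumption: recognition_level takes "fast" or "accurate".
    annotations = ocrmac.OCR(path, recognition_level=level).recognize()
    return "\n".join(text for text, _confidence, _bbox in annotations)


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two OCR outputs."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def compare_levels(path: str) -> float:
    """Score how closely the fast path matches the accurate path."""
    return similarity(recognize(path, "fast"), recognize(path, "accurate"))
```

A score well below 1.0 on your own scans would be consistent with the difference described above.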
srott, over 1 year ago
You can use the clipboard with the pbpaste/pbcopy commands:

ocr-text "$1" && pbpaste
eigenvalue, over 1 year ago
Weird, I couldn't get it to work on a bunch of different files, even using very simple file names. Kept getting this error:

Error: The operation couldn’t be completed. (WFBackgroundShortcutRunnerErrorDomain error 1.)
stephenr, over 1 year ago
The article was posted... yesterday, and the *entire* reason given for not using the built-in Shortcuts sharing feature is... an article from 2 years ago, about a bug in the Shortcuts hosting service, which has obviously been fixed.

I get that some people will want to create it from scratch themselves or incorporate the actual meat of it into a larger shortcut... but not sharing one that does what the article says, because of a bug 2 years ago, is a bit of a weird take.
schappim, over 1 year ago
If you want to do this a lot more easily, use: https://github.com/schappim/macOCR
elpakal, over 1 year ago
I don't know why, but instead of pasting the copied text to make sure it worked, I made it read the text aloud:

shortcuts run ocr-text -i <A PATH TO SOME IMAGE> | say -v Fred
djhn, over 1 year ago
Does anyone know of a straightforward library or setup to scan newspapers and/or magazines and detect and extract images and advertisements?
mushufasa, over 1 year ago
Very cool. Anyone know how this compares to AWS Textract in general? Does the Apple Vision framework support table recognition?
jmz1, over 1 year ago
Raycast (macOS only) is also nice as it's able to search images by text. It also allows you to copy text from those images. Quick official demo here: https://www.youtube.com/watch?v=c96IXGOo6E4
ggm, over 1 year ago
How to interact with the built-in OCR via the CLI? "Doing" something is (to me) about which OCR tooling, which fonts it recognises, and all the associated package management and tuning, not "how I configure the GUI and UI to let me use the tool they shipped with the OS".
sigoden, over 1 year ago
Use LLMs (gpt-4-vision or LLaVA) with aichat:

`aichat -f tmp/test.png -- output only text in the image`

https://github.com/sigoden/aichat
b__d, over 1 year ago
The way I do this: it's built right into the macOS Screenshot app:

- Press CMD+SHIFT+4
- Draw a rectangle on screen around the text you want to extract
- (Quickly) click on the preview image in the lower right corner
- Copy the text from the image
krudnicki, over 1 year ago
I made a Shortcut + PHP script to get text from a screenshot, ask ChatGPT to make a task name from the text, create a new task in ClickUp, and attach the screenshot. I use it often.
rikafurude21, over 1 year ago
Are iOS and macOS shortcuts cross-compatible? I didn't know there were shortcuts for the Mac; it seems pretty powerful to be able to run them from the terminal too. Thanks, OP.
predictsoft, over 1 year ago
On Windows, A9T9 does a great job of OCR'ing scanned JPEG files (and any JPEG file). It's also free.

I scanned about 100 A4 documents in just a couple of minutes.
minimaxir, over 1 year ago
Surprisingly, the Extract Text from Image action is available on Intel Macs: normally, features like automatic image OCR are limited to Apple Silicon Macs.
systemtrigger, over 1 year ago
This works great for local files. I can't seem to modify the shortcut correctly for an image hosted at a public URL.
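
One possible workaround, rather than changing the shortcut itself, is to download the image to a temporary file first and then run the shortcut on that local copy. The sketch below assumes a shortcut named `ocr-text` that copies its result to the clipboard, as in the other comments in this thread; the function names and the `.png` fallback suffix are invented for illustration, and `ocr_remote_image` only works on macOS.

```python
import subprocess
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def suffix_from_url(url: str, default: str = ".png") -> str:
    """Pick a file extension from the URL path, ignoring any query string."""
    suffix = Path(urlparse(url).path).suffix
    return suffix or default


def ocr_remote_image(url: str, shortcut: str = "ocr-text") -> str:
    """Download an image, run the shortcut on the local copy, return the text."""
    with tempfile.TemporaryDirectory() as tmp:
        local = Path(tmp) / ("download" + suffix_from_url(url))
        urllib.request.urlretrieve(url, str(local))
        subprocess.run(
            ["shortcuts", "run", shortcut, "-i", str(local)], check=True
        )
    # Assumption: the shortcut leaves the recognized text on the clipboard.
    return subprocess.run(
        ["pbpaste"], check=True, capture_output=True, text=True
    ).stdout
```

The temporary directory is removed as soon as the shortcut finishes, so nothing is left behind on disk.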
loevborg, over 1 year ago
CleanShot X (which is great) also allows you to OCR from your screen ("Capture Text").
gvkhna, over 1 year ago
Are there any benchmarks on speed/compute/accuracy anywhere comparing it to Tesseract v5?
cyberax, over 1 year ago
It doesn't work for Chinese characters :(
CodeNest, over 1 year ago
Python is quite basic and might not be very helpful for advanced users. It seems overly detailed for such a simple task.
geniium, over 1 year ago
Have you guys tried ChatGPT or another alternative?