Project Naptha: a browser extension that enables text selection on any image

1055 points by antimatter15, about 11 years ago

67 comments

atourgates, about 11 years ago
Wow - this is amazing.

Right this very moment (well, a few moments ago when I wasn't procrastinating on HN) I was in the midst of extracting data from a client's old website in preparation for creating a new website.

A lot of that data is contained within images.

From a few preliminary tests, I'm hugely impressed. This seems on par with any other OCR software I've used, and the fact that it happens in realtime in the browser is amazing.

I tried it on a piece of content I'd just had to type out, that was originally in an image. Typing out the content took about 10 minutes. Copying and pasting with Naptha, and then making some minor edits/corrections, did the same thing in about 2 minutes.
trishume, about 11 years ago
Holy crap, antimatter15 does so many cool things. I keep finding things that are really cool and then scroll down to find they are all written by him. First Shinytouch, then Protobowl years later and now this. And he's only a year older than me (19) so it isn't that he's had more time. Check out his GitHub profile for more of his projects: http://github.com/antimatter15

jgj, about 11 years ago
> Unfortunately, your browser is not yet supported, currently only Google Chrome is supported.

FF 28 seems to be working fine with the "Weenie Hut Jr." version... is it just the add-on that isn't supported?

awesome tech, btw
vidarh, about 11 years ago
Reminds me of Powersnap on the Amiga. Many applications did their own text rendering without supporting cut and paste, and so this guy called Nico Francois had the bright idea of letting you select a region of a window, and matching the standard fonts against the window's bitmap.

Of course then it was "easy": almost all the text would have been rendered with one of a tiny number of fonts available on the system, with little to no distortion.
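The font-matching approach vidarh describes can be sketched as follows. This is a minimal illustration in Python, not Powersnap's actual code: the 3x5 glyph table, the one-column letter spacing, and the function name `read_fixed_font` are all assumptions made up for the example. It only works because the text is assumed to be rendered undistorted in a known fixed-width font, which is exactly the "easy" case the comment points out.

```python
# Hypothetical 3x5 glyphs for a tiny fixed-width font; '#' marks an inked pixel.
FONT = {
    "H": ("#.#", "#.#", "###", "#.#", "#.#"),
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
}
GLYPH_WIDTH = 3  # assumed glyph width in pixels
SPACING = 1      # assumed blank columns between glyphs


def read_fixed_font(bitmap, font=FONT, width=GLYPH_WIDTH, spacing=SPACING):
    """Read one line of text from a bitmap by exact glyph comparison.

    `bitmap` is a sequence of equal-length strings, one per pixel row.
    Cells that match no known glyph are reported as '?'.
    """
    columns = len(bitmap[0])
    text = []
    for x in range(0, columns - width + 1, width + spacing):
        # Cut out the cell at this character position, row by row.
        cell = tuple(row[x:x + width] for row in bitmap)
        match = next((ch for ch, glyph in font.items() if glyph == cell), "?")
        text.append(match)
    return "".join(text)


SAMPLE = (
    "#.#.###",
    "#.#..#.",
    "###..#.",
    "#.#..#.",
    "#.#.###",
)
print(read_fixed_font(SAMPLE))
```

Exact comparison breaks down as soon as fonts are scaled, anti-aliased, or placed over a background, which is why general OCR on web images is a much harder problem.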
pestaa, about 11 years ago
This is great news for those who have to live with disabilities.

Maybe soon I won't feel guilty for leaving my alt attributes empty.

leeoniya, about 11 years ago
@antimatter15, i have a project that does client-side image analysis and decomposes document structures. it looks like your OCR code would be a great replacement for the server-side Tesseract ocr i currently use :)

here's what the project does now with js + web workers:

http://i.imgur.com/QvXSkY2.png

processing time is < 1500ms in Chrome and < 2000ms in FF

the code is open source, though using it isn't yet polished. i'm working slowly on a blog post series to detail how to use the lib(s). https://github.com/leeoniya/pXY.js

a walkthrough of the base lib is here: http://o-0.me/pXY/
skizm, about 11 years ago
Doesn't work great. Went to reddit's advice animal page to try it out and it doesn't seem to work with livememe (I think they have an invisible layer over their images to try and block hot linking).

Here is a copy/paste example from imgur:

http://i.imgur.com/sKQXx8v.jpg

Top: vou SAID w[ W[R[ |[AVINĞ`ON TIM[TOAV

Bottom: TN[ FACTTNATl'M MAWING TNISM[M[ g INST[AD of DRIVING D[TERMIN[D TN#rWASA ll[

Maybe it needs to be a certain font for better results. Still pretty cool. Hopefully all the kinks get worked out. I would definitely find this useful.

EDIT: need to make sure the language is set to "internet meme" and it works much better.
elwell, about 11 years ago
Every time I click "Allow" on "Access data on all sites" for an extension I creep closer to my security hole paranoia threshold. If it was all in JS, who cares? But this sends ajax to remote servers of course.

Am I alone?
yaddayadda, about 11 years ago
1) Very, very flippin' cool!

2) Erase Text option menu location: Using version 0.7.2, the "Erase Text" option is displayed under the "Translate" section (certainly not where I would ever intentionally look for it).

3) Select Text -> Right-click changes selection: After selecting my text, when I right-click, the selected text often (almost always) changes. For example, with the kitten text, I selected both paragraphs, but when I right-clicked to go to Translate->Erase the first paragraph ceased to be highlighted. After erasing the second paragraph I tried in vain to select and erase the first paragraph, but every time I'd right-click the selected paragraph only a single word would still be highlighted. I eventually tried erasing text while only one word was highlighted and the entire first paragraph was erased.

4) I really appreciate the Security & Privacy section of the project page.

5) I would love to see a Firefox version of Project Naptha!
bigbugbag, about 11 years ago
I wonder how deep this project is in violation of the GPLv3.

For starters, it's based on GNU Ocrad [1] but fails to state a license and to publish any source code.

[1]: https://www.gnu.org/software/ocrad/
JoelHobson, about 11 years ago
This is simply incredible. I'm just blown away by it.

I wonder if you could get better performance when running locally by sending the result through a spellchecker and doing some Bayesian magic on the word choice...
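One way to read the "spellchecker plus Bayesian magic" suggestion is a noisy-channel corrector: pick the dictionary word that maximizes P(word) x P(OCR output | word). The sketch below is a toy illustration in Python, not anything Project Naptha is known to do. The word counts, the confusion pairs, and the probabilities 0.90 / 0.05 / 0.001 are invented for the example, and candidates are limited to words of the same length as the OCR token.

```python
import math

# Hypothetical unigram counts; a real system would estimate these from a corpus.
WORD_COUNTS = {"the": 500, "that": 300, "this": 250, "fact": 40}

# Hypothetical (seen, intended) character pairs that OCR tends to confuse.
CONFUSABLE = {("n", "h"), ("[", "e"), ("t", "f"), ("l", "i"), ("0", "o")}


def channel_logp(seen, intended):
    """Log-probability that OCR produced `seen` when the text said `intended`."""
    total = 0.0
    for s, i in zip(seen, intended):
        if s == i:
            p = 0.90    # character read correctly
        elif (s, i) in CONFUSABLE:
            p = 0.05    # known OCR confusion
        else:
            p = 0.001   # arbitrary substitution
        total += math.log(p)
    return total


def correct(token):
    """Return the most probable same-length dictionary word for an OCR token.

    Tokens with no same-length candidate are returned unchanged.
    """
    seen = token.lower()
    total_count = sum(WORD_COUNTS.values())
    candidates = [w for w in WORD_COUNTS if len(w) == len(seen)]
    if not candidates:
        return token
    return max(
        candidates,
        key=lambda w: math.log(WORD_COUNTS[w] / total_count) + channel_logp(seen, w),
    )


print(correct("TN["), correct("tnat"))
```

The prior lets a common word win over a rare one when the pixels are ambiguous, which is the part a plain spellchecker cannot do on its own.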
iooi, about 11 years ago
Couldn't get it to work on: http://graphics8.nytimes.com/adx/images/ADS/37/09/ad.370964/184x90_LEFT_cm_final.jpg

Also for: http://www.wsoddata.com/clients/8bec9b10/ads/300x250_static/images/300x250_v1_bkgd.png It can't get the top-right text correctly

Awesome tech though
rooted, about 11 years ago
Very slick! Does it automatically start OCRing every image, or does it wait for a user to try to select the image text? Asking because I'm concerned about this decreasing performance.

tiles, about 11 years ago
This is amazing! Is there a planned open-source license or commercialization of this?

SchizoDuckie, about 11 years ago
Wow. Just wow. How did I live my life before this?

Once again, such a simple implementation by somebody that grabs some components that have been around for ages and mashes them up in a way that makes people question why it wasn't invented before

I've got this installed and it'll probably never leave my chrome profiles. Keep up the awesome work!
userbinator, about 11 years ago
I have a feeling that if you just make the OCR better, a lot of users are going to use this for entering CAPTCHAs...

michaelchum, about 11 years ago
I remember your 2nd place win at HackMIT, congratz again. It was THE most useful hack by far and I'm glad you've made it a public product now, and free. Wow, it seems like you beat all those years of industrial OCR products... and by far. This is simply amazing, keep up the great work!!!
aalpbalkan, about 11 years ago
Certainly a cool idea but it didn't work well on an XKCD comic:

http://www.xkcd.com/ bottom line here is recognized as: "T1EN°5'lI'ONAl.1?E£ONNH\56PNCE(YHCEPlP6fiN(N)SURLH’PR3AO-i‘lDlsIr'£7E‘5IJ%z"
jpasden, about 11 years ago
This is amazing, and it has truly revolutionary implications for learners of scripts like Chinese, which are still truly indecipherable to learners when embedded in images. I was really happy to see that this extension supports both simplified and traditional Chinese. I tried it out, and while it shows promise there, it definitely still needs a lot of work.

I posted a review on my blog here: http://www.sinosplice.com/life/archives/2014/04/24/can-project-naptha-read-chinese-text-in-images

OP, I'd be happy to work with you on improving the recognition of Chinese text. Just get in touch with me through my blog (linked to above).

m_ke, about 11 years ago
Cool, I implemented the stroke width transform for text detection about a year ago. Nice to see someone else implementing it, but I'm pretty sure convolutional neural nets do a better job at text localization.
plicense, about 11 years ago
This isn't particularly awesome, because

1. The implementation of Stroke Width Transform is not super good. So far, http://libccv.org/ has the best implementation of SWT. But again, you can make neither head nor tail of that implementation.

2. There are just too many false text regions and the text detection accuracy is nowhere near what you can call good. A mixed use of multiple OCR engines might give better results.

All that said, you can't take away the cleverness of the application of detecting text. Mind == Blown, on that area.
sailfast, about 11 years ago
This looks very cool and could come in quite handy.

In case anyone from the project is monitoring - text selection did seem to work fine for me in Firefox (ESR 24.3) despite the "Not Supported" text being displayed.
x0ner, about 11 years ago
Extension is awesome and while the code is messy, it has enough little jokes to keep you amused. For those looking to access the backend OCR service, it seems to be down right now, but will hopefully come back up soon.

Here were the API references I could find for the remote OCR:

- GET https://sky-lighter.appspot.com/api/read/<chunk.key>
- GET https://sky-lighter.appspot.com/api/lookup?url=<image.src>
- POST https://sky-lighter.appspot.com/api/translate

Apparently the author was one of the winners of HackMIT 2013 according to some of the comments. Couple of fun things in there if you decide to poke around in the code. Jump into naptha-wick.js for the remote logic.

Note from the Dev (http://challengepost.com/users/antimatter15, http://antimatter15.com/wp/, https://twitter.com/antimatter15):

/* It's April 16, 2014.

It's been six months since I started this project.

Just under two years after I first came up with the idea.

It's weird to think of time as something that happens, to think of code as something that evolves. And it may be obvious to recognize that code is not organic, that it changes only in discrete steps as dictated by some intelligence's urging, but coupled with a faulty and mortal memory, its gradual slopes are indistinguishable from autonomy.

Hopefully, this project is going to launch soon. It looks like there's actually a chance that this will be able to happen.

The proximity of its launch has kind of been my own little perpetual delusion. During the hackathon, I announced that it would be released in two weeks time.

When winter break rolled by, I had determined to finish and release before the end of the year 2013.

This deadline rolled further way, to the end of January term, IAP as it is known. But like all the artificial dates set earlier, it too folded against the tides of procrastination.

I'll spare you February and March, but they too simply happened with a modicum of dread. This brings us to the present day, which hopefully will have the good luck to be spared from the fate of its predecessors.

After all, it is the gaseous vaporware that burns.

*/
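The three endpoints x0ner lists can be written down as URL builders. This is a minimal sketch in Python that assumes only the paths quoted in the comment: the function names are hypothetical, the request and response formats are not documented anywhere in the thread, and the service was reported as down, so nothing here actually sends a request.

```python
from urllib.parse import quote, urlencode

# Base path taken from the endpoints quoted in the comment above.
BASE = "https://sky-lighter.appspot.com/api"

# POST endpoint; the payload format is not described in the thread,
# so only the URL is recorded here.
TRANSLATE_URL = f"{BASE}/translate"


def read_url(chunk_key):
    """URL for GET /api/read/<chunk.key> (hypothetical helper name)."""
    return f"{BASE}/read/{quote(chunk_key, safe='')}"


def lookup_url(image_src):
    """URL for GET /api/lookup?url=<image.src> (hypothetical helper name)."""
    return f"{BASE}/lookup?{urlencode({'url': image_src})}"


print(lookup_url("http://example.com/a b.png"))
```

Percent-encoding the image URL matters because it is passed as a query parameter and may itself contain `?`, `&`, or spaces.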
steren, about 11 years ago
Very impressive work. I'm not surprised to find antimatter15 behind it.

The website was not very clear if work was done client-side or not (mentioning server calls). It turns out that server calls can be disabled and the extension is working quite fine without. By default, I would disable this option and offer opt-in, it is better for privacy I think.

StringyBob, about 11 years ago
This is great. I'd love to see this extended for natural images with whatever algorithm Google uses for OCR in streetview - http://googleonlinesecurity.blogspot.co.uk/2014/04/street-view-and-recaptcha-technology.html
bananas, about 11 years ago
This is EXACTLY what I need at the moment.

I get a big problem with various people sending me screenshots with stackdumps in. This is perfect for extracting them into the ticket bodies and it does it perfectly (I've just done 20 with it and manually checked them!)

This is the sort of stuff that really improves people's lives by making all data equal.

Craque, about 11 years ago
Please help, it looks brilliant, however, only the test page works for me. Can't get any other pages to work. Text simply isn't selectable - cursor remains as a pointer, not an 'I' :(

I'm using the latest version of Chrome on a modern Mac and have Naptha properly installed and Chrome has been relaunched.

Any hints would be appreciated.
yeukhon, about 11 years ago
Awesome. I was actually at HackMIT. It is great to see you actually continue working on this. As a matter of fact, I told my friends who were working on a similar idea for their senior project your project name last Fall. I emailed you for the Microsoft reference papers :) Not sure if I should copy and paste that.

Anyway, good luck!
tehaaron, about 11 years ago
This is really neat. I was playing with it on pictures of street signs and buildings and realized that if I select some text and then do ctrl+a it tried to select everything it thought was text... Then I used right click > translate > reprint to see what it thought each thing was.

Here is the picture: http://thesuperslice.com/wp-content/uploads/2012/04/downtownla_timelaps5.png

And the text outcome - found it most interesting what symbols it thought it recognized:

lam
on-0'0
s.
Ic 0on
§-i-
I-*-
-unm
-$3.»;
o
G %T1
00-O
. o C-‘7' H ' .-.”-." «'~3;
.35
$16 O-O
‘D Q-=¢1
‘-M
km“
‘MIMI
DOW:
TLDR
D001”
'."'IIu
ff"
)0‘
\\
,¢-.5 ,:~L.
r/J
RyanMcGreal, about 11 years ago
I tried it on the handwritten all-caps text on this page: http://xkcd.com/1271/

It (sort of) worked:

"I AB5ENTH|NDEDLY5ELECT RANDU1 Bl.OO<5 OFTEXTHSI READ, PND FEEL SLRONSCDUSLY SATISFIED LHEN THE HIGHUGHTED AREA |"PKE5 H 5Yl’R1ETRICHL 5|-PPE"
frankosaurus, about 11 years ago
I had high hopes for this, as I sometimes need to manually transcribe serial numbers from customers' screenshots.

However, it seems to confuse letter O and number 0. Since serial numbers are not English words, I'm not sure how you would solve this unless you had a lookup for commonly used web fonts.
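One workaround for the O/0 problem, different from the font lookup suggested above, is to use the serial number's known layout: if a position must hold a digit, a letter O read there is almost certainly a zero. The Python sketch below assumes the layout is known in advance and expressed as a made-up mask (`L` for letter, `D` for digit, anything else literal); the mask syntax, the confusion tables, and `fix_serial` are all hypothetical.

```python
# Hypothetical look-alike tables: what a misread character most likely was.
TO_DIGIT = {"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"}
TO_LETTER = {"0": "O", "1": "I", "5": "S", "8": "B"}


def fix_serial(ocr_text, mask):
    """Coerce look-alike characters to the type each mask position requires.

    `mask` uses 'L' for a letter position and 'D' for a digit position;
    any other mask character (such as '-') leaves the input untouched.
    """
    if len(ocr_text) != len(mask):
        raise ValueError("OCR text and mask must have the same length")
    fixed = []
    for ch, kind in zip(ocr_text, mask):
        if kind == "D":
            fixed.append(TO_DIGIT.get(ch, ch))
        elif kind == "L":
            fixed.append(TO_LETTER.get(ch, ch))
        else:
            fixed.append(ch)
    return "".join(fixed)


print(fix_serial("A0C-1O24", "LLL-DDDD"))
```

This does nothing for positions that may hold either a letter or a digit; those still need better glyph recognition or a checksum to resolve.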
bigbugbag, about 11 years ago
Seemed like an interesting project. I clicked on the link, scanned the page, and it seems to be an empty, pointless web page trying to explain over pages' worth of scrolling that it lets you deal with text trapped inside images, which I already knew when I clicked the link.

Going back to the page after closing it once, I noticed written in smaller characters that this somewhat pointless page is for a useless extension, as it is exclusively limited to the worst offender privacy-wise of a web browser that I would not touch with a stick. google chrome is the new internet explorer to me as its main use is to download firefox.

In conclusion this looked promising but a confusing web page and browser lock-in render it useless and show that it is far from doing what it claims. "... on every image you see while browsing the web" should be "...on every image you see while browsing the web in google chrome".

No github and no open license tells me that as a linux user of opera I'm pretty much assured I will never see a version of this extension.
Omnipresent, about 11 years ago
This is extremely powerful for the end user. I've been doing a bit of OCR work using some pre-processing methods combined with Tesseract and OpenCV. I am curious to know how you are doing this on the fly and also as a chrome extension. Is the processing done in JS?

3JPLW, about 11 years ago
The biggest thing I'd like to see is enabling in-page (control/command-f) search. In my quick scan through the page it looks like it doesn't do that… is that right? Are there plans to add invisible text to the DOM that control-f can find?
cornholio, about 11 years ago
I like the way this extension removes text in the image, but I would much rather have a video delogo filter that does not suck. It would be very useful for removing hard subtitles, station logos, screener warnings etc.

eddyb, about 11 years ago
The Mentalist reference, anyone?

In any case, pretty cool project, I'm a bit amazed how far we've come since I last played with OCRs (and defeated one bad CAPTCHA implementation, still in use at pastebin.com it seems).

adem, about 11 years ago
Cool idea, definitely worth exploring the possibilities. A quick run showed me that it often interprets the "i" as "l" whenever the gap between the line and dot is not apparent
deviltreh, about 11 years ago
Now that is pretty damn cool. Will help at work when marketing people do not copy-paste an email/article and just put in a screenshot of it, and you want to quote something from that picture...

RaphiePS, about 11 years ago
Saw you demo this at the hackathon session at CPW! Really, really cool.

jpdlla, about 11 years ago
@antimatter15 any recommendations for optimizing Tesseract?
Tsagadai, about 11 years ago
I have wanted an extension to do this for so long. I even started coding my own at one stage but hit various issues. Thank you so much for creating this.

jawerty, about 11 years ago
You guys should consider making an API for this. It would be awesome to have an API that inputs images via url and outputs the text of said image.

darkhorn, about 11 years ago
What project won the first place in HackMIT 2013?

vanderZwan, about 11 years ago
> In a sense that’s kind of like what a human can do: we can recognize that a sign,

Oh god... how does it finish! I need closure!

(PS: this is awesome)
jonnynezbo, about 11 years ago
This is pretty cool, but not perfect. Upon copy & pasting the captured text, several of the words and letters are wrong.

swavaldez, about 11 years ago
Everyone deserves to have this extension. It's even better if it could be a browser's default feature. ;P

Aardwolf, about 11 years ago
I would like it if this would work on ANY text in webpages.

Too many webpages make it too hard to select even actual plain text.

krsunny, about 11 years ago
Completely agree with the "where has this been all my life" sentiments. This is awesome, thank you.
username42, about 11 years ago
Just tested with a random scanned page (http://www.hpl.hp.com/research/info_theory/ShannonWeb/fullsize/A)%20Clean%20Original.gif) the result is almost garbage. It seems as bad as most OCR software I have encountered. This was to be expected as it is based on ocrad.
nileshtrivedi, about 11 years ago
Very nifty. Although, it would have been even more awesome if it worked with Google Books.

bz123, about 11 years ago
cool idea, a bit buggy still, and when i am trying to actually save images i get the custom extension right click bar instead of the normal chrome bar to save the image, but i guess it's still under development.
swah, about 11 years ago
Great idea, this should make a couple million for the creators.

amazd, about 11 years ago
This seems like a great addition to my side project (amazd.com)

darkhorn, about 11 years ago
In Firefox 31.0a1 I can copy the text only with Ctrl+C.

sourcex, about 11 years ago
This is awesome and very useful! BTW, did he do it?

ernestipark, about 11 years ago
Awesome. How does this affect page performance?

valbaca, about 11 years ago
Worth it for the dozen-click easter egg.

seshakiran, about 11 years ago
Very nice. Will give it a try.

nemrow, about 11 years ago
Way cool! I am impressed.

Thiz, about 11 years ago
Magic.

Indistinguishable from magic.

atixid91, about 11 years ago
wow a step ahead! amazing extension....

est, about 11 years ago
bonus points to scan QR-codes
pinaceae, about 11 years ago
Very cool stuff, but need to satisfy my OCD:

It's spelled Naphtha (http://en.wikipedia.org/wiki/Naphtha). And for the HN hordes - read the bottom of the linked project page, it is supposed to be a reference to Naphtha.

:)

sscalia, about 11 years ago
Badass. Now support Good Browsers™ like Safari and Firefox.
jbeja, about 11 years ago
This, my friends, is called "Innovation".

batmansbelt, about 11 years ago
Now the NSA will be reading the contents of your animated GIFs.

bondolo, about 11 years ago
I can imagine quite a few blind people are creaming their jeans about now.