Page Dewarping

743 points by pietrofmaggi almost 9 years ago

19 comments

kazinator almost 9 years ago
This strikes close to home for me, because I've done this "by hand" quite a number of times, using some interactive GIMP filters, with very good results. I've been able to take a perspective photo of a curved page lying on a desk, get it almost perfectly straight, and threshold it to clean black and white or sharp gray scale.

Here is my very quick and dirty manual job of the same example page:

http://www.kylheku.com/~kaz/dewarp.png

Literally less than five minutes.

First I cropped the image. Then duplicated the layer. Blurred the top layer (Gaussian, 50 radius). Then flipped to Divide mode and merged the visible layers. This leveled the lightness quite well, almost completely eliminating the shadow over the right side of the page and all other lighting differences. There is a hint of the edge of the shadow still present because it is such a sharp contrast; but that can be eliminated in an adjustment of the intensity curves. In such cases it may be helpful to experiment with smaller blur radii, too.

I then did a perspective transform in the lateral direction, squeezing the left side top-bottom and expanding the right, resulting in the warp now being approximately horizontal. (The perspective transform is not just for adding a perspective effect; it is also useful for *reversing* perspective!)

Finally, I used the Curve Bend (with its horrible interactive interface and awful preview) to warp in a compensating way. Basically, the idea is to draw an upper and lower curve which is the opposite of the curve on the page. I made two attempts, keeping the results of the second.

If the preview of this tool wasn't a ridiculous, inscrutable thumbnail, it would be possible to do an excellent job in one attempt, probably close to perfect.

Because the page is evenly light thanks to the divide-by-blurred layer trick, it will nicely threshold to black and white, or a narrow grayscale range.
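
The divide-by-blurred-layer step described above translates fairly directly into code. Below is a minimal NumPy sketch, assuming a grayscale image stored as floats in [0, 1]. The function names, the `sigma` parameter and the synthetic test page are illustrative choices, not GIMP's implementation, and GIMP's blur radius of 50 does not map one-to-one onto `sigma`.

```python
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2-D float image, with edge replication."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    # Filter every row, then every column; "valid" trims the padding again.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def level_lightness(gray, sigma=8.0):
    """Divide the image by a blurred copy of itself (analogue of Divide mode)."""
    gray = gray.astype(np.float64)
    background = gaussian_blur(gray, sigma)          # estimate of the lighting
    leveled = gray / np.maximum(background, 1e-6)    # paper -> ~1.0, ink -> < 1.0
    return np.clip(leveled, 0.0, 1.0)


# Synthetic "page" (assumption for the demo): paper that gets darker toward
# the left, plus two ink dots of equal relative darkness.
page = np.tile(np.linspace(0.4, 1.0, 60), (40, 1))
page[20, 10] *= 0.2
page[20, 50] *= 0.2
leveled = level_lightness(page, sigma=3.0)
```

After leveling, paper on the dark side and paper on the bright side end up at nearly the same value, so a single threshold can separate ink from paper. A smaller `sigma` follows sharp shadow edges more closely, at the cost of eating into large dark features.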
cooper12 almost 9 years ago
I love well-illustrated writeups. Even a reader without mathematics or programming knowledge can understand what steps the author took. His model actually seems to better represent the warped paper than a cylinder would. (though I don't know the actual specifics of the CTM model)

I wish he went into more details on the steps taken after dewarping. You can tweak the image levels to get good contrast, but surprisingly there aren't any shadows from underleveling or loss of detail from overleveling. I wonder if the author ran OCR on the scans after, and speaking of OCR, IIRC Leptonica is one of the dependencies of Tesseract so it must do some similar pre-processing.

Edit: reading more carefully, he mentions that he used adaptive thresholding from OpenCV.
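
Adaptive thresholding compares each pixel with the brightness of its own neighbourhood rather than with one global cutoff, which is why shadows tend not to survive it. A minimal NumPy sketch of the mean-based variant follows. The names `local_mean`, `adaptive_threshold`, `block` and `c` are illustrative, and this is a stand-in for the idea, not OpenCV's code; in OpenCV the corresponding call is `cv2.adaptiveThreshold`.

```python
import numpy as np


def local_mean(img, block):
    """Mean over a block x block window (block odd), via an integral image."""
    r = block // 2
    padded = np.pad(img.astype(np.float64), r, mode="edge")
    # Integral image with a leading row and column of zeros.
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    total = (ii[block:block + h, block:block + w]
             - ii[:h, block:block + w]
             - ii[block:block + h, :w]
             + ii[:h, :w])
    return total / (block * block)


def adaptive_threshold(gray, block=15, c=0.05):
    """True (white) where a pixel is brighter than its local mean minus c."""
    return gray > (local_mean(gray, block) - c)


# Synthetic page (assumption for the demo): uneven lighting plus two ink dots.
page = np.tile(np.linspace(0.4, 1.0, 60), (40, 1))
page[20, 10] *= 0.2
page[20, 50] *= 0.2

bw_adaptive = adaptive_threshold(page)
bw_global = page > 0.5   # a single cutoff, for comparison
```

On this synthetic page the global cutoff turns the dim left margin black, while the adaptive version marks only the two ink dots as black. The offset `c` keeps flat paper from flickering between black and white due to small variations.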
niftich almost 9 years ago
Recently, Dropbox wrote about dewarping prior to OCR in their app: https://news.ycombinator.com/item?id=12297944

This code had the same idea, and is open-source!
troymc almost 9 years ago
I use Microsoft's "Office Lens" app on my Android phone all the time, like a "smart camera" which automatically squares off and white-balances each photo of a page (usually mail or forms filled in by hand). It can't handle warped pages though, so I hope they add something like this!
peterjmag almost 9 years ago
"it came in handy whenever a student emailed me their homework as a pile of JPEGs."

Gotta admire this guy's resourcefulness, and patience. If I were a professor, I'd probably just reject the assignment outright if a student sent me a bunch of photos from their smartphone in lieu of a PDF or a "proper" scan. :)
mzucker almost 9 years ago
Thanks for the feedback, everyone -- happy to answer questions here or in the Disqus comments on my blog.
anilgulecha almost 9 years ago
Here's something that I think has not been done, but could be quite lucrative: building a high resolution scanner using the phone camera, multiple pictures and interpolation/noise removal.

Most phone cameras these days have good resolutions, and you could technically take a 6x4 photo, divvy it into a 3x3 grid and take close up photos, and have smart algorithms interpolate the pixels to form a single image with high res. I'd even bet you'd get results equal to or better than a flat bed scanner.

For better usability, just open the camera preview and slowly pan over the image.

Has someone tried something like this? With FOSS apps like mosaic, HDR tools and ImageMagick, it should be possible. I'm guessing OpenCV would be needed for interpolation and noise removal.
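
The noise-removal half of this idea can be illustrated on its own: averaging N aligned shots of the same region reduces independent sensor noise by roughly a factor of sqrt(N). The sketch below assumes the frames are already registered pixel-for-pixel (the hard part, not shown) and that the noise is zero-mean Gaussian. Both are assumptions for illustration, as are the names and the synthetic data.

```python
import numpy as np


def average_frames(frames):
    """Pixel-wise mean of already-aligned frames, each of shape (h, w)."""
    return np.mean(np.stack(frames), axis=0)


# Synthetic data (assumption): one clean image and 16 noisy exposures of it.
rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0.2, 0.8, 64), (64, 1))
frames = [clean + rng.normal(0.0, 0.1, clean.shape) for _ in range(16)]

single_err = np.std(frames[0] - clean)                 # noise in one shot
merged_err = np.std(average_frames(frames) - clean)    # noise after averaging
```

With 16 frames the residual noise should come out at about a quarter of the single-frame noise. Stitching close-ups into a mosaic additionally needs feature matching and warping between shots, which this sketch deliberately leaves out.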
Syzygies almost 9 years ago
This problem is fundamentally different with stereo images; there is a hope of reconstructing the exact 3D geometry of the page before flattening, rather than inferring from content. An iPhone app that did this would do well.
Ciantic almost 9 years ago
I wonder if someone has tried to dewarp a whole book from a video when flipping through it. I imagine that could be a handy way to copy the whole book.
jgable almost 9 years ago
This is great! My wife is a music teacher and often scans sheet music so that it's more portable. She has been asking me for a while for something exactly like this. I'll have to tweak it to work on sheet music, since I imagine his methods to identify lines of text won't work for the music staff out of the box.
amelius almost 9 years ago
The next step would be to "depixelate" the resulting image. How could this be done? I guess OCR would not work because of the variation of the fonts (you don't want the document to end up in a single font; you want to keep the fonts). Could a deep learning approach work here, even if it has not been trained on all the specific fonts?
renlo almost 9 years ago
How much of an effect does the camera lens have on page warping? Correct me if I'm wrong, but I would think shorter focal length lenses warp the page more. If a person accounted for that, could they get a near perfect result? Or does his algorithm account for that? It seems that one would need to know where the center of the image would be.
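
For context, lens distortion is commonly modeled as a radial scaling about the image center, which is why the center matters. Below is a minimal sketch of the one-coefficient radial model in normalized coordinates. The coefficient `k1`, the center `(cx, cy)` and the function names are illustrative assumptions, and nothing in this thread says whether the author's code models the lens.

```python
def distort(x, y, k1, cx=0.0, cy=0.0):
    """Apply one-coefficient radial distortion about the center (cx, cy)."""
    dx, dy = x - cx, y - cy
    scale = 1.0 + k1 * (dx * dx + dy * dy)   # grows with squared radius
    return cx + dx * scale, cy + dy * scale


def undistort(xd, yd, k1, cx=0.0, cy=0.0, iters=20):
    """Invert distort() by fixed-point iteration (fine for mild distortion)."""
    dxd, dyd = xd - cx, yd - cy
    dx, dy = dxd, dyd                         # initial guess: no distortion
    for _ in range(iters):
        scale = 1.0 + k1 * (dx * dx + dy * dy)
        dx, dy = dxd / scale, dyd / scale
    return cx + dx, cy + dy


# Round trip for a point away from the center, with mild barrel distortion.
k1 = -0.1
xd, yd = distort(0.5, 0.3, k1)
xu, yu = undistort(xd, yd, k1)
```

A point at the center is left untouched, and the displacement grows with distance from it, so an error in the assumed center shifts every correction. Real calibration models usually add further radial and tangential terms.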
petters almost 9 years ago
"You can see these are not exactly small optimization problems. The smallest one has 89 parameters in the model, and the largest has 600."

Those *are* small optimization problems. These types of problems are solved in computer vision for hundreds of thousands of variables. His problem can be solved in real-time, not tens of seconds.
paul_milovanov almost 9 years ago
Consider mentioning Dan Bloomberg as the author of the original work as well as Leptonica. :)
voltagex_ almost 9 years ago
Thank you for the gif near the start - it really helped me to understand what was going on.
saynsedit almost 9 years ago
Is using a curve whose end points are fixed to zero to model the warping accurate? I can't see a rationale for why the end points should both be 0.
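
To make the question concrete: assuming the curve is a cubic on [0, 1] that is zero at both ends and is controlled by its two end slopes (an assumption here, since the thread does not spell out the parameterization), the coefficients follow from the four constraints f(0) = 0, f(1) = 0, f'(0) = alpha, f'(1) = beta. The names `alpha`, `beta`, `cubic_coeffs` and `page_height` are illustrative.

```python
def cubic_coeffs(alpha, beta):
    """Coefficients (a, b, c, d) of f(x) = a*x^3 + b*x^2 + c*x + d with
    f(0) = 0, f(1) = 0, f'(0) = alpha, f'(1) = beta."""
    a = alpha + beta
    b = -2.0 * alpha - beta
    c = alpha
    return a, b, c, 0.0


def page_height(x, alpha, beta):
    """Evaluate the zero-ended cubic at x in [0, 1] using Horner's rule."""
    a, b, c, d = cubic_coeffs(alpha, beta)
    return ((a * x + b) * x + c) * x + d


def slope(x, alpha, beta):
    """Derivative f'(x) = 3*a*x^2 + 2*b*x + c of the same cubic."""
    a, b, c, _ = cubic_coeffs(alpha, beta)
    return (3.0 * a * x + 2.0 * b) * x + c
```

One plausible reading, not confirmed in this thread, is that pinning both ends loses little generality: a constant offset or an overall tilt of the curve can be absorbed by the page's rigid position and orientation, leaving only the two slopes as free shape parameters.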
artursapek almost 9 years ago
I've always been interested in getting into graphics programming, and stuff like this only makes me more interested. Really well written post.
nullcipher almost 9 years ago
Wrap this in a service and you have a startup!
anowell almost 9 years ago
This is solid. I'm an engineer at Algorithmia, and this caught our attention as the sort of project we love to host as a service on our algorithm marketplace. We've already made note of it for our team to consider adding (thanks to the generous MIT license), but I wanted to reach out in case you'd rather add, own, and optionally monetize it on our platform yourself. Either way, this was a great read with impressive results.