Writing a fuzzy receipt parser in Python

133 points by andygrunwald over 9 years ago

8 comments

bariumbitmap over 9 years ago
It's a shame that receipts don't have machine readable output.

QR codes can hold a little over 1,200 characters, which should be more than enough for most receipts.

https://en.wikipedia.org/wiki/QR_code#Storage

Edit: related link: https://www.quora.com/Can-and-how-cash-register-receipts-be-transformed-to-a-QR-code-and-scanned-with-a-smartphone-app-so-it-can-become-digital?share=1
laito over 9 years ago
Hey, this is pretty cool. I actually tried something similar (keeping a list of shop names and matching it against Tesseract's results). I was trying a Hough transform to correct slight image rotations. I wasn't aware of ImageMagick's textcleaner script. That could have saved me a lot of trouble :) I got roadblocked by the problem of having various kinds of receipts with absolutely no layout in common. I figured the system would need a lot of training to reach decent accuracy, and left it for another day.
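The shop-name matching described in this comment could be sketched roughly as follows. This is a minimal illustration using the standard library's `difflib`, not the code from the article or from the commenter; the shop list, the `guess_shop` helper, and the cutoff value are all assumptions made for the example.

```python
from difflib import get_close_matches

# Hypothetical list of known shop names; a real list would be maintained by hand.
KNOWN_SHOPS = ["aldi", "lidl", "rewe", "penny"]


def guess_shop(ocr_lines, known_shops=KNOWN_SHOPS, cutoff=0.6):
    """Return the first known shop that fuzzily matches a word in the OCR text.

    ocr_lines is assumed to be a list of text lines produced by an OCR step.
    Returns None when no word is similar enough to any known shop name.
    """
    for line in ocr_lines:
        for word in line.lower().split():
            # get_close_matches tolerates small OCR errors such as "rewf" for "rewe".
            matches = get_close_matches(word, known_shops, n=1, cutoff=cutoff)
            if matches:
                return matches[0]
    return None


if __name__ == "__main__":
    print(guess_shop(["REWF Markt GmbH", "Summe 12,99"]))
```

A lower cutoff tolerates more OCR noise but raises the chance of false matches, so the value would likely need tuning against real receipts.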
omn1 over 9 years ago
Hey, author here. I'm happy to answer any questions and welcome any kind of feedback.
pbnjay over 9 years ago
For the next step, and easier name matching... why not export a CSV of your online banking and use names and totals to match? Or are these cash receipts?
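One way to picture the matching suggested in this comment is the sketch below: filter bank rows by exact total, then pick the row whose payee looks most like the shop name. The CSV column names (`date`, `payee`, `amount`), the sample rows, the `match_receipt` helper, and the similarity threshold are all hypothetical; real bank exports differ in layout.

```python
import csv
import io
from decimal import Decimal
from difflib import SequenceMatcher


def match_receipt(receipt, bank_rows, min_name_score=0.3):
    """Return the bank row with the same total and the most similar payee, or None.

    receipt is assumed to be a dict with "shop" (str) and "total" (Decimal).
    bank_rows is assumed to be a list of dicts with "payee" and "amount" keys.
    """
    best, best_score = None, min_name_score
    for row in bank_rows:
        # Totals must agree exactly; Decimal avoids float rounding surprises.
        if Decimal(row["amount"]) != receipt["total"]:
            continue
        score = SequenceMatcher(
            None, receipt["shop"].lower(), row["payee"].lower()
        ).ratio()
        if score >= best_score:
            best, best_score = row, score
    return best


# Made-up sample data standing in for an online banking export.
SAMPLE_CSV = """date,payee,amount
2015-10-01,REWE MARKT 1234,23.45
2015-10-02,ALDI SUED,23.45
2015-10-03,REWE MARKT 1234,9.99
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
match = match_receipt({"shop": "rewe", "total": Decimal("23.45")}, rows)
```

As the comment itself notes, this only helps for card payments; cash receipts would have no corresponding bank row.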
joshribakoff over 9 years ago
I've considered an app that would do this in the past. It would be like mint.com which automatically tracks your finances via online banking, but instead of showing you spent $100 at the supermarket, it would show that you spent $20 on beer, $50 on cash back, and $30 on food... allowing better insights into your finances & where to cut back to save money.
misnome over 9 years ago
I've been thinking about something vaguely similar for paperwork processing. It'd be nice to pull the company name by recognising the layout/logo, and to attempt to read the date off the page.

Anyone know any resources or have an idea for a direction to get started on this?
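For the date part of this question, a possible starting point is a small regex pass over the OCR text, as sketched below. The three date formats and the `find_date` helper are assumptions chosen for illustration; real paperwork uses many more formats, and layout or logo recognition is a separate problem this sketch does not address.

```python
import re
from datetime import datetime

# Assumed date formats; real documents vary by locale and issuer.
DATE_PATTERNS = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),
    (re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b"), "%d.%m.%Y"),
    (re.compile(r"\b\d{2}/\d{2}/\d{4}\b"), "%d/%m/%Y"),
]


def find_date(text):
    """Return the first valid date found in text as a datetime.date, or None."""
    for pattern, fmt in DATE_PATTERNS:
        for candidate in pattern.finditer(text):
            try:
                # strptime rejects impossible dates such as 99.99.2015.
                return datetime.strptime(candidate.group(0), fmt).date()
            except ValueError:
                continue
    return None


if __name__ == "__main__":
    print(find_date("Invoice\nDatum: 07.10.2015 14:32"))
```

Note that a pattern like `07/10/2015` is ambiguous between day-first and month-first conventions; the sketch assumes day-first.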
t_g over 9 years ago
If you are genuinely interested in this sort of thing, I'd like to think we do a pretty good job at receipt parsing.

http://www.neat.com/

Disclaimer: I work for the company.
comrh over 9 years ago
I think I would have more problems saving all the receipts using this workflow. Just logging them into YNAB's mobile app is great for me.