Show HN: Open Paperless – Scan, index, and archive paper documents

285 points by zhoubear over 7 years ago

19 comments

Spearchucker over 7 years ago
This is arguably a lot more than I need. I'm a hoarder in that I have every email I've ever sent or received (bar junk mail), and every piece of paper I've ever received.

Most of my paper is now scanned - I think I have two boxes left in my garden shed. I don't bother with OCR because search doesn't help me when I don't know what to search for (e.g. invoice for a jumper I bought in 2010 - fashion labels rarely call their jumpers jumper).

And so I rely on metadata. There's not much out there in terms of open-source tagging software, and even less in terms of an open tagging approach. I ended up with TagSpaces, which is a web app packaged up as a native app. The approach to tagging is good (tags appended to file name), but the app is abysmally poor. Slow - waiting up to 30 seconds for a pop-up menu to appear. It assumes tag-based searches work in only one way.

The intent is to write some native apps to solve my biggest problems. For now I'm still trying to clear the backlog of un-scanned paper docs (not going to get this done for me, because privacy). I tag important stuff, like employment contracts, mortgage agreements, passports and birth certificates...

Hope to have everything done by the time I cash in my chips. Might make for a useful dataset for someone somewhere some day.
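
To make the "tags appended to file name" idea concrete, here is a minimal Python sketch. It assumes tags sit in a square-bracket block just before the extension, as in `invoice-2010[clothing receipt].pdf`; that layout and the helper names (`split_tags`, `add_tags`, `find_by_tags`) are assumptions for illustration, not a description of how any particular tool stores its tags.

```python
from __future__ import annotations

import re
from pathlib import PurePath

# Assumed layout: "name[tag1 tag2].ext" (tag block at the end of the stem).
TAG_BLOCK = re.compile(r"\[([^\]]*)\]$")


def split_tags(filename: str) -> tuple[str, list[str]]:
    """Return (base name without the tag block, list of tags)."""
    stem = PurePath(filename).stem
    match = TAG_BLOCK.search(stem)
    if not match:
        return stem, []
    return stem[: match.start()].rstrip(), match.group(1).split()


def add_tags(filename: str, new_tags: list[str]) -> str:
    """Return a new bare file name with new_tags merged in, no duplicates."""
    suffix = PurePath(filename).suffix
    base, tags = split_tags(filename)
    merged = tags + [t for t in new_tags if t not in tags]
    if not merged:
        return f"{base}{suffix}"
    return f"{base}[{' '.join(merged)}]{suffix}"


def find_by_tags(filenames: list[str], wanted: list[str]) -> list[str]:
    """Keep only the file names that carry every wanted tag."""
    required = set(wanted)
    return [f for f in filenames if required <= set(split_tags(f)[1])]
```

With something like this, the jumper invoice can be found through a `clothing` tag even though the word "jumper" never appears in the scanned text. The functions work on bare file names only; directory handling and the actual renaming on disk are left out.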
theomega over 7 years ago
I want to show an alternative approach to managing your documents:

Store them in your IMAP/mail account, either on a dedicated account or in a dedicated sub-folder.

I wrote some small Python scripts [1] which allow you to:
- Add an email with the PDF attached to your document collection. The script supports adding a subject and adding tags to it.
- Go over all the emails and run OCR (tesseract) on them: attach the OCR result together with the PDF to the email.

Big advantages:
- Search on IMAP is a solved problem
- Clients for every operating system in the world, including web and mobile
- Super simple backup and restore

Of course, very geeky, nothing for your parents, but maybe something for you?

[1]: https://github.com/theomega/IMAP_DMS
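
For readers who want to see the shape of the "document as an email" idea, here is a minimal sketch using only the Python standard library. It is not taken from the linked scripts: the function names, the use of the `Keywords` header for tags, and the `Documents` folder name are all assumptions made for illustration.

```python
import imaplib
import time
from email.message import EmailMessage


def build_document_mail(pdf_bytes, filename, subject, tags):
    """Wrap a scanned PDF in an email; tags go into the Keywords header (an assumption)."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["Keywords"] = ", ".join(tags)
    msg.set_content("Archived document: " + filename)
    msg.add_attachment(
        pdf_bytes, maintype="application", subtype="pdf", filename=filename
    )
    return msg


def upload(msg, host, user, password, folder="Documents"):
    """Append the message to an IMAP folder. The folder name is hypothetical."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        # No flags; use the current time as the message's internal date.
        conn.append(folder, "", time.time(), msg.as_bytes())
```

Building the message needs no network access, so it can be tried locally before pointing `upload` at a real account. Once stored, any mail client can search the subject, and a header search on `Keywords` should cover the tags on servers that support it.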
jopsen over 7 years ago
Question: why bother organizing papers?

I just throw everything in a box. If I ever need it again later it'll take a long time to find, but I rarely need to find a document again.

Complexity of archiving a document is O(1) with a very small constant. Complexity of retrieval is O(N) for a large N.

But I have few retrievals in my system, so why pay a higher per-document cost?
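
The trade-off can be put into rough numbers. The sketch below compares total time for the box against a filed archive; every timing (2 seconds to toss a page in the box, 3 seconds to inspect one page while searching, 60 seconds to file, 30 seconds to look up) is a made-up assumption, chosen only to show where the break-even point falls.

```python
def box_cost(n_docs, retrievals, toss_seconds=2.0, flip_seconds=3.0):
    """Total seconds for the box: O(1) archiving, O(N) search per retrieval."""
    # A linear search inspects about half the box on average.
    return n_docs * toss_seconds + retrievals * (n_docs / 2) * flip_seconds


def filed_cost(n_docs, retrievals, file_seconds=60.0, lookup_seconds=30.0):
    """Total seconds for an organized archive: slower filing, fast lookup."""
    return n_docs * file_seconds + retrievals * lookup_seconds


def break_even_retrievals(n_docs, toss_seconds=2.0, flip_seconds=3.0,
                          file_seconds=60.0, lookup_seconds=30.0):
    """Number of retrievals at which filing starts to pay for itself."""
    saving_per_retrieval = (n_docs / 2) * flip_seconds - lookup_seconds
    if saving_per_retrieval <= 0:
        return float("inf")  # the box never loses under these timings
    return n_docs * (file_seconds - toss_seconds) / saving_per_retrieval
```

Under these assumed timings and 1,000 documents, the box wins until roughly 40 retrievals. With only a handful of lookups over the life of the archive, the argument above holds; different timings move the threshold.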
pingec over 7 years ago
Does anyone know any similar free/open products for archiving documents, tracking etc.?

What I am after is a system like the expensive solutions some companies have, where the mailbox department prints (or has preprinted) labels with unique bar codes. For any incoming mail, they open it, stick a label on it, scan it with the label on it and then physically deliver it. Some departments also input recipient and sender details, add tags etc. So in the end they have a database searchable by persons involved, content type and tags, and all documents (physical and digital) have a referenceable ID that can be used for various purposes.
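
As a rough illustration of the data side of such a system, here is a small Python sketch built on SQLite. The schema, the function names and the 12-character ID format are assumptions for illustration; printing and reading the bar code itself is left out, and the generated `doc_id` is simply the value the label would encode.

```python
import sqlite3
import uuid


def open_registry(path=":memory:"):
    """Create (or open) a small document registry. The schema is hypothetical."""
    db = sqlite3.connect(path)
    db.executescript("""
        CREATE TABLE IF NOT EXISTS documents (
            doc_id TEXT PRIMARY KEY,
            sender TEXT, recipient TEXT, content_type TEXT, scan_path TEXT);
        CREATE TABLE IF NOT EXISTS tags (
            doc_id TEXT REFERENCES documents(doc_id),
            tag TEXT,
            PRIMARY KEY (doc_id, tag));
    """)
    return db


def register(db, sender, recipient, content_type, scan_path, tags=()):
    """Store one incoming document and return the ID to print on its label."""
    doc_id = uuid.uuid4().hex[:12].upper()
    with db:  # commit both inserts together
        db.execute("INSERT INTO documents VALUES (?, ?, ?, ?, ?)",
                   (doc_id, sender, recipient, content_type, scan_path))
        db.executemany("INSERT INTO tags VALUES (?, ?)",
                       [(doc_id, tag) for tag in set(tags)])
    return doc_id


def find(db, person=None, tag=None):
    """Return document IDs filtered by a person involved and/or a tag."""
    sql = ("SELECT DISTINCT d.doc_id FROM documents d "
           "LEFT JOIN tags t ON t.doc_id = d.doc_id WHERE 1=1")
    args = []
    if person:
        sql += " AND (d.sender = ? OR d.recipient = ?)"
        args += [person, person]
    if tag:
        sql += " AND t.tag = ?"
        args.append(tag)
    return [row[0] for row in db.execute(sql + " ORDER BY d.doc_id", args)]
```

Because the same ID lives both on the paper label and in the database, a physical document can be traced back to its scan and metadata, and the other way around.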
prashnts over 7 years ago
I've been using iOS and Mac's native Notes app to do that. In my opinion what these solutions lack is an integration between note-taking (I sometimes like to write a few sentences relevant to a document, and I'd like to have it shown right next to it) and keeping the individual documents available as PDF or whatever if you need them. The Notes app does it perfectly now after iOS 11.1 and High Sierra.

An example is this screenshot from my notes: https://imgur.com/a/xuZqW
pw0nka over 7 years ago
Looks great. Love the idea behind it, but...

There is at least one country (mine - Switzerland) which is not able to use software like yours. The problems are the current laws that force people and organizations to store physical copies of the documents (for several years). Electronic documents have no value in front of the law, which is why we have no choice but to do all of that offline, manually.

I've tried many archiving solutions, but none of them saved any time. The one missing feature was a way to automatically print a serial code (the electronic document ID) back on the original document. This way you could just scan it, print it, and put it in a large box where you sort it by its ID - that simple. And this would even work if you used spacers to split the documents during the scanning process.
y4mi over 7 years ago
a nontrivial name conflict with Paperless (https://github.com/danielquinn/paperless) ...
tjoff over 7 years ago
I don't know what Mayan EDMS is, and all this README does is say what it is in relation to Mayan EDMS. Extremely frustrating.
carwyn over 7 years ago
There's also this paperless: https://github.com/danielquinn/paperless
curioussavage over 7 years ago
Any good open source desktop software with Linux support to do this? I don't see why I would personally want a web app for this.
karinato over 7 years ago
For those wondering about the relationship between Mayan EDMS, Paperless and Open Paperless, here is a storyline summary of the saga.

Roberto Rosario (the creator of Mayan) is a very well-known name in the Django, Python, document management, maker, hacking, open health and open source in government circles.

- https://speakerdeck.com/siloraptor
- https://en.wikipedia.org/wiki/Roberto_Rosario
- https://www.pycon.it/conference/p/roberto-rosario
- http://pyvideo.org/djangocon-us-2014/liberation-and-modernization-of-government-legacy.html
- https://cpucadviceletters.org/login/?next=/
- https://twit.tv/shows/floss-weekly/episodes/253
- https://en.wikipedia.org/wiki/Mayan_(software)
- https://www.youtube.com/watch?v=rubzEAojf-k

Mayan EDMS was initially released on February 3, 2011 (Wikipedia and git log). In June 2015, Roberto gave a workshop at DjangoCon named From zero to paperless with Mayan EDMS (https://archive.is/FDpYS). Daniel Quinn (the creator of Paperless) also attended and presented at the same DjangoCon event (https://vimeo.com/135907408) and 6 months later, after working on it for several months (Daniel's own words), he released Paperless on December 20, 2015 (https://github.com/danielquinn/paperless/commits/master?after=af4623e60563f5e4328e87ec8027f79804f8d08a+559). By January 24, 2016, Paperless had "exploded in popularity" (https://twitter.com/danielagquinn/status/691242822431830016).

Both projects used Python, Django, the same Django 3rd party apps like DjangoSuit, the same document consumer model, the same OCR engine, a REST API, among other things. On the surface it appeared that Paperless was a copy of Mayan EDMS concepts and implementations without giving credit or mention. Many additions were planned for Paperless that were features and implementations already in Mayan (https://www.reddit.com/r/selfhosted/comments/44mh88/scan_index_and_archive_all_of_your_paper_documents/).

A separate point of contention was that the name "Paperless" had been in use by other projects much earlier than Daniel's Paperless (https://github.com/search?utf8=%E2%9C%93&q=paperless&type=). Since there is no trademark on the name or description, other projects appeared with the same name and description (https://github.com/lrnt/paperless).

On March 15, 2016, Daniel presented Paperless at CodeNode (https://skillsmatter.com/skillscasts/7843-intro-to-paperless).

It was Daniel's February 27, 2016 tweet suggesting to be paid to work on Paperless that sparked the animosity between the users of the two projects (https://twitter.com/danielagquinn/status/703629488932970500).

Many heated debates ensued. Even then, the main critique of Paperless remained technical; its lack of maturity and implementation was described by one Reddit user as: "I've looked into paperless and it currently lacks a lot of...nearly well everything. Maybe in a year or two" (https://www.reddit.com/r/linux/comments/6m9evn/want_to_go_paperless_looking_for_dms/dk1cjz0/)

On April 9, 2016, Daniel added a reference to Mayan to the documentation of Paperless (https://github.com/danielquinn/paperless/commit/674d54ec38783b02350c1371bdf0f412dd765ef0#diff-88b99bb28683bd5b7e3a204826ead112).

On April 17, 2016, Daniel posted on his old Twitter account: "It looks like my idea for Paperless wasn't all that unique. This other project uses a lot of the same tools: http://www.mayan-edms.com" (https://twitter.com/danielagquinn/status/721726208606646272).

On April 14, 2017, Daniel Quinn posted on his blog a summary of his experiences at DjangoCon Europe 2017 where he mentions meeting Roberto in person. He describes Roberto as a "rival geek" in what appears to be jest and uses positive adjectives to describe Roberto in the rest of the post. (https://danielquinn.org/blog/djangocon-2017/)

On April 16, 2017, Daniel posted a tweet mentioning the popularity of Paperless (https://twitter.com/danielagquinn/status/853701257051205632).

The last release of Paperless was made on Sep 9, 2017.

On Oct 18, 2017, Daniel posted: "I changed my Twitter name! This isn't me any more, so if you're looking for me, you should keep head over to @danielagquinn." (https://twitter.com/searchingfortao/status/920778623715610624). Only 7 commits have been made to Paperless since, with the last commit happening on November 5, 2017.

On December 18, 2017, a user named "zhoubear" announced on Reddit's selfhosted "Open Paperless: Scan, index, and archive all of your paper documents" (https://www.reddit.com/r/selfhosted/comments/7kjocg/scan_index_and_archive_all_of_your_paper_documents/). It turned out that Open Paperless was a forked Mayan EDMS with cosmetic changes but with copyrights changed and no attribution to Mayan EDMS. After a much heated debate, copyrights and attributions were restored and the project's description has been updated to show that it is a new front end for Mayan among other usability changes meant for home users.

In 4 days, Open Paperless has surpassed Mayan EDMS in popularity on GitHub.

No posts or comments from Roberto can be found in reference to Paperless or Open Paperless.

https://twitter.com/search?q=paperless%20from%3Asearchingfortao&src=typd
ikawe over 7 years ago
Let's put this in a room with [The Screenless Office](https://news.ycombinator.com/item?id=15960056) and see what happens.
SomewhatLikely over 7 years ago
Something I've wanted that might be possible is software that takes in a video of me flipping the pages of a notebook and converts that to a PDF of the notebook.
bob_theslob646 over 7 years ago
Please correct me if I am wrong, but this looks like you have to "name" each page. I would also want to see how accurate the OCR is. Historically, OCR on handwriting has been a problem unless the data is perfectly formatted. I guess the goal is just to get enough accuracy that you can find, or look at, the image of the page containing the indexed search term you were looking for.
mickael-kerjean over 7 years ago
Well done! Will definitely give it a try back home!
mauritzio over 7 years ago
Maybe it would be better to "archive" on good paper (encoded). Cannot imagine a 1000-year-old magnetic device... ;)
gravypod over 7 years ago
Will this automatically center and apply perspective transforms to pictures taken with phone cameras?
rootsudo over 7 years ago
Okay, wow, this is cool.
EGreg over 7 years ago
I have a question.

Is there a service anyone knows about which will print your email and send it with tracking of receipt or signature, so you can prove what was physically sent?

Or you mail it to them and they open your mail, scan it and forward it on with signature required, with your address as the return address?

Because right now you can only prove that the ENVELOPE was received, not what was in it.