I store most digitised documents as PDFs.<p>Very often you can download the original PDFs from a company's website, as well as (or instead of) the paper documentation they send you.<p>For local scanning, I use an HP MFP. If I need to scan individual pages, I can then merge them, if necessary, with a 'merge.pdf' type of software utility.<p>Store the scanned/downloaded documents in some type of tree-structured directory format. This greatly reduces the time taken to find a specific document.<p>I keep financial documents separate from other documents. Financial documents are also segregated into separate tax-year 'trees'.<p>Documents are backed up month by month, and also daily. The monthly backups are stored indefinitely, and separately from the daily backups, which are deleted in reverse chronological 'exponential' order.<p>These are the daily backups remaining at the moment. Day 0000 was back on 23rd June 2012. Each entry is the day number, then the date as YYMMDD, and the last word is the server name. Note how the retained backups are more closely spaced for recent dates than for earlier ones:<p><pre><code> 0000-120623nullius
1024-150401nullius
2048-180131centrepoint
2304-181014centrepoint
2560-190627centrepoint
2688-191102centrepoint
2720-191204centrepoint
2736-191220centrepoint
2752-200105centrepoint
2756-200109centrepoint
2758-200111centrepoint
2759-200112centrepoint
2760-200113centrepoint</code></pre>
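The pruning script itself isn't shown, so what follows is only a guess at the kind of rule that produces a list like the one above. It is a minimal Python sketch, assuming a daily backup is kept while its age in days is less than twice the largest power of two dividing its day number, and that day 0 is always kept. The names `lowest_set_bit` and `kept_days` are made up for illustration.

```python
def lowest_set_bit(n: int) -> int:
    """Largest power of two that divides n (n must be > 0)."""
    return n & -n


def kept_days(today: int) -> list[int]:
    """Day numbers whose daily backup would still exist on day `today`.

    Assumed rule (not confirmed by the original post): day d survives
    while (today - d) < 2 * lowest_set_bit(d). Day 0 is kept forever.
    """
    kept = []
    for day in range(today + 1):
        age = today - day
        if day == 0 or 2 * lowest_set_bit(day) > age:
            kept.append(day)
    return kept
```

Under that assumption, `kept_days(2760)` returns the same thirteen day numbers as the listing above. A backup that fails the test on one day also fails it on every later day, so the rule can be applied as a nightly delete pass without ever needing a backup that has already been removed.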
I've got a decent Brother scanner like this one: <a href="https://www.ebay.com/p/13030519316" rel="nofollow">https://www.ebay.com/p/13030519316</a>. When I scan a document, it ends up in a folder on my NAS.<p>I've built a small webapp that reads the contents of this folder as untagged documents. Tagging a document moves it to a proper folder, and it then becomes visible in a treeview.<p>It is relatively robust and low maintenance. I might at some point work on download + OCR scripts to fetch and auto-tag bills and such that are already in PDF. Not sure if it is really useful at this point, to be honest.
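The webapp's code isn't shown, but the "untagged inbox, tag to file" idea can be sketched in a few lines of Python. This assumes the scanner drops files into a single flat folder and that a tag maps directly to a subfolder of the archive; the function names and the folder layout are hypothetical.

```python
from pathlib import Path
import shutil


def untagged(inbox: Path) -> list[Path]:
    """Files still sitting in the scanner's drop folder, i.e. not yet tagged."""
    return sorted(p for p in inbox.iterdir() if p.is_file())


def tag_document(doc: Path, archive: Path, tag: str) -> Path:
    """Move `doc` into archive/<tag>/ and return its new location.

    A tag such as "bills/electricity" becomes a nested folder, which is
    what a treeview would then display.
    """
    target_dir = archive / tag
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / doc.name
    if target.exists():
        # Refuse to overwrite an already-filed document with the same name.
        raise FileExistsError(target)
    return Path(shutil.move(str(doc), str(target)))
```

Keeping the tag and the folder path identical means the filesystem is the only database, which fits the "low maintenance" goal.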
My method was more specific to bills and finance documents. I used a generic photo scanner. It's not as automatic as the purpose-built document scanners that have automatic feeders and support multiple pages, but I wanted something that I could use for photography as well.<p>I coupled this with some very hacked-together Perl scripts that used Tesseract OCR[1] to feed data into ledger-cli[2] for handling bills. I put other generic documents into folders by date.<p>It worked pretty well, and I was able to generate some pretty graphs from data that was fully reconciled with financial institutions like my bank, credit card, investments, etc., but it still took too much time. So what do I do now? Nothing!<p>This was years ago. I assume there is now better support from financial institutions for extracting data, and this coupled with improved OCR/machine learning might make things more robust and make it worthwhile to try again.<p>[1] <a href="https://en.wikipedia.org/wiki/Tesseract_(software)" rel="nofollow">https://en.wikipedia.org/wiki/Tesseract_(software)</a><p>[2] <a href="https://www.ledger-cli.org/" rel="nofollow">https://www.ledger-cli.org/</a>
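The original scripts were Perl and aren't shown, so this is only an illustration of the OCR-to-ledger step, written in Python. It assumes the OCR text is already available as a string and that the bill contains a line such as "Total due: $45.67". The regular expression, the account names, and the `ledger_entry` function are all hypothetical.

```python
import re
from datetime import date

# Assumed bill layout: "total" or "amount due", then the figure.
# Thousands separators (e.g. 1,234.56) are deliberately not handled.
AMOUNT_RE = re.compile(r"(?:total|amount due)\D*(\d+\.\d{2})", re.IGNORECASE)


def ledger_entry(ocr_text: str, when: date, payee: str,
                 expense_account: str,
                 paid_from: str = "Assets:Checking") -> str:
    """Build a plain-text ledger transaction from OCR'd bill text."""
    match = AMOUNT_RE.search(ocr_text)
    if match is None:
        raise ValueError("no amount found in OCR text")
    amount = match.group(1)
    # Two spaces separate the account from the amount; the second
    # posting has no amount, so ledger balances it automatically.
    return (
        f"{when:%Y/%m/%d} {payee}\n"
        f"    {expense_account}  ${amount}\n"
        f"    {paid_from}\n"
    )
```

Raising an error when no amount is found, instead of guessing, leaves the ambiguous scans for manual entry, which is probably where most of the time went in a setup like this.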
What’s your goal? I haven’t received a paper bill in years. They are already digitized. Same for most news/magazine articles. Aside from older/historical documents, nearly every piece of paper I encounter has a digital counterpart that I can access in some form.
With bills, the quality is secondary and indexing is more important. I scan using Microsoft Office Lens and email the scan to myself, adding a few keywords in the title, e.g. "Electricity bill for November 2020".