Ask HN: Best document storage solution in 2023

15 points by ID1452319 about 2 years ago
We have an application that requires a document store. It needs to hold up to 10 million documents and be accessible via APIs, so that our application can retrieve and display them.

We are considering everything from Dropbox-type solutions to blob storage in GCP.

What kind of document storage solutions are people using in 2023 to meet this use case?

7 comments

tothrowaway about 2 years ago
I use B2 and Wasabi because I don't like relying on a single cloud provider. Files are uploaded to both. OpenResty (Nginx+Lua) sits in front to provide caching and the logic for deciding which provider to pull from.

Wasabi gives you a free bandwidth allowance equal to the number of bytes stored per month. When I use up most of that, I start pulling from B2. And of course, if one of them is down, I pull from the other.

It takes more time up front to build than just relying completely on GCP/Azure/AWS. But I don't have to worry as much about spontaneous account terminations destroying my business.
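
A minimal sketch of the selection logic described in this comment, written in Python for illustration rather than the Lua that OpenResty would run. The `Provider` class, the `pick_provider` function, and the 0.9 threshold standing in for "most of that" are all assumptions, not details from the comment.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    """A storage backend and its last known health state (hypothetical type)."""
    name: str
    healthy: bool = True


def pick_provider(wasabi, b2, wasabi_bytes_stored, wasabi_bytes_egressed,
                  threshold=0.9):
    """Return the provider to read from.

    Prefer Wasabi while this month's egress stays under `threshold` of the
    free allowance (taken here as equal to the bytes stored), then prefer B2.
    If the preferred provider is down, fall back to the other one.
    """
    allowance = wasabi_bytes_stored
    wasabi_has_budget = wasabi_bytes_egressed < threshold * allowance
    preferred, fallback = (wasabi, b2) if wasabi_has_budget else (b2, wasabi)
    if preferred.healthy:
        return preferred
    if fallback.healthy:
        return fallback
    raise RuntimeError("no storage provider available")
```

How the egress counter and the health flags get updated is left out; in a setup like the one described, that state would presumably live in the proxy layer.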
s1k3s about 2 years ago
The question is too generic to answer well. Do you need global availability? Do you need high-speed downloads? Are you worried about bandwidth costs? Etc.

We use S3 + CloudFront for documents that our customers need to access quickly. We use SFTP for our internal docs, where we don't care that much about availability and speed.
speedgoose about 2 years ago
I would go with an S3-compatible object store by default.

Among open-source options, Ceph and MinIO are common. Garage is newer, has a simpler design, and has good potential too.

https://ceph.com/en/
https://min.io/
https://garagehq.deuxfleurs.fr/
fpdavis about 2 years ago
The file system was designed to hold documents and does a pretty good job of it. There are several to choose from, depending on what OS you run. Backing documents up and restoring them is easy. An API to retrieve documents is trivial to write and customize, or there are a few tools and APIs already available.
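
A rough sketch of what such a retrieval API could look like, using only the Python standard library. The storage root, the `/docs/<doc_id>` URL scheme, and the hash-sharded directory layout are assumptions made for illustration; the comment does not specify any of them.

```python
import hashlib
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

ROOT = Path("/var/lib/docstore")  # hypothetical storage root


def doc_path(doc_id: str, root: Path = ROOT) -> Path:
    """Map a document ID to a file under two levels of hash-prefix directories.

    Hashing the ID spreads files across 65,536 directories and means the
    caller-supplied ID is never used as a path component.
    """
    digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
    return root / digest[:2] / digest[2:4] / digest


class DocHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect request paths of the form /docs/<doc_id>.
        prefix = "/docs/"
        if not self.path.startswith(prefix):
            self.send_error(404)
            return
        path = doc_path(self.path[len(prefix):])
        if not path.is_file():
            self.send_error(404)
            return
        data = path.read_bytes()
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Serve documents until interrupted."""
    HTTPServer((host, port), DocHandler).serve_forever()
```

With 10 million documents, this layout averages roughly 150 files per leaf directory. Authentication, uploads, and content types are deliberately left out of the sketch.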
giaour about 2 years ago
There are a number of fine options for blob storage (S3, R2, Ceph, Azure Storage, etc.), but with that many documents it's likely access control and audit logging will be important. If that's the case, something heavyweight like SharePoint may be a better choice.
locustmostest about 2 years ago
One possibility is to use our open-core document management API, built to deploy in your AWS account: https://github.com/formkiq/formkiq-core

The files are stored in S3, with customizable metadata storage in DynamoDB. As the system is designed to run on AWS Serverless and Managed Services, the majority of the cost will come from S3 storage fees.
LLcolD about 2 years ago
Do the documents need to be indexed?