Is it advisable to store images directly in PostgreSQL for a dataset of 100 million records, each with a 200KB image, or should I use object storage with references from the start? My primary and only use case involves creating multimodal embeddings for search and relevance purposes.
If you are storing 100 million images at 200KB each, that comes out at 20TB!<p>I would calculate the costs of something like S3 versus buying five 4TB HDDs and running a network file server.<p>You're going to save a ton of money hosting this yourself. I would go with two used powerful desktop PCs. One as a DB server and the other as the file server.<p>Store the images on the file server and store the image's path in the database server.
You could do quick tests using bytea, toast, or large objects.<p>But an object store may be more convenient overall.<p>When I did something similar, I did store embeddings and the image UUID in a table and my images in an object store with the same UUIDs as filenames. It was simpler to upload the images and put them available through a CDN.