I have a few questions about implementing a demo of state-of-the-art image-to-image search. It will serve single real-time queries against a very large database (millions of images).<p>Below is a breakdown of what I have planned so far; I'm looking for your feedback and recommendations.<p>Models:
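For context on how I plan to use whichever model I pick: L2-normalized embeddings compared by dot product (i.e. cosine similarity). A minimal numpy sketch of that step, with random vectors standing in for real model outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for model outputs: 1000 database embeddings and one query,
# dimension 768 (roughly what a ViT-B-sized backbone produces).
db = rng.standard_normal((1000, 768)).astype(np.float32)
query = rng.standard_normal(768).astype(np.float32)

# L2-normalize so that dot product equals cosine similarity.
db /= np.linalg.norm(db, axis=1, keepdims=True)
query /= np.linalg.norm(query)

scores = db @ query            # cosine similarity per database image
top5 = np.argsort(-scores)[:5]  # indices of the 5 nearest images
```

The actual embeddings would come from the chosen model's image encoder; everything after that is the same regardless of which backbone wins.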
I am considering DINOv2-base, SigLIP-S, or even OpenCLIP ViT-B/32.<p>Storage and indexing:
Probably Qdrant (self-hosted); I would also consider FAISS if it didn't require holding the whole index in memory.<p>Input problems:
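For reference, the padding option I have in mind is plain letterboxing; a Pillow sketch (function name and fill color are my own choices):

```python
from PIL import Image, ImageOps

def pad_to_square(img: Image.Image, size: int = 224, fill=(0, 0, 0)) -> Image.Image:
    """Letterbox: shrink so the longer side fits `size`, then pad with `fill`."""
    img = ImageOps.contain(img, (size, size))  # resize, preserving aspect ratio
    canvas = Image.new("RGB", (size, size), fill)
    # Center the resized image on the square canvas.
    canvas.paste(img, ((size - img.width) // 2, (size - img.height) // 2))
    return canvas

# A 1600x900 image becomes 224x126 centered on a 224x224 black canvas.
out = pad_to_square(Image.new("RGB", (1600, 900), (255, 0, 0)))
```

Center-crop and squash-resize would be one-liners by comparison, which is why I'm asking whether the extra padding machinery actually pays off in retrieval accuracy.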
My images come in various sizes and aspect ratios, and all are fairly large (no thumbnails). Which preprocessing would you recommend: center-cropping a square, resizing to a square and distorting the proportions, or padding to a square and then resizing? I am worried that padding will hurt search accuracy.<p>Deployment:
I'll run the embedding calculations for the database on my local machine, but I'd like to hear suggestions for cost-efficient online hosting of the inference model.<p>Thank you.