This is one of my projects using OpenAI's CLIP model to do interesting things, so the big credit goes to them!

I processed 2M images from the Unsplash dataset with CLIP and stored the feature vector representation of each photo (a 512-element vector). You can now encode a text query with CLIP into the same latent space and search the database.

I also tried some simple arithmetic with the feature vectors to combine the result of a search query with a photo. For example, you can search for "Sydney Opera House" and give it a night photo, and you will get a photo of the Sydney Opera House at night.

You can jump directly to the Google Colab notebook if you want to give it a try: https://colab.research.google.com/github/haltakov/natural-language-image-search/blob/main/colab/unsplash-image-search.ipynb
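For anyone curious how the search and the vector arithmetic work, here's a minimal sketch using the official CLIP package. The file names (photo_features.npy, night.jpg) and helper functions are placeholders for illustration, not the actual notebook code, and it assumes the photo features were precomputed with the same ViT-B/32 model and L2-normalized.

```python
import clip
import numpy as np
import torch
from PIL import Image

# Load CLIP (ViT-B/32) -- the same model assumed for the precomputed photo features.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder: precomputed, L2-normalized 512-d feature vectors for the photos.
photo_features = np.load("photo_features.npy")  # shape (N, 512)

def encode_text(query):
    """Encode a text query into the same 512-d latent space as the photos."""
    with torch.no_grad():
        feats = model.encode_text(clip.tokenize([query]).to(device))
        feats /= feats.norm(dim=-1, keepdim=True)
    return feats.cpu().numpy()[0]

def encode_image(path):
    """Encode a photo into the same latent space."""
    image = preprocess(Image.open(path)).unsqueeze(0).to(device)
    with torch.no_grad():
        feats = model.encode_image(image)
        feats /= feats.norm(dim=-1, keepdim=True)
    return feats.cpu().numpy()[0]

def search(query_vector, top_k=5):
    """Cosine-similarity search; vectors are normalized, so a dot product suffices."""
    similarities = photo_features @ query_vector
    return np.argsort(-similarities)[:top_k]

# Plain text search.
best = search(encode_text("Sydney Opera House"))

# Combine a text query with a photo: add the two feature vectors,
# renormalize, and search with the result.
combined = encode_text("Sydney Opera House") + encode_image("night.jpg")
combined /= np.linalg.norm(combined)
best_combined = search(combined)
```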