Recent progress in multimodal retrieval augments image search with a text instruction, an approach known as Composed Image Retrieval (CIR). Related research includes BAAI's <a href="https://huggingface.co/BAAI/bge-visualized" rel="nofollow">https://huggingface.co/BAAI/bge-visualized</a> and Google's <a href="https://open-vision-language.github.io/MagicLens/" rel="nofollow">https://open-vision-language.github.io/MagicLens/</a>. We built a live demo so that people can try it and get a feel for the quality of this retrieval approach.