We present BindDiffusion, a single diffusion model that binds multi-modal embeddings. It leverages a pre-trained diffusion model to consume conditioning signals from diverse, or even mixed, modalities. This design enables novel applications, such as audio-to-image generation, without any additional training. This repo is still under development. Please stay tuned!
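To illustrate the "mixed modalities" idea, the sketch below blends pre-computed embeddings from two modalities into one conditioning vector by weighted averaging on the unit hypersphere. This is a minimal, dependency-free illustration, not the repo's implementation: the embedding values, the mixing weights, and the function names (`l2_normalize`, `mix_conditions`) are all hypothetical, and in practice the embeddings would come from a bound multi-modal encoder before being handed to the diffusion model's conditioning input.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def mix_conditions(embeddings, weights):
    """Blend embeddings from different modalities into a single
    conditioning vector: normalize each one, take a weighted sum,
    then re-normalize so the result lies on the same hypersphere."""
    dim = len(embeddings[0])
    mixed = [0.0] * dim
    for emb, w in zip(embeddings, weights):
        unit = l2_normalize(emb)
        for i in range(dim):
            mixed[i] += w * unit[i]
    return l2_normalize(mixed)

# Hypothetical pre-computed embeddings for an audio clip and a text prompt.
audio_emb = [0.2, 0.9, 0.1, 0.4]
text_emb = [0.7, 0.1, 0.6, 0.2]

# 70% audio, 30% text: the mixed vector can then be fed to the
# pre-trained diffusion model wherever it expects a condition vector.
cond = mix_conditions([audio_emb, text_emb], [0.7, 0.3])
```

Because both inputs and the output are unit-normalized, mixing weights control how strongly each modality steers generation without changing the overall conditioning scale.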