I downloaded params.json: GeLU and 2D RoPE are used for the vision adapter. The vocab size also got larger, now 131072.<p>Also, Mistral's latest tokenizer PR shows 3 new tokens (the image token, plus start and end tokens).<p>The torrent is 24GB, and I guess the implementation will be up on HF in a few days!<p>Exciting times!
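<p>For anyone unfamiliar with 2D RoPE, here is a rough sketch of the general idea, not Mistral's actual code. It assumes the common formulation where half of each vector is rotated by the patch's row index and the other half by its column index. The function name `rope_2d`, the `base` value, and the half/half split are my own assumptions for illustration; the real implementation may lay out frequencies differently.

```python
import math

def rope_2d(vec, row, col, base=10000.0):
    """Apply a generic 2D rotary embedding to `vec` for a patch at (row, col).

    Illustrative only: the first half of `vec` is rotated by angles derived
    from `row`, the second half by angles derived from `col`.
    """
    d = len(vec)
    if d % 4 != 0:
        raise ValueError("vector length must be divisible by 4")
    half = d // 2
    out = []
    for offset, pos in ((0, row), (half, col)):
        # Rotate consecutive pairs (x, y) by a position-dependent angle.
        for i in range(0, half, 2):
            theta = pos * base ** (-i / half)
            x, y = vec[offset + i], vec[offset + i + 1]
            c, s = math.cos(theta), math.sin(theta)
            out.extend((x * c - y * s, x * s + y * c))
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
k = [0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

# Both pairs of patches are separated by the same (row, col) offset of (2, 3).
score_a = dot(rope_2d(q, 3, 5), rope_2d(k, 1, 2))
score_b = dot(rope_2d(q, 5, 8), rope_2d(k, 3, 5))
```

The point is that the attention score between two patches depends only on their relative row and column offset, so `score_a` and `score_b` should agree up to floating-point error.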