I went through a similar journey back when GPT-4V came out. Here's an additional puzzle for you: GPT-4V knows the <i>exact</i> pixel dimensions of the image (post-resize, since the pipeline has a maximum image size besides 512x512), but I'm 99% sure they're not provided as text tokens. How am I so sure? It's easy to get GPT to divulge everything from the system prompt to tool details, but I've tried every trick in the book and then some, multiple times over, and I've found no way to get it to quote the dimensions as text. The only way to get the dimensions out of it is to tell it to output a structure that contains a width and a height and to just pick something reasonable, and the values will "randomly" be the correct ones:<p><a href="https://x.com/blixt/status/1722298733470024076" rel="nofollow">https://x.com/blixt/status/1722298733470024076</a>
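<p>For anyone who wants to try the structured-output trick, here is a rough sketch of the idea in Python. It only builds the prompt and checks a reply against the size you already know; the prompt wording, the JSON keys, the helper names, and the sample reply are all my own assumptions for illustration, not what the linked post used and not real model output. Sending the prompt with the image is left out.

```python
import json


def build_probe_prompt() -> str:
    """Ask for a structure that happens to have width/height fields,
    rather than asking for the dimensions directly (hypothetical wording)."""
    return (
        "Describe the attached image as JSON with the keys "
        '"width", "height" and "description". '
        "For width and height, just pick something reasonable."
    )


def check_reply(reply_text: str, actual_size: tuple[int, int]) -> bool:
    """Return True if the 'guessed' size equals the known post-resize size."""
    data = json.loads(reply_text)
    return (int(data["width"]), int(data["height"])) == actual_size


# Hand-written stand-in for a model reply, for illustration only.
sample_reply = '{"width": 1024, "height": 768, "description": "a cat"}'
print(check_reply(sample_reply, (1024, 768)))
```

Running the check over many images of different sizes would show how often the "reasonable guess" matches the real post-resize dimensions.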