I think a lot of people are sleeping on xAI for two reasons: Twitter and Tesla FSD data.

I've seen numerous talks recently by AI leaders discussing how the next generation of LLMs will have to "understand the physical world" (an area Tesla FSD leads in; see Elon's response to Sora: https://www.news18.com/tech/elon-musk-claims-tesla-has-been-generating-openais-sora-like-videos-for-a-year-8784187.html), along with Twitter being the "town square" of the internet, which gives xAI an immense data advantage. I wouldn't be surprised if the Twitter API were completely cut off one day.
Interesting that they built their own benchmark and that it primarily features images from vehicles. Tesla overlap?

> *... we are introducing a new benchmark, RealWorldQA. This benchmark is designed to evaluate basic real-world spatial understanding capabilities of multimodal models. While many of the examples in the current benchmark are relatively easy for humans, they often pose a challenge for frontier models.*

> *The initial release of the RealWorldQA consists of over 700 images, with a question and easily verifiable answer for each image. The dataset consists of anonymized images taken from* vehicles, *in addition to other real-world images ... RealWorldQA is released under CC BY-ND 4.0.*

It will be interesting to see the feedback once someone has a chance to look into the dataset (https://data.x.ai/realworldqa.zip); a quick way to peek inside is sketched below.

Side note: I'm very impressed with their "Explaining a meme" example.
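A minimal Python sketch for that first look, assuming only the download URL from the post. The archive's internal layout (image naming, where the question/answer annotations live) is a guess on my part, since it isn't documented here:

    # Download the RealWorldQA archive and list what's inside.
    # The URL is from the announcement; the internal layout is undocumented,
    # so this only prints entry names rather than assuming a schema.
    import io
    import urllib.request
    import zipfile

    URL = "https://data.x.ai/realworldqa.zip"

    with urllib.request.urlopen(URL) as resp:
        archive = zipfile.ZipFile(io.BytesIO(resp.read()))

    names = archive.namelist()
    print(f"{len(names)} entries")
    for name in names[:10]:  # first few entries, to get a feel for the layout
        print(name)

From there it should be clear whether the questions and answers ship as a single metadata file or as per-image sidecars.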