Motivation: In the thread on "GPT-5 is behind schedule" (wsj.com) [1], user jacobolus posted a comment [2] claiming that LLMs are not useful for tricky or obscure questions.

I am thinking about creating a page for testing those claims. Perhaps something similar already exists. Either way, I think such a page could help determine whether, in general, LLMs are useful for deep questions.

Obviously, the page should use an ensemble of the best models, and there should be limits on the number of models and on the time and budget for computation, since that costs real money.

I think the battle between Wikipedia editors and contributors on one side and LLMs on the other is going to be fierce once LLMs reach the level where they can question the basic assumptions of editors in their respective fields.

Edit: edited a lot.

[1] GPT-5 is behind schedule (wsj.com)
https://www.wsj.com/tech/ai/openai-gpt5-orion-delays-639e7693

[2] Excerpt: "I've never gotten an answer from an LLM to a tricky or obscure question about a subject I already know anything about that seemed remotely competent."
Sure, we can create a website that says "AI is useful for complex things", but will that actually make it true? People say AI is only usable for trivial stuff because of their experience: all AI tools fumble at even marginally complex tasks and questions. Change people's experience, and their opinion will change too.