OpenAI released their new o3-mini model on Friday. Its knowledge cutoff is October 2023(!). What does that mean, exactly?

Large Language Models are trained on huge amounts of text data. In other words, they have read the entire Internet. A knowledge cutoff means that o3-mini has read the entire Internet - as of October 2023 (https://platform.openai.com/docs/models#o3-mini). It doesn't know about anything that has happened over the last 15 months. Anthropic's most up-to-date model has a cutoff of July 2024 (https://docs.anthropic.com/en/docs/about-claude/models#model-comparison-table).

In a Reddit Ask-Me-Anything session, the OpenAI team confirmed they are working on updating knowledge cutoffs. But Sam Altman himself replied that in his own work he never thinks about the knowledge cutoff anymore - thanks to ChatGPT's Web Search feature. To which one Reddit user commented: "But search results are meh compared to stuff from its own knowledge" (https://www.reddit.com/r/OpenAI/comments/1ieonxv/comment/ma9zhc2/).

Indeed, when talking to ChatGPT you will notice that it spontaneously decides to search the web while answering your questions. It then uses the top search results as the basis for its answer - instead of its own knowledge, which might be outdated and incorrect.
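To make the mechanism concrete, here is a minimal sketch using the official OpenAI Python SDK: we ask o3-mini about an event from after its October 2023 cutoff and offer it a search tool it may choose to call. The web_search function is a hypothetical stand-in defined purely for illustration (ChatGPT's built-in Web Search is not exposed to API users under that name), and whether the model answers from memory or requests a search is its own decision.

    # Minimal sketch: ask o3-mini about a post-cutoff event and hand it a
    # search tool it may choose to call. Assumes the official `openai` Python
    # SDK (v1) and an OPENAI_API_KEY in the environment. The `web_search`
    # tool is a hypothetical stand-in, not ChatGPT's actual Web Search.
    from openai import OpenAI

    client = OpenAI()

    tools = [{
        "type": "function",
        "function": {
            "name": "web_search",  # hypothetical tool, defined for illustration
            "description": "Search the web for information newer than the "
                           "model's knowledge cutoff.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query."},
                },
                "required": ["query"],
            },
        },
    }]

    response = client.chat.completions.create(
        model="o3-mini",
        messages=[{"role": "user",
                   "content": "Who won the 2024 US presidential election?"}],
        tools=tools,
    )

    message = response.choices[0].message
    if message.tool_calls:
        # The model judged its (pre-October-2023) knowledge insufficient and
        # requested a search instead of answering directly.
        print("Requested search:", message.tool_calls[0].function.arguments)
    else:
        print("Answered from its own knowledge:", message.content)

Swap in a question about something from before the cutoff and the model will typically skip the tool and answer directly from memory.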
Does that mean that ChatGPT is effectively reduced to reading the top Google results back to us? Have we lost touch with the breathtaking breadth of understanding it gained by "reading the entire Internet"?

These limitations must be taken into account when doing research on products, shopping, news, industries, companies, policy, etc. Web-search-based answers will be accurate, but their context will be limited to the top web search results. Without web search, we are exposed to the model's full breadth of understanding - at the cost of the information being out of date.

Bonus question: can OpenAI, Anthropic, and other LLM providers update knowledge cutoffs more frequently?

My personal bet is that it is surprisingly risky and time-consuming to "add a few more months of data and retrain the model" because of all the "final touches" - fine-tuning, safety testing, evaluation - that happen after pre-training has finished. As in any software project, the last 5% takes half the time. :)

Original post: https://www.linkedin.com/feed/update/urn:li:activity:7292436106036322306/