They released the data for this report as a bunch of CSV files in a Google Drive, so I converted those into a SQLite database for exploration with Datasette Lite: <a href="https://lite.datasette.io/?url=https://static.simonwillison.net/static/cors-allow/2025/ai-index-report-2025.db#/ai-index-report-2025" rel="nofollow">https://lite.datasette.io/?url=https://static.simonwillison....</a><p>Here's the most interesting table, illustrating examples of bias in different models <a href="https://lite.datasette.io/?url=https://static.simonwillison.net/static/cors-allow/2025/ai-index-report-2025.db#/ai-index-report-2025/3~2E+Responsible+AI~2FData~2Ffig_3~2E7~2E4?_facet=category&_facet=domain&_facet=llm&_facet=variation&_facet=flag&flag=1" rel="nofollow">https://lite.datasette.io/?url=https://static.simonwillison....</a>
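If you want to rebuild the database yourself, the conversion is straightforward. Here's a minimal sketch using the sqlite-utils Python library, assuming the CSVs have been downloaded into a local csvs/ directory (the paths and table naming here are my choices for illustration, not how the report organizes things):

    import csv
    import pathlib
    import sqlite_utils

    db = sqlite_utils.Database("ai-index-report-2025.db")
    for path in pathlib.Path("csvs").glob("**/*.csv"):
        with path.open(newline="") as f:
            # One table per CSV file; DictReader yields each row as a dict,
            # so values land as text unless you convert them first
            db[path.stem].insert_all(csv.DictReader(f))

The resulting .db file just needs to be hosted somewhere that sends CORS headers, so Datasette Lite can fetch it via the ?url= parameter.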
Regarding point number 11 (AlphaFold3 vs Vina, Gnina, etc.), see my rebuttal here (I'm the author of Vina): <a href="https://olegtrott.substack.com/p/are-alphafolds-new-results-a-miracle" rel="nofollow">https://olegtrott.substack.com/p/are-alphafolds-new-results-...</a><p>Gnina is Vina with its results re-scored by a NN, so the exact same concerns apply.<p>I'm very optimistic about AI, for the record. It's just that in this particular case, the comparison was flawed. It's the old regurgitation vs generalization confusion: We need a method that generalizes to completely novel drug candidates, but the evaluation was done on a dataset that tends to be repetitive.
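To make the regurgitation vs generalization point concrete: a fair evaluation has to hold out entire similarity clusters, not random rows, so nothing close to a test complex ever appears in training. A minimal sketch of such a split (cluster_ids, train_fn and score_fn are placeholder names of mine, not anything from the evaluation in question):

    from sklearn.model_selection import GroupShuffleSplit

    def leakage_aware_eval(complexes, cluster_ids, train_fn, score_fn):
        # Holding out whole clusters means a model can't score well just by
        # regurgitating near-duplicates it saw during training
        splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
        train_idx, test_idx = next(splitter.split(complexes, groups=cluster_ids))
        model = train_fn([complexes[i] for i in train_idx])
        return score_fn(model, [complexes[i] for i in test_idx])

In practice cluster_ids would come from something like sequence-identity clustering of the target proteins or scaffold clustering of the ligands.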
I always see these reports about how much better AI is than humans now, but I can't even get it to help me with pretty mundane problem solving. Yesterday I gave Claude a file with a few hundred lines of code, told it what the input should be, and told it where the problem showed up. I kept trying until I ran out of credits, and it still could not work backwards to tell me where things were going wrong. In the end I did it myself, and it turned out to be a pretty obvious problem.<p>The strange part with these LLMs is how weirdly hung up they get on things. I try to steer them away from a certain type of output and somehow they keep going back to it. It's the same problem I have with Google: if I try to make my search more specific, it just ignores whatever it doesn't like about my query and gives me the same results.
Surprised not to see a whole chapter on the environmental impact. It's quite a big talking point around here (Europe, France) for discrediting AI use, along with the usual ethical issues: art theft, job destruction, making it easier to generate disinformation, and the working conditions of AI trainers in low-income countries.<p>(Disclaimer: I am not an anti-AI guy; I am just listing the common talking points I see in my feeds.)
Note that this page is an overview; each chapter has its own page, and even those are overviews, since each chapter also comes as a separate PDF.<p>The full report PDF is 456 pages.
"AI performance on demanding benchmarks continues to improve."<p>My feeling is that more AI models are fine-tuned on these prestigious benchmarks.
Meta question: why does the website try to make it harder to open the images in a new tab? Usually if I want to do that, I right-click and select "open image in new tab". Here I had to jump through some hoops to do it. Additionally, if you just copy the URL you get an image that's just noise, and that seems to be by design. I can still access the original image, though, and download it from AWS S3 (<a href="https://hai-production.s3.amazonaws.com/images/fig_1e.png" rel="nofollow">https://hai-production.s3.amazonaws.com/images/fig_1e.png</a>). So the question stands: why all the hoops, just to scare off non-technical users?
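For what it's worth, once you have the S3 URL there's nothing special about the asset itself; the scrambling seems to be purely client-side, and a plain standard-library fetch works (a sketch; the output filename is arbitrary):

    import urllib.request

    # Fetch the original figure straight from the public S3 bucket
    url = "https://hai-production.s3.amazonaws.com/images/fig_1e.png"
    urllib.request.urlretrieve(url, "fig_1e.png")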
I recall Stanford's past AI Index reports being substantial and critical some years ago. This one reads like a compilation of many small press releases into one large press release ("Key takeaway: AI continues to get bigger, better and faster"). The problem is that AI moved from universities to companies, and the companies' publications in turn moved from research papers to press releases and white papers (I remember OpenAI's supposed technical specification of GPT-something as a watershed, in that it actually contained no useful information and just touted statistics whose context the reader didn't know).
What I'm certain of is that the standard of living will increase, because we can do more effective work in the same time. That means more output, and things will become cheaper. What I'm not sure of is where this effect will show up in the stock market.
> The U.S. still leads in producing top AI models—but China is closing the performance gap.<p>Most researchers that I know do not think about things through this lens. They think about building cool things with smart people, and if those people happen to be Chinese or French or Canadian, it doesn’t matter.<p>Most people do not want a war (hot or cold) with the world’s only manufacturing superpower. It feels like we have been incepted into thinking it’s inevitable. It’s not.<p>On the other hand, if the US decides to get serious about R&D as part of some nationalistic AI race with China, it will be good for me. I don’t want it, though.
> In the U.S., 81% of K–12 CS teachers say AI should be part of foundational CS education, but less than half feel equipped to teach it.<p>I'm curious: what exactly do they mean when they say they should teach AI in K-12?