Betteridge's Law of Headlines strikes again. (Well, Hacker News' abbreviated headlines, in this case.)

"Professors Staffed a Fake Company with AI Agents. Guess What Happened?"

"No."

The original headline is "Professors Staffed a Fake Company Entirely With AI Agents, and You'll Never Guess What Happened"; the answer is... uh... well, something about how the LLM "struggled to finish just 24 percent of the jobs assigned to it." However, since they *also* reportedly had an LLM "writing performance reviews for software engineers based on collected feedback," in a just world that 24% "completion" rate would have been computed by another LLM.

Clicking through, it looks like the actual "researchers" are here:

https://the-agent-company.com/

And their project is here:

https://github.com/TheAgentCompany/TheAgentCompany/blob/main/docs/EVALUATION.md

Which (at first glance) looks like a plain old task-based benchmark, i.e. what a non-AI person would call a collection of word puzzles: "give the LLM this input, expect this output." These word puzzles are themed around office jobs. Here's an example input:

https://github.com/TheAgentCompany/TheAgentCompany/blob/main/workspaces/tasks/admin-get-best-vendor-quote/task.md