Ask HN: What are your go-to "test" questions when evaluating a new LLM?

7 points by johntiger1, over 1 year ago
Do you have a go-to question (or several) to check if an LLM knows its stuff? For me, I ask a simple question:

"What is Operation Konrad III"

which most LLMs fail due to the (relative) obscurity of the event.

3 comments

philippta, over 1 year ago
Not really scientific or anything, but I tend to give it the task: "Write a simple http server in Go that saves all requests into a SQLite database."

What I am looking for is:

- did it forget to import the SQLite driver?
- is it doing weird SQL shenanigans like selecting MAX(id) to obtain the next potential id?
- is the code rather simple or over-engineered?

update: Most LLMs produce a decent answer; however, if you increase the difficulty a little bit by asking it "Write a simple and CGo free http server in Go ...", most LLMs get the sql driver wrong (except for gpt-4-1106-preview)
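
The first two checks in the list above are mechanical enough to automate. Here is a rough sketch, not something from the original comment: `Report`, `CheckAnswer`, and `knownDrivers` are hypothetical names, and the driver list is an assumption covering only two well-known packages (the CGo-based `github.com/mattn/go-sqlite3` and the pure-Go `modernc.org/sqlite`). The "simple or over-engineered" question still needs a human.

```go
package main

import (
	"go/parser"
	"go/token"
	"regexp"
	"strconv"
)

// Report holds the outcome of the checklist for one generated answer.
// (Hypothetical type, introduced only for this sketch.)
type Report struct {
	ImportsSQLiteDriver bool // a known SQLite driver package is imported
	UsesCGoDriver       bool // at least one imported driver requires CGo
	SelectsMaxID        bool // the source contains a MAX(id) expression
}

// knownDrivers maps SQLite driver import paths to whether they need CGo.
// Assumption: this short list is illustrative, not exhaustive.
var knownDrivers = map[string]bool{
	"github.com/mattn/go-sqlite3": true,  // CGo-based
	"modernc.org/sqlite":          false, // pure Go
}

// maxIDPattern matches MAX(id) in any letter case, with optional spaces.
var maxIDPattern = regexp.MustCompile(`(?i)\bmax\s*\(\s*id\s*\)`)

// CheckAnswer inspects Go source produced by an LLM. It parses only the
// import section, so code that fails to compile further down is still checked.
func CheckAnswer(src string) (Report, error) {
	var r Report
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "answer.go", src, parser.ImportsOnly)
	if err != nil {
		return r, err
	}
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			continue // skip an import path that cannot be unquoted
		}
		if needsCGo, ok := knownDrivers[path]; ok {
			r.ImportsSQLiteDriver = true
			r.UsesCGoDriver = r.UsesCGoDriver || needsCGo
		}
	}
	// A plain text search is enough here: the SQL lives in string literals.
	r.SelectsMaxID = maxIDPattern.MatchString(src)
	return r, nil
}
```

Calling `CheckAnswer` on the model's output and looking at the three fields gives a quick pass/fail on the driver and MAX(id) points; for the "CGo free" variant of the prompt, `UsesCGoDriver` should come back false.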
muzani, over 1 year ago
I give it a large block of code and see if it can find the bug. Amusingly, GPT sometimes passes it with flying colors (finding the bugs I didn't see and seeing unused imports) but at other times it just flat out fails to see anything.
mejutoco, over 1 year ago
I ask it to create a conversation in Polish, with English translations, about an encounter between two neighbours walking their dogs.