GPT4 gets a 0 on Steven Landsburg's undergrad econ exam

50 points by DantesKite, about 2 years ago

7 comments

maister, about 2 years ago

Anyone had a look at those questions?

> Question 2: In a country where everyone is identical, 100 people wait in line each day to buy raspberries at a controlled price. The government has decided to hand out free coffee to the people standing in line. The coffee costs the government $1 per cup, but the people in line value that coffee at only 75 cents per cup. What is the social cost of providing the coffee?

> Grading Remarks: This answer completely misses the key fact that free coffee will cause the line to get longer (in fact it must cause the line to get longer, given the stated assumption that everyone is identical, hence initially indifferent between standing in line and not standing in line). In fact (unless one assumes a very small population), the line must grow until the extra waiting time completely dissipates the value of the free coffee; thus the social cost of providing the coffee is $100.

Wait, what? So the line must get longer, *given the stated assumption that everyone is identical*. Well, what about the stated assumption *that 100 people wait in line each day*? One of the stated assumptions can be dropped just like that, while the other is treated like a law of nature? Also, where is the logic in "everyone is identical, hence initially indifferent between standing in line and not standing in line"? Am I missing something?
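The gap between the two readings of Question 2 comes down to simple arithmetic. The sketch below is only an illustration: the function name `social_cost` is made up here, and it assumes 100 cups are handed out under both readings, which the question does not state explicitly but which the $100 figure in the grading remarks seems to imply.

```python
def social_cost(cups: int, cost_per_cup: float, net_benefit_per_cup: float) -> float:
    """Government spending minus the net benefit recipients actually keep."""
    return cups * (cost_per_cup - net_benefit_per_cup)

# Reading 1 (assumption: the line stays at 100 people, so each person
# keeps the full 75 cents of value from the coffee).
fixed_line_cost = social_cost(100, 1.00, 0.75)

# Reading 2 (grading remarks: extra waiting time dissipates the value of
# the coffee, so the net benefit per cup is zero).
dissipated_cost = social_cost(100, 1.00, 0.00)

print(fixed_line_cost, dissipated_cost)
```

Under the first reading the cost is $25; under the grader's reading it is the full $100, because nothing of value survives the longer wait.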
sverona, about 2 years ago

Honestly this doesn't surprise me. I don't have GPT-4 access, but I fed ChatGPT a simple calculus question as a sanity check.

Question: Calculate the integral of x arctan x dx. Show your work.

Answer: We can solve this integral using integration by parts:

Let u = arctan x and dv = x dx, then we have:

du/dx = 1/(1+x^2), v = (1/2)x^2

Using the integration by parts formula, we get:

∫ x arctan x dx = uv - ∫ v du/dx dx = (1/2)x^2 arctan x - (1/2)∫ x^2/(1+x^2) dx

To solve the second integral, we can use substitution. Letting u = 1+x^2, then du/dx = 2x, and dx = du/(2x). Substituting into the integral, we get:

∫ x^2/(1+x^2) dx = (1/2) ∫ du/u = (1/2) ln|u| + C = (1/2) ln|1+x^2| + C

Putting it all together, we get:

∫ x arctan x dx = (1/2)x^2 arctan x - (1/2) [(1/2) ln|1+x^2|] + C = (1/2)x^2 arctan x - (1/4) ln|1+x^2| + C

Therefore, the integral of x arctan x dx is (1/2)x^2 arctan x - (1/4) ln|1+x^2| + C, where C is the constant of integration.

---

This is wrong, although it gets pretty damn close (see if you can spot the mistake). I was an exceptionally kind grader when I taught, and Landsburg seems like an exceptionally picky one. But either way, ChatGPT wouldn't pass calculus 2, either.
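One way to check the quoted answer without redoing the algebra is to differentiate it numerically and compare the result with the integrand. The sketch below does that using only the standard library. The second candidate, `by_parts_answer`, is not from the thread: it is what integration by parts gives if the remaining integral is handled by rewriting x^2/(1+x^2) as 1 - 1/(1+x^2) instead of the substitution. All function names are illustrative.

```python
import math

def integrand(x: float) -> float:
    return x * math.atan(x)

def chatgpt_answer(x: float) -> float:
    # Antiderivative quoted in the comment above (constant of integration dropped).
    return 0.5 * x**2 * math.atan(x) - 0.25 * math.log(1 + x**2)

def by_parts_answer(x: float) -> float:
    # Uses x^2/(1+x^2) = 1 - 1/(1+x^2), whose integral is x - arctan x.
    return 0.5 * x**2 * math.atan(x) - 0.5 * x + 0.5 * math.atan(x)

def derivative(f, x: float, h: float = 1e-6) -> float:
    # Central finite difference; adequate for a sanity check.
    return (f(x + h) - f(x - h)) / (2 * h)

def max_error(candidate, points=(0.5, 1.0, 2.0, 3.0)) -> float:
    # Largest gap between the candidate's derivative and the integrand.
    return max(abs(derivative(candidate, x) - integrand(x)) for x in points)

print("quoted answer:", max_error(chatgpt_answer))
print("by-parts answer:", max_error(by_parts_answer))
```

A correct antiderivative should give an error near zero at every test point. The quoted answer happens to match at x = 1 but not elsewhere, which is why several test points are used.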
SilasX, about 2 years ago

This should really link directly to Landsburg's post rather than a Twitter promotion of it:

https://www.thebigquestions.com/2023/04/05/gpt-4-fails-economics/

Edit: Also, the post says GPT-4 scored 4/90, a little better than 0.
SilverBirch, about 2 years ago

I'm still extremely skeptical of testing LLMs on *any* of these types of tests. The likelihood of the model being trained on the answers to the test is too high. For example, with the bar exam that people go on about, it seems to me that the internet is going to be absolutely chock-full of sample questions and answers for the bar exam. Anyone with any knowledge at all in AI knows that one of the worst things you can do is evaluate your AI using data in the training set. Of course it's going to do a good job! It's telling you what you told it! Why not just put down a SQL database and stick all the answers in it (I'm being facetious, but you get my point).

We know from other research papers that people have managed to prompt the generative AI image generators to reproduce copyrighted pictures in a way that makes it very clear they were very heavily trained on those pictures. It's the same with this: answers to standardized tests are always going to appear very commonly in your training data. A standardized test works okay as a spot check for humans who have studied the course, but extremely badly for an AI that hasn't studied the course but instead has just hoovered up every token in the course.
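The contamination concern can be made concrete with a rough overlap check between a test item and a body of training text. This is only a minimal sketch under stated assumptions: word-level 5-grams, exact matching, and made-up example strings. It is not how any particular lab audits its data, and the helper names `ngrams` and `overlap_fraction` are hypothetical.

```python
from __future__ import annotations

def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """All word-level n-grams in the text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_fraction(test_item: str, training_text: str, n: int = 5) -> float:
    """Share of the test item's n-grams that also occur in the training text."""
    item = ngrams(test_item, n)
    if not item:
        return 0.0
    return len(item & ngrams(training_text, n)) / len(item)

# Made-up strings for illustration only.
training_text = "the quick brown fox jumps over the lazy dog every single day"
leaked_item = "quick brown fox jumps over the lazy dog"
novel_item = "a completely different sentence with no shared phrasing at all"

leaked_score = overlap_fraction(leaked_item, training_text)
novel_score = overlap_fraction(novel_item, training_text)
print(leaked_score, novel_score)
```

A score near 1 suggests the item appears nearly verbatim in the training text; a score near 0 only rules out verbatim copies, not paraphrases.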
Ancalagon, about 2 years ago

Hard to contrast this with the report of gpt-4 getting a B in quantum computing yesterday
stuff4ben, about 2 years ago

I'd expect most economists couldn't do a good job either. It's black-box magic, suitable for wizards.
hackrnusr, about 2 years ago

Because it couldn't explain corporate executive compensation levels?