科技回声 (Tech Echo)
© 2025 科技回声 (Tech Echo). All rights reserved.

I made ChatGPT take an SAT Test

31 points by sprague over 2 years ago

6 comments

tkgally over 2 years ago
I've been experimenting with ChatGPT in another educational context: the short essays that high school and college students often have to write for class. ChatGPT excels at them. I am putting my test results on the following page:

https://www.gally.net/temp/202212chatgpt/index.html

Some of the prompts and responses involve Japanese because I teach at a university in Japan. I especially want to see how well Japanese students could use ChatGPT to produce compositions for their English academic writing classes. Of the trials I've done so far, Test 18 on the above page is the most significant: the Japanese prompt briefly outlines a three-paragraph essay, and ChatGPT produces two very good original essays in English on the requested topic with the requested organization.
touringa over 2 years ago
In 2020, GPT-3 175B was already outperforming humans on SAT questions by ~20%. There are far more formal benchmarks for models now.

https://lifearchitect.ai/iq-testing-ai/
sinuhe69 over 2 years ago
Japan did something similar years ago, with a result in the top 20% on the national college entrance exam. The Todai robot could even take the exam entirely by itself and didn't need a human to feed it questions and possible answers.

https://www.businessinsider.com/robot-beat-most-students-on-university-tokyo-entrance-exam-2017-9
elashri over 2 years ago
I would be really interested to see how it performs on the GRE (the stupidest exam ever invented). I think it will do great on the writing and qualitative sections, but I have doubts about the quantitative part.

Edit: I actually ran a quick trial, letting ChatGPT take one quantitative practice exam (16 questions) and feeding it the questions the same way the OP did. I was surprised that it got only 5/16 correct. This was even worse than I guessed. Admittedly, this is just an informal attempt, and maybe someone (or I) will do a more serious one later. If you are interested in the questions and answers, they are here [1].

For example, this is how I fed it a question:

> If x+2y= 4 and x+3y= 9. Find the correct option. Quantity A = x Quantity B = y

choices: Quantity A is greater Quantity B is greater Both Quantities are equal The relationship cannot be determined

And the answer was:

> The correct answer is 2) Quantity B is greater.
>
> To solve this problem, we can first note that the equations x+2y=4 and x+3y=9 can be solved simultaneously to find the values of x and y.

It then went on to explain the steps.

[1] https://paste.elashri.xyz/s14t
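As a sanity check on the sample question: subtracting the first equation from the second gives y = 5, and then x = 4 - 2·5 = -6, so Quantity B is greater. The final choice ChatGPT gave for this one question is consistent with that, although the comment doesn't show whether its intermediate steps were right. Below is a minimal Python sketch that does the same check mechanically; the function names are hypothetical, chosen for illustration. It solves the 2×2 system with Cramer's rule using exact `Fraction` arithmetic and maps the result onto the four answer choices quoted above.

```python
from fractions import Fraction
from typing import Optional, Tuple


def solve_2x2(a1, b1, c1, a2, b2, c2) -> Optional[Tuple[Fraction, Fraction]]:
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 with Cramer's rule.

    Returns None when the determinant is zero (no unique solution).
    """
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


def quantitative_comparison(solution) -> str:
    """Map a solved (Quantity A, Quantity B) pair onto the four answer choices."""
    if solution is None:
        # Simplification: treat "no unique solution" as undeterminable.
        return "The relationship cannot be determined"
    qa, qb = solution
    if qa > qb:
        return "Quantity A is greater"
    if qb > qa:
        return "Quantity B is greater"
    return "Both Quantities are equal"


# x + 2y = 4 and x + 3y = 9, with Quantity A = x and Quantity B = y
solution = solve_2x2(1, 2, 4, 1, 3, 9)
x, y = solution
print(f"x = {x}, y = {y}")                  # x = -6, y = 5
print(quantitative_comparison(solution))    # Quantity B is greater
```

Exact fractions avoid floating-point ties being misread as "equal" or "greater" on questions whose solutions are not integers.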
ilaksh over 2 years ago
What's missing from these models is everything related to visual or spatial information (that is not encoded in text). I assume there will eventually be something like ChatGPT/InstructGPT where part of the input data is images and/or videos, with and without captions. That would give it a way of connecting language to the spatial (and temporal).

It seems they may need a more efficient approach, though, to handle the massive amount of video data. Maybe the 'MrsFormer' multi-resolution approach could help.

Another thing that could be very useful for coding, without requiring visual information, would be to add a whole other subsystem where the model could actually compile/run the code iteratively and see the output.

I don't think transformers are the last invention in AI, but they certainly seem capable of reaching general-purpose AI in many contexts. That and related techniques are not going to create something like a digital autonomous person, though, which I think is a good thing.
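The compile/run subsystem suggested above could look roughly like a generate, run, and feed-back loop. This is a minimal sketch under stated assumptions, not how ChatGPT or any existing product works: `hypothetical_model` is a made-up stand-in for a code-generating model (no real model API is called), the generated code is assumed to be Python, and it is executed in a fresh interpreter via `subprocess`.

```python
import subprocess
import sys
from typing import Callable, Optional, Tuple


def run_snippet(source: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Run Python source in a fresh interpreter and return (exit code, stdout + stderr).

    Raises subprocess.TimeoutExpired if the snippet runs longer than `timeout` seconds.
    """
    proc = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout + proc.stderr


def run_with_feedback(
    generate: Callable[[str], str], task: str, max_rounds: int = 3
) -> Optional[str]:
    """Generate code, run it, and feed the latest error back until it succeeds.

    Returns the successful run's output, or None if every round fails.
    """
    prompt = task
    for _ in range(max_rounds):
        code = generate(prompt)
        exit_code, output = run_snippet(code)
        if exit_code == 0:
            return output
        # Only the most recent failure is appended to the original task.
        prompt = task + "\n# Previous attempt failed with:\n" + output
    return None


def hypothetical_model(prompt: str) -> str:
    """Hypothetical stand-in for a code-generating model (no real API involved)."""
    if "failed" in prompt:
        return "print(sum([1, 2, 3]))"
    return "print(sum([1, 2, 3])"  # missing ")" -> SyntaxError on the first try


print(run_with_feedback(hypothetical_model, "Print the sum of 1, 2 and 3."))  # prints 6
```

A real system would need sandboxing rather than a bare subprocess, since generated code runs with the caller's permissions.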
runsWphotons over 2 years ago
What is the average SAT score if you give the test to three or four randomly selected people?