When I was young and learning math, my father always forbade me from looking at the answer in the back of the textbook. “You don’t work backwards from the answer!” I think this is right.<p>In life, we rarely have the answer in front of us; we have to work it out from the things we know. It’s this struggling that builds a muscle you can then apply to any problem. ChatGPT, I suspect, is akin to looking up the answer. You’re failing to exercise the muscle needed to solve novel (to you) problems.
These comments are filled with misunderstandings of the result. There were three groups of kids:<p>1. Control, with no LLM assistance at any time.<p>2. "GPT Base", raw ChatGPT as provided by OpenAI.<p>3. "GPT Tutor", improved by the researchers to provide hints rather than complete answers and to make fewer mistakes on their specific problems.<p>On study problem sets ("as a study assistant"), kids with access to either GPT did better than control.<p>When GPT access was subsequently removed from all participants ("on tests"), the kids who studied with "GPT Base" did worse than control. The kids with "GPT Tutor" were statistically indistinguishable from control.
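For context on what "provide hints rather than complete answers" means mechanically: per the paper, GPT Tutor was built by prompting GPT-4 and feeding it the correct solutions, not by retraining the model. A minimal sketch of the hint-only idea using the OpenAI Python client — the prompt wording, model name, and helper function below are my own illustration, not the researchers' actual setup:<p><pre><code>from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical "hints, not answers" instruction; the paper's real prompt
# (and its per-problem answer keys) is not reproduced in the article.
TUTOR_PROMPT = (
    "You are a math tutor. Never reveal the final answer. "
    "Instead, point out the next step, ask a guiding question, "
    "or flag the first error in the student's work."
)

def tutor_hint(student_work: str) -> str:
    # One round-trip: system prompt constrains the model to hinting only.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": TUTOR_PROMPT},
            {"role": "user", "content": student_work},
        ],
    )
    return response.choices[0].message.content

print(tutor_hint("Solve 3x + 5 = 20. I got x = 8. Is that right?"))
</code></pre>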
Used incorrectly? Yes.<p>LLMs, for me, have been tremendously useful in learning new concepts. I frequently feed them my own notes and ask them to correct any misunderstandings, or to expand on things I don’t understand.<p>I use them like I would an on-demand tutor, but I can totally understand how they could be used as a shortcut that wouldn’t be helpful.<p>In the same way, I can hire a tutor that will help me actually learn, or I can hire a “tutor” that just does the homework for me. I’ve worked as a tutor, so I’ve seen people looking for both, and people who don’t want to learn are always going to find a way. People who do want to learn are also going to find a way.
From the abstract:<p>“Consistent with prior work, our results show that access to GPT-4 significantly improves performance (48% improvement for GPT Base and 127% for GPT Tutor). However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access (17% reduction for GPT Base).”<p>Kids who use ChatGPT actually do “significantly” better, according to the authors. Now, I don’t know if “significantly” means statistically significant here, because I haven’t read the methodology, but a 127% increase in performance must be something. That said, that’s a clickbaity title if I’ve ever seen one.<p>Edit: Upon closer reading, the increase in performance is statistically significant. Also, “access to GPT” in this case means having GPT open while solving the problems, not studying with GPT and then solving the problems, which was my first understanding from the clickbaity title. The results are not terribly surprising in that regard.
Story time. I always struggled with math as a kid, from school through high school, and then didn't touch it much until Uni. Teachers typically couldn't explain things in a way where I "got it" in a school setting. I had some success with a private tutor to get me over the line in high school.<p>Then at Uni I was doing Computer Graphics, which included advanced (for me) math. I was panicked, and initially struggled, until one of my good friends, who was studying the same course and is VERY good at math, was able to answer my vague "I don't get it" questions, or at least guide me to more specific ones.<p>I think I'm quite a visual learner; I don't think at that time there was a concept of people learning "differently". Luckily my good friend was also a visual learner, along with being very good at math. It was like someone was able to see how my brain worked and feed me information in a way it could compile. I became quite good at math after that.<p>You really need to learn how to learn. It's fascinating, but also horrifying, when I now consider all the lives that have been negatively impacted because this wasn't understood, and people were led to believe they couldn't do something which maybe they really wanted to be able to do.<p>If GenAI can help with that, I'm all in.
If I lived before the tape measure was invented and relied on carefully placing my metersticks to measure things, I could get really good at measuring without the need for a measuring tape. After all, a measuring tape is just a few flexible metersticks anyway, and if you need to measure something longer than the full length of the tape, you are screwed.<p>If you take the measuring tape away from the person who relied on that tool, instead of being good at using a meterstick (or perhaps no tool besides their own arm length), they are suddenly not going to be able to measure, unless they go through the effort of learning to measure without the tape.<p>You can argue that the measuring tape is a crutch preventing people from learning how to properly measure, and that it has its own limitations, but regardless it's still really helpful, especially for people who only need to measure things occasionally, and not super long things.<p>ChatGPT is a tool. Just like with all other tools, like computers, cars, etc., if you take it away, most people cannot perform the function they relied on the tool to help them do.
Why is this surprising? All such tools hamper learning. If you want to learn, read books, read and write. Don't use a spellchecker for your language exam. No calculator for calculus. Pen and paper. How is this going backwards :(
The title could be worded better. Kids using "base" GPT-4 performed poorly, but the ones with access to a specially prompted "tutor" GPT-4 did okay. The study was purposefully done in a domain the current SoTA LLMs struggle in (math).<p>From the (draft!) paper's abstract:<p><pre><code> A key remaining question is how generative AI affects learning, namely, how humans acquire new skills as they perform tasks. This kind of skill learning is critical to long-term productivity gains, especially in domains where generative AI is fallible and human experts must check its outputs.
..
Consistent with prior work, our results show that access to GPT-4 significantly improves performance (48% improvement for GPT Base and 127% for GPT Tutor).
However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access (17% reduction for GPT Base). That is, access to GPT-4 can harm educational outcomes.
These negative learning effects are largely mitigated by the safeguards included in GPT Tutor.
</code></pre>
<a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486" rel="nofollow">https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486</a>
Not surprising. Tests are all about memorising things. If you don't need to memorise everything because it's on Google, you won't.<p>Thus, when the test rolls around, nothing is memorised and they do badly.<p>It's like memorising phone numbers vs. keeping them in the contacts app. Before, I memorised tons of numbers, but now they're all in the app and I barely recall my own.
I was willing to entertain the idea that they could do better. I guess the tests would have to be written to leverage the skill.<p>That said, all things being equal, kids who write notes by hand outperform kids who type them, even touch-type them. So maybe the old ways are better in this specific brain-knowledge-competency-understanding-forming space?
It seems kind of obvious, no?<p>The act of repetition and processing the data ourselves is what leads to a deeper understanding, and asking a chatbot for an answer seems like it would skip the thinking required when learning "the old-fashioned way."<p>Maybe we can learn how to incorporate chatbots in education, but I suspect there need to be guardrails on when and how they are used so students can get the benefit of doing the work themselves.
> A third group of students had access to a revised version of ChatGPT that functioned more like a tutor. This chatbot was programmed to provide hints without directly divulging the answer. The students who used it did spectacularly better on the practice problems, solving 127 percent more of them correctly compared with students who did their practice work without any high-tech aids.<p>Is it just me, or does this directly contradict the title?
What if the test is irrelevant to the current times?<p>“Those with ChatGPT solved 48 percent more of the practice problems correctly, but they ultimately scored 17 percent worse on a test of the topic that the students were learning.”<p>So, in the real world, where people can use ChatGPT in their jobs, the kids who use it will do better than the kids who don’t.<p>Maybe a better test is: can you catch ChatGPT when it is wrong? Not: can you answer without ChatGPT?
I recently used AI assistants for help with programming homework. My usual prompts include "help me think in the right direction", "is my thinking correct", etc. I also find myself copy-pasting a question into the chat to understand it better.<p>I had a suspicion that this was not aiding my learning process, even though I am able to "solve" more problems. Nice to see this confirmed. Time to stop!
Side note and blog promotion: I find it fascinating that ChatGPT can easily simulate the age of a child when giving answers for homework: <a href="https://www.fabianzeindl.com/posts/chatgpt-simulating-agegroups" rel="nofollow">https://www.fabianzeindl.com/posts/chatgpt-simulating-agegro...</a>
What were the primary reasons that made students who used ChatGPT do poorly on math assessments, even though they had worked correctly through a greater number of practice problems?
> A draft paper about the experiment was posted on the website of SSRN, formerly known as the Social Science Research Network, in July 2024. The paper has not yet been published in a peer-reviewed journal and could still be revised.<p>Should have started with that.<p>A study without independent replication hardly counts as «researchers found», much less one that hasn't even been peer-reviewed yet!
I think the problem that people don't see anymore is with tests themselves. A clever idea is worth more than a single tick in the correct checkbox. This applies to maths as well. Tests are faster to check and, supposedly, objective, but a viva voce exam is still superior imho.
The evaluation method is wrong.<p>It's like when cars first came out: you ask people to drive cars for a month and they get used to them. Then you ask them to compete in a horse race and see how fast they can go.<p>We should evaluate how fast they solve a problem, no matter how.
I use ChatGPT 4o to check my child's homework, but I forbid them from using it directly. That way, I can make sure the work is correct (or at least wrong in the same way as ChatGPT) without straining my tired brain.
I think the way to use ChatGPT is to have it explain a concept once and give a few examples.<p>After that, the student should struggle the old-fashioned way with problems.<p>I would like to see a study that looks at this approach.
Did nobody read the article? It says right there that the students who used ChatGPT right, as a tutor, did much better than their peers.<p>If your human tutors just give you the answers when you ask for them, how do you think it'll go?
I have a visceral dislike, even hatred, for what the LLM hype brought the world. The never-ending slop it spouts, filling up the entire internet. More and more I get confronted with images and media that turn out to be AI-generated; when I find out, I am disgusted and just close the tab.<p>Soulless drivel, endlessly streaming.<p>And I'm confident that the education system as we know it will be severely damaged because of it.<p>Even in our own field, I can guarantee you that software developers who "grew up" with these garbage AI assistants will be worse coders than the generation that came before.
You will never develop the understanding, the insight, that's needed by chatgpt'ing your way through college and life.<p>Excellent news for my own market value of course, but I don't hesitate to say that I regret the LLM hype happened, the impact on the world is overwhelmingly negative (not even touching on the catastrophic environmental and financial cost to society).
Are people not reading the article here?<p>Let me tldr:<p><pre><code> - Study had 3 groups: normal GPT; GPT with a system prompt to act as a tutor, focusing on giving hints, not answers; and no GPT
Group 1 (normal GPT)
- 48% better on practice problems
- 17% worse on test
Group 2 (tutor GPT)
- 127% better on practice problems
- equal test score to control group
GPT errors:
- 50% error rate
- 8% error on arithmetic problems
- step-by-step instructions were wrong 42% of the time
- GPT tutor was fed answers
- students with GPT and GPT tutor predicted that they did better (so both groups were overconfident)
</code></pre>
Paper: <a href="https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486" rel="nofollow">https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4895486</a><p>I'll reply to this comment with my own opinion, but many of the comments here are not responding to the article's content.