New Test for Computers: Grading Essays at College Level

39 points by jfc about 12 years ago

13 comments

lmkg about 12 years ago
There are a few reasons I'm not excited about this sort of thing, but there's one big reason where I think it would be a distinct improvement over the status quo: standardized testing.

Most standardized testing in the US, especially the tests deployed at a national scale, is designed for grading first and testing second. This is a simple concession to feasibility: if you're trying to evaluate a few dozen million students, Scantron is the only current tool that is time-efficient, cost-efficient, and consistent ("fair") at scale. The constraints imposed by that tool are significant, and tying so many incentives to those tests has warped the whole education system.

Automated essay grading, no matter how imperfect, would still be an expansion of the available testing techniques that could be deployed at a national scale. It would expand the range of skills that are measurable, and therefore incentivized to teach.

The cynical way of putting that is: no matter how shitty computers are at grading essays, they're still an improvement when the competition is multiple-choice questions. I have decided to be excited by that.
a_p about 12 years ago
A quote from the article:

“My first and greatest objection to the research is that they did not have any valid statistical test comparing the software directly to human graders,” said Mr. Perelman.

That is a good enough reason not to use the software. Even if it does eventually pass such a test, I doubt it would ever be better than the *best* human grader; rather, it would be better than the *average* grader.
SatvikBeri about 12 years ago
Of course there are a million-and-one possible problems, but the upside is also massive, especially if an automated grader is used to supplement human graders. Some examples:

- A student can iterate on and improve their essays on their own using an auto-grader. They can get an essay up to a decent level just through iteration before having to get a human involved.
- Human graders are highly subjective.
- Human graders tend to be strongly affected by factors such as "number of hours since they last ate".
- Different humans have different levels of harshness; a machine could help calibrate these (see the sketch below).
- Outside, say, the top 10% of colleges, the vast majority of human graders suck. Especially for standardized tests, they typically get paid near minimum wage and the qualifications are along the lines of "have a degree." While automated graders will probably never be as good as the best graders, they don't have to be.

So overall, I'm pretty excited. Even a half-baked solution has a lot of potential value.
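One minimal way to do that calibration: standardize each grader's scores against their own mean and spread. The graders and scores below are hypothetical, and real calibration would need graders scoring overlapping sets of essays:

    from statistics import mean, stdev

    # Hypothetical raw scores from two graders marking the same ten essays.
    raw_scores = {
        "lenient_grader": [7, 8, 6, 9, 7, 8, 9, 6, 8, 7],
        "harsh_grader":   [4, 5, 3, 6, 4, 5, 6, 3, 5, 4],
    }

    def calibrate(scores):
        """Map a grader's scores to z-scores so individual harshness cancels out."""
        mu, sigma = mean(scores), stdev(scores)
        return [round((s - mu) / sigma, 2) for s in scores]

    for grader, scores in raw_scores.items():
        print(grader, calibrate(scores))
    # Both graders print the same calibrated sequence: they rank the essays
    # identically and differ only in harshness.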
TheCapn about 12 years ago
Feedback is more than a letter grade. Real teachers provide insight and direction on material like an essay to stimulate critical thought and development. How well can a computerized grading system instill this kind of learning in students? Can it at all?
geoffpado about 12 years ago
I actually had to use something like this for a class in college… maybe 2 years ago? It was the worst thing. It would auto-grade, tell you your score, and suggest how you could improve it. However, there was almost no way to actually respond to the grade. It would often tell you that you didn't cover a specific topic, even when you had; there was no way to point out where you had discussed it, no way within the program to ask for a human review, nothing.

All this, of course, was exacerbated by the class being run by possibly the laziest professor I had in all of college (all essays were graded by this software, all in-class tests were taken by clicker, all notes were PowerPoint slides provided by the book's publisher). Perhaps in the hands of someone who actually cared about teaching, a tool like this wouldn't have been so bad, but it seems to me that someone who cared about teaching would just grade manually to begin with.
marianne_navada about 12 years ago
As an educator, I appreciate technology that gives students a quicker way to get feedback, but this software only reinforces students' frustration with essays: no one reads them except the professor. Most are written to be read by one person. Artists are able to share their work, but for most majors that require writing, assignments are meant to be forgotten. This is exactly why students who major in these fields graduate without a portfolio of their work.

My husband and I developed https://chalktips.com/ to solve this problem. We wanted to make essays and schoolwork engaging and shareable for college students. Students publish booklets and slideshows as part of their assignments, and they can tweet their work or share it on Facebook or Tumblr.

Last semester, as part of the final, I had professionals from different fields comment on student work, and the feedback from students was amazing.

Using AI to comment on essays might be efficient, and it is probably comparable to having ONE person grade an assignment. But it doesn't make use of the power of community, and it doesn't address the fundamental problem with essays: after one is graded, so what?

We currently have 1,400 users and more than 2,500 published booklets and slideshows. Students are embracing the platform. Of course, I'm a community college teacher without the clout of MIT professors. But we're hopeful that more students will start demanding assignments that have utility after the course is over.
rollo_tommasi about 12 years ago
The problematic aspects of this software apply to MOOCs and online education in general. Truly valuable, high-level education requires precise, individually calibrated feedback in order to train students to think and express themselves critically and rigorously. (There are also, obviously, signalling, credentialing, and networking benefits that will never be replicable by MOOCs, but that's not relevant to a discussion of pure educational quality.)

Computer-driven mass education will never be able to provide that level of instruction. At best, it's a tool for bringing the workforce up to the basic level of competence necessary to function in an information economy, much as the original purpose of the public education system was to crank out semi-skilled factory workers and low-level clerks. This may be a valuable end (it may help stem the commoditization of the BA and the attendant tsunami of student debt), but there needs to be wider acknowledgement that it fulfills a purpose different from that of traditional higher education.
scarmig about 12 years ago
One big advantage of this, if it ever becomes decent: iteration.

Make it available to students. They can write an essay, submit it, and get feedback on it. Repeat.

It'd be difficult to get a machine to bring someone to great writing. But clear, concise prose that gets to the point? That seems reachable, and it would be an improvement over the status quo.

In other words, machines won't be able to know anytime soon that a Wodehouse is superior to an Orwell. But an Orwell to, say, a Thomas Friedman or a typical ninth grader? Totally.
ISL about 12 years ago
One problem: essay quality, and that of writing in general, is subjective.

It's possible to assess objective properties of writing, but doing so cuts off the top end of the spectrum.

    Three quarks for Muster Mark!
    Sure he has not got much of a bark
    And sure any he has it's all beside the mark.

Grade? F. Author: Joyce. Importance: inspired Gell-Mann's naming of the quark [1].

[1] http://en.wikipedia.org/wiki/Quark
noonespecial about 12 years ago
They're facing the same problem as testing (i.e., diagnosis) in large-scale health care: how to test (and really, teach) without having to take on the bothersome, unscalable task of actually getting to know the student.

Wouldn't it be great if we could just figure out "education" once, code it up, and then let it run on all of the students? It's easy, really: all we need are identical students.
danso about 12 years ago
So, for this to be even a feasible solution, EdX needs to show more proof of concept: like, here's a sample question, here are the 100 sample answers that the machine learned from, and here's how the auto-grader graded these 10 different answers (both good and bad). (A toy sketch of that kind of demo is below.)

Why should anyone have faith that EdX has cracked the perfect mix of machine learning, NLP, and the other associated technologies needed to provide accurate assessments of essays? Even Google has trouble guessing intent from time to time; Wolfram Alpha even more so. If the engineers at these companies can't always get it right (and it's not just engineering talent, but data and data analysis), why should a school entrust one of its most important functions to EdX?

Grading is something *critical* to get right, not just "almost there." Think of how much time you spent arguing with a professor that your score deserved an 8 instead of a 6, enough points to bring you from a B to a B+ for the semester. Think of the incentive to do so (career prospects). If the machine is ever wildly off in even one case, would you ever take its assessments as gospel? Multiply yourself by 20 or 50 or whatever a usual professor's lecture load is, and now a ton of lag has been introduced into the grading workflow.

Obviously, there are ways to mitigate this. One would be to write questions so narrowly focused that there are very clearly right and wrong answers, which of course raises the problem of: why not just give everyone multiple-choice tests, then?

The sad thing is that even if these machine graders were empirically better than human graders, they can't just be better; they have to be almost perfect. If a plane's autopilot failed, on the whole, 1 out of 1,000 times compared to 5 out of 1,000 times for human pilots, how much more pissed do you think victims' families are going to be when they find out their loved ones died because of an algorithmic malfunction rather than plain pilot error? People, for obvious reasons, don't like thinking of their work or their destinies as being defined by deterministic machines, so if those machines aren't 99.99% right, the fallout and pushback may cost schools more than the savings in professor grading time.
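For concreteness, here's the shape of the demo I mean, as a toy: fit a bag-of-words regressor on a handful of hypothetical human-graded answers, then show its scores on answers it has never seen. All of the data is made up, and a real grader would use far richer features than TF-IDF:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import Ridge

    # Hypothetical human-graded sample answers (scores out of 10).
    train_answers = [
        "Photosynthesis converts light energy into chemical energy in chloroplasts.",
        "Plants eat sunlight.",
        "The light reactions split water; the Calvin cycle fixes CO2 into sugar.",
        "It is when plants grow.",
        "Chlorophyll absorbs light, driving synthesis of glucose from CO2 and water.",
        "Photosynthesis is a thing leaves do.",
    ]
    train_scores = [8, 3, 9, 2, 9, 3]

    vec = TfidfVectorizer()
    model = Ridge(alpha=1.0).fit(vec.fit_transform(train_answers), train_scores)

    # Held-out answers the grader has never seen.
    test_answers = [
        "Chloroplasts use light to turn carbon dioxide and water into glucose.",
        "Plants are green.",
    ]
    for answer, score in zip(test_answers, model.predict(vec.transform(test_answers))):
        print(f"{score:4.1f}  {answer}")

Even this toy makes the failure mode visible: the model rewards vocabulary overlap, not understanding, which is exactly why a published human-vs-machine comparison matters.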
JEVLON about 12 years ago
Well, I referenced a Harvard Business School public journal and it got flagged as being from Wikipedia. So I do not believe we should let an automated system mark our essays just yet.
therobot24 about 12 years ago
As a precursor to grading/giving feedback on an essay, I usually pull out a Naive Bayes classifier for topic classification and see whether the report is correctly classified. It's kind of fun and a good indicator of what I'm about to read.
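A minimal version of that pre-grading check might look like the following; the reports, topics, and labels are all made up:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Made-up past reports, labeled with the topic they were assigned.
    reports = [
        "We measured TCP throughput under packet loss and congestion.",
        "The B-tree index reduced query latency on large tables.",
        "Routing tables converged quickly after the link failure was injected.",
        "Normalizing the schema removed most of the update anomalies.",
    ]
    topics = ["networking", "databases", "networking", "databases"]

    vec = CountVectorizer()
    clf = MultinomialNB().fit(vec.fit_transform(reports), topics)

    # A new report to check before reading: if the predicted topic doesn't
    # match the assigned one, the essay may have wandered off-subject.
    new_report = "Query plans changed once the composite index was added."
    print(clf.predict(vec.transform([new_report]))[0])  # expected: databases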