This actually feels like an amazing step in the right direction.<p>If AI can help spot obvious errors in published papers, it can do it as part of the review process. And if it can do it as part of the review process, authors can run it on their own work before submitting. It could massively raise the quality level of a lot of papers.<p>What's important here is that it's part of a process involving the experts themselves -- the authors, the peer reviewers. They can easily dismiss false positives, and, crucially, they get warnings about statistical mistakes or other aspects of the paper that aren't their primary area of expertise but can contain gotchas.
Needs more work.<p>>> Right now, the YesNoError website contains many false positives, says Nick Brown, a researcher in scientific integrity at Linnaeus University. Among 40 papers flagged as having issues, he found 14 false positives (for example, the model stating that a figure referred to in the text did not appear in the paper, when it did). “The vast majority of the problems they’re finding appear to be writing issues,” and a lot of the detections are wrong, he says.<p>>> Brown is wary that the effort will create a flood for the scientific community to clear up, as well as fuss about minor errors such as typos, many of which should be spotted during peer review (both projects largely look at papers in preprint repositories). Unless the technology drastically improves, “this is going to generate huge amounts of work for no obvious benefit”, says Brown. “It strikes me as extraordinarily naive.”
Don't forget that this is driven by present-day AI. Which means people will assume that it's checking for fraud and incorrect logic, when actually it's checking for self-consistency and consistency with training data. So it should be great for typos, misleading phrasing, and cross-checking facts and diagrams, but I would expect it to do little for manufactured data, plausible but incorrect conclusions, and garden variety bullshit (claiming X because Y, when Y only implies X because you have a reasonable-sounding argument that it ought to).<p>Not all of that is out of reach. Making the AI evaluate a paper in the context of a cluster of related papers might enable spotting some "too good to be true" things.<p>Hey, here's an idea: use AI for mapping out the influence of papers that were later retracted (whether for fraud or error, it doesn't matter). Not just via citation, but have it try to identify the no longer supported conclusions from a retracted paper, and see where they show up in downstream papers. (Cheap "downstream" is when a paper or a paper in a family of papers by the same team ever cited the upstream paper, even in preprints. More expensive downstream is doing it without citations.)
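As a minimal sketch of the "cheap downstream" half of that idea: the code below assumes the Semantic Scholar Graph API for the citation hop and a caller-supplied ask_llm callable standing in for whatever model does the judgment -- both are assumptions for illustration, not anything the article describes.

```python
# Sketch: trace the influence of a retracted paper on downstream work.
# Assumes the Semantic Scholar Graph API; ask_llm is a placeholder callable
# (prompt -> str) for whatever model does the actual judgment.
import requests

S2_API = "https://api.semanticscholar.org/graph/v1/paper"

def citing_papers(paper_id: str, limit: int = 100) -> list[dict]:
    """The 'cheap downstream' set: papers that directly cite the retracted one."""
    resp = requests.get(
        f"{S2_API}/{paper_id}/citations",
        params={"fields": "title,abstract", "limit": limit},
        timeout=30,
    )
    resp.raise_for_status()
    return [row["citingPaper"] for row in resp.json().get("data", [])]

def relies_on_retracted_claim(retracted_claim: str, paper: dict, ask_llm) -> bool:
    """Ask the model whether a downstream abstract leans on the retracted conclusion."""
    prompt = (
        "The following conclusion comes from a retracted paper:\n"
        f"{retracted_claim}\n\n"
        "Does the abstract below appear to rely on that conclusion? Answer YES or NO.\n\n"
        f"Title: {paper.get('title')}\n"
        f"Abstract: {paper.get('abstract')}"
    )
    return ask_llm(prompt).strip().upper().startswith("YES")
```

The more expensive version would drop the citation requirement and instead search abstracts or full text for the retracted conclusion directly.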
Perhaps our collective memories are too short? Did we forget what curl just went through with AI-confabulated bug reports[1]?<p>[1]: <a href="https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-stands-for-intelligence/" rel="nofollow">https://daniel.haxx.se/blog/2024/01/02/the-i-in-llm-stands-f...</a>
Here are 2 examples from the Black Spatula project where we were able to detect major errors:
- <a href="https://github.com/The-Black-Spatula-Project/black-spatula-project/issues/3">https://github.com/The-Black-Spatula-Project/black-spatula-p...</a>
- <a href="https://github.com/The-Black-Spatula-Project/black-spatula-project/issues/10">https://github.com/The-Black-Spatula-Project/black-spatula-p...</a><p>Some things to note: this didn't even require a complex multi-agent pipeline. A single-shot prompt was able to detect these errors.
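For anyone wondering what single-shot prompting looks like in practice, here is a rough sketch; the OpenAI SDK, model name, and prompt wording are illustrative assumptions, not the actual Black Spatula pipeline.

```python
# Illustrative single-shot check, not the actual Black Spatula prompt or model.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY set in the environment.
from openai import OpenAI

def check_paper(paper_text: str, model: str = "gpt-4o") -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": (
                "Review the following paper for major errors: unit mistakes, "
                "impossible quantities, internal inconsistencies, and arithmetic "
                "that does not add up. For each suspected error, quote the passage "
                "and explain the problem in one line.\n\n" + paper_text
            ),
        }],
    )
    return response.choices[0].message.content
```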
If anyone is not aware of Retraction Watch, their coverage of "tortured phrases" detection was a revelation. And it has exposed some serious flaws, like "vegetative electron microscopy". Some of the offending publications/authors have hundreds of papers.<p><a href="https://retractionwatch.com/2025/02/10/vegetative-electron-microscopy-fingerprint-paper-mill/" rel="nofollow">https://retractionwatch.com/2025/02/10/vegetative-electron-m...</a><p><a href="https://retractionwatch.com/2024/11/11/all-the-red-flags-scientific-reports-retracts-paper-sleuths-called-out-in-open-letter/" rel="nofollow">https://retractionwatch.com/2024/11/11/all-the-red-flags-sci...</a>
I'm extremely skeptical of the value in this. I've already seen hours wasted responding to baseless claims that are lent credence by AI "reviews" of open source codebases. The claims would have happened before, but these text generators know how to hallucinate in the correct verbiage to convince lay people and amateurs, and they are more annoying to deal with.
It’s a nice idea, and I would love to be able to use it for my own company reports (spotting my obvious errors before sending them to my boss’s boss).<p>But the first thing I noticed was the two approaches highlighted - one a small-scale approach that does not publish first but approaches the authors privately - and the other that publishes first, has no human review, and has <i>its own cryptocurrency</i>.<p>I don’t think anything speaks more clearly about the current state of the world and the choices in our political space.
I am using JetBrains' AI to do code analysis (find errors).<p>While it sometimes spots something I missed, it also gives a lot of confident 'advice' that is just wrong or not useful.<p>Current AI tools are still sophisticated search engines. They cannot reason or think.<p>So while I think it could spot some errors in research papers, I am still very sceptical that it is useful as a trusted source.
The role of LLMs in research is an ongoing, well, research topic of interest of mine. I think it's fine so long as 1. a pair of human eyes has validated any of the generated outputs, and 2. the "ownership rule": the human researcher is prepared to defend and own anything the AI model does on their behalf, implying that they have digested and understood it as well as anything else they may have read or produced in the course of conducting their research.
Rule #2 avoids this notion of crypto-plagiarism. If you prompted for a certain output, your thought, in a manner of speaking, was the cause of that output. If you agree with it, you should be able to use it.
In this case, using AI to fact-check is kind of ironic, considering these models' hallucination issues. However, infallibility is the mark of omniscience; it's pretty unreasonable to expect these models to be flawless. They can still play a supplementary role in the review process, a second line of defense for peer reviewers.
The push for AI is about controlling the narrative. By giving AI the editorial review process, it can control the direction of science, media and policy. Effectively controlling the course of human evolution.<p>On the other hand, I'm fully supportive of going through ALL of the rejected scientific papers to look for editorial bias, censorship, propaganda, etc.
Great start, but it will definitely require supervision by experts in the field. I routinely use Claude 3.7 to flag errors in my submissions. Here is a prompt I used yesterday:<p>“This is a paper we are planning to submit to Nature Neuroscience. Please generate a numbered list of significant errors with text tags I can use to find the errors and make corrections.”<p>It gave me a list of 12 errors, of which Claude labeled three as “inconsistencies”, “methods discrepancies”, and “contradictions”. When I requested that Claude reconsider, it said “You are right, I apologize” in each of these three instances.
Nonetheless it was still a big win for me and caught a lot of my Dummheiten.<p>Claude 3.7 running in standard mode does not use its context window very effectively. I suppose I could have demanded that Claude “internally review (wait: think again)” for each serious error it initially thought it had encountered. I’ll try that next time. Exposure of chain of thought would help.
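A rough sketch of that flag-then-recheck loop, using the Anthropic Python SDK; the model ID, token limits, and prompt wording here are assumptions, not the commenter's exact setup.

```python
# Sketch of the flag-then-recheck loop described above, via the Anthropic SDK.
# Model ID, token limits, and prompts are assumptions; ANTHROPIC_API_KEY must be set.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-7-sonnet-20250219"  # assumed Claude 3.7 Sonnet model ID

def flag_errors(manuscript: str) -> str:
    """First pass: numbered list of significant errors, each with a text tag."""
    msg = client.messages.create(
        model=MODEL,
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": (
                "This is a paper we are planning to submit. Please generate a "
                "numbered list of significant errors, each with a short text tag "
                "I can use to locate and correct it.\n\n" + manuscript
            ),
        }],
    )
    return msg.content[0].text

def recheck(manuscript: str, flagged_item: str) -> str:
    """Second pass: make the model re-verify one flagged item against the text."""
    msg = client.messages.create(
        model=MODEL,
        max_tokens=1000,
        messages=[{
            "role": "user",
            "content": (
                "Re-examine this flagged issue against the manuscript and state "
                "whether it is a real error or a false positive, citing the exact "
                "passage:\n\n" + flagged_item + "\n\n---\n\n" + manuscript
            ),
        }],
    )
    return msg.content[0].text
```

Forcing a second pass per flagged item is a cheap way to approximate that "internally review" step without access to the chain of thought.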
This sounds way, way outside how LLMs work. They can't count the R's in strarwberrrrrry, but they can cross-reference multiple tables of data? Is there something else going on here?
Reality check: yesnoerror, the only part of the article that actually seems to involve any published AI reviewer comments, is just checking arXiv papers. Their website claims that they "uncover errors, inconsistencies, and flawed methods that human reviewers missed", but arXiv is of course famously NOT a peer-reviewed journal. At best they are finding "errors, inconsistencies, and flawed methods" in papers that human reviewers haven't looked at.<p>Let's then try to see if we can uncover any "errors, inconsistencies, and flawed methods" on their website. The "status" is pure made-up garbage. There's no network traffic related to it that would actually allow it to show a real status. The "RECENT ERROR DETECTIONS" list shows a single paper from today, but the queue you see when you click "submit a paper" lists the last completed paper as the 21st of February. The front page tells us that it found some math issue in a paper titled "Waste tea as absorbent for removal of heavy metal present in contaminated water", but if we navigate to that paper[1] the math error suddenly disappears. Most of the comments are also worthless, talking about minor typographical issues or misspellings that do not matter, but of course they still categorize those as an "error".<p>It's the same garbage as every time with crypto people.<p>[1]: <a href="https://yesnoerror.com/doc/82cd4ea5-4e33-48e1-b517-5ea3e2c5f268" rel="nofollow">https://yesnoerror.com/doc/82cd4ea5-4e33-48e1-b517-5ea3e2c5f...</a>
As a researcher, I say it is a good thing. Provided it gives a small number of errors that are easy to check, it is a no-brainer. I would say it is more valuable for authors, though, as a way to spot obvious issues.
I don't think it will drastically change research, but it is an improvement over a spell check or running Grammarly.
I know academics who use it to make sure their arguments are grounded, after a meaningful draft. This helps them lay out their arguments more clearly, and IMO it is no worse than the companies that used motivated graduate students to review the grammar and coherence of papers written by non-native speakers.
AI tools are hopefully going to eat lots of manual scientific research. This article looks at error spotting, but follow the path of getting better and better at error spotting to its conclusion and you essentially reproduce the work entirely from scratch. So AI study generation is really where this is going.<p>All my work could honestly be done instantaneously with better data harmonization & collection along with better engineering practices. Instead, it requires a lot of manual effort. I remember my professors talking about how they used to calculate linear regressions by hand back in the old days. Hopefully a lot of the data cleaning and study setup that is done now will sound just as quaint to a future set of scientists who use AI tools to run and check these basic programmatic and statistical tasks.
I expect that for truly innovative research, it might flag the innovative parts of the paper as mistakes if they're not fully elaborated upon... e.g. if the author assumed that the reader possesses certain niche knowledge.<p>With software design, I find many mistakes where the AI says things that are incorrect because it parrots common blanket statements and ideologies without actually checking whether the statement applies in this case by reasoning from first principles... Once you take the discussion down to first principles, it quickly acknowledges its mistake, but you had to have this deep insight in order to take it there... Someone who is trying to learn from AI would not get this insight from it; instead they would be taught a dumbed-down, cartoonish, wordcel version of reality.
While I don't doubt that AI tools can spot some errors that would be tedious for humans to look for, they are also responsible for far more errors. That's why proper understanding and application of AI is important.
Recently I used one of the reasoning models to analyze 1,000 functions in a very well-known open source codebase. It flagged 44 problems, which I manually triaged. Of the 44 problems, about half seemed potentially reasonable. I investigated several of these seriously and found one that seemed to have merit and a simple fix. This was, in turn, accepted as a bugfix and committed to all supported releases of $TOOL.<p>All in all, I probably put in 10 hours of work, I found a bug that was about 10 years old, and the open-source community had to deal with only the final, useful report.
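The shape of that workflow is roughly the sketch below; extract_functions and ask_model are placeholders for whatever parser and reasoning model is actually used, and the point is that the flagged list goes through human triage before anything reaches maintainers.

```python
# Sketch of the bulk-review-then-triage workflow described above.
# extract_functions(path) -> [(name, body)] and ask_model(prompt) -> str are
# placeholders for whatever parser and reasoning model you actually use.
def bulk_review(source_files: list[str], extract_functions, ask_model) -> list[dict]:
    """Run the model over every function and keep only the ones it flags."""
    flagged = []
    for path in source_files:
        for name, body in extract_functions(path):
            verdict = ask_model(
                "Does this function contain a likely bug? If yes, explain briefly "
                "and suggest a fix; if not, answer exactly NO_BUG.\n\n" + body
            )
            if "NO_BUG" not in verdict:
                flagged.append({"file": path, "function": name, "report": verdict})
    # Everything in `flagged` still needs human triage before anyone upstream
    # sees it -- the model output itself is never sent to maintainers.
    return flagged
```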
This could easily turn into a witch hunt [0], especially given how problematic certain fields have been, but I can't shake the feeling that it is still an interesting application and, like the top comment said, a step in the right direction.<p>[0] - Imagine a public ranking system for institutions or specific individuals who have been flagged by a system like this, with no verification or human in the loop, just a "shit list"
In the not-so-far future we should have AIs that have read all the papers and other info in a field. They can then review any new paper, as well as answer any questions in the field.<p>This then becomes the first sanity check for any paper author.<p>This should save a lot of time and effort, improve the quality of papers, and root out at least some fraud.<p>Don't worry, many problems will remain :)
I'm no member of the scientific community, but I fear this project or another will go beyond math errors and eventually establish some kind of incontrovertible AI entity giving a go/no-go on papers, ending all science in the process, because publishers will love it.
This is both exciting and a little terrifying. The idea of AI acting as a "first-pass filter" for spotting errors in research papers seems like an obvious win. But the risk of false positives and potential reputational damage is real...
This is going to be amazing for validation and debugging one day. Imagine having the fix PR opened by the system for you, with code to review including a unit test to reproduce/fix the bug that caused the prod exception @.@
Wait, I threw out my black plastic spatula for nothing? So there was all this media noise about it - but not a single article came across my (many) newsfeeds about a mistake or a retraction.
AI tools revolutionizing research by spotting errors is a game-changer for academia. With the ability to detect inconsistencies, plagiarism, and even fabricated data, AI is enhancing the credibility of scientific studies. This is especially crucial in a time when the volume of published research is growing exponentially, making manual fact-checking nearly impossible.<p>However, despite the benefits, AI-driven tools are facing increasing restrictions. Many schools and universities, as well as some governments, have started blocking access to AI services, fearing academic dishonesty or misuse. If you still need access to these tools for legitimate research purposes, proxy services like NodeMaven can help bypass these restrictions, ensuring you stay connected to the latest advancements. I can drop you a link, it helped me a lot while writing my thesis.: <a href="https://nodemaven.com/" rel="nofollow">https://nodemaven.com/</a>
Perhaps this is a naive question from a non-academic, but why isn't deliberately falsifying data, or using AI tools or Photoshop to create images, career-ending?<p>Wouldn't a more direct system be one in which journals refused submissions if one of the authors had committed deliberate fraud in a previous paper?
The low hanging fruit is to target papers cited in corporate media; NYT, WSJ, WPO, BBC, FT, The Economist, etc. Those papers are planted by politically motivated interlocutors and timed to affect political events like elections or appointments.<p>Especially those papers cited or promoted by well-known propagandists like Freedman of NYT, Eric Schmidt of Google or anyone on the take of George Soros' grants.
top two links at this moment are:<p>> AI tools are spotting errors in research papers: inside a growing movement (nature.com)<p>and<p>> Kill your Feeds – Stop letting algorithms dictate what you think (usher.dev)<p>so we shouldn't let the feed algorithms influence our thoughts, but also, AI tools need to tell us when we're wrong
I built this AI tool to spot "bugs" in legal agreements, which is harder than spotting errors in research papers because the law is open-textured and self-contradicting in many places. But no one seems to care about it on HN. Gladly, our early trial customers are really blown away by it.<p>Video demo with human wife narrating it: <a href="https://www.youtube.com/watch?v=346pDfOYx0I" rel="nofollow">https://www.youtube.com/watch?v=346pDfOYx0I</a><p>Cloudflare-fronted live site (hopefully that means it can withstand traffic): <a href="https://labs.sunami.ai/feed" rel="nofollow">https://labs.sunami.ai/feed</a><p>Free Account Prezi Pitch: <a href="https://prezi.com/view/g2CZCqnn56NAKKbyO3P5/" rel="nofollow">https://prezi.com/view/g2CZCqnn56NAKKbyO3P5/</a>