As far as I can tell, what they're proposing is:<p>Today, for each output token the LLM produces a probability for every possible next token, then a 'sampler' makes a probability-weighted random choice. If the next-token probabilities are 90% for foo, 9% for bar and 1% for baz, the sampler draws a random number between 0 and 1: if it's <0.9 it outputs foo, if it's 0.9-0.99 it outputs bar, and if it's 0.99-1 it outputs baz.<p>But what if, instead of truly random numbers, you used a source of uniformly distributed pseudorandom numbers that was deterministic, derived from some secret key?<p>Each candidate token would remain just as likely as before - there would still be a 90% chance of foo being chosen - so the output shouldn't degrade in quality.<p>And sure, some tokens will have 99.999% probability and their selection doesn't tell you much. But in most real-world use there are plenty of positions where several wordings are plausible. So across a large enough sample of the output, you could detect whether the sampler was following your secret deterministic pattern.<p>Of course the downside is that you've got to check against exactly the same LLM, and only people with the secret key can perform the check. And it's only applicable to closed-source LLMs.<p>I'm also not quite sure whether it works when you don't know the exact prompt - so maybe my understanding of the paper is all wrong?
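To make that concrete, here's roughly the mechanism I'm imagining. This is only my sketch, not the paper's actual Tournament sampling; the key derivation and the names (keyed_uniform, sample_token) are made up for illustration:

```python
import hashlib
import random

def keyed_uniform(secret_key: str, context_tokens: list[str]) -> float:
    """Deterministic 'random' number in [0, 1), derived from the key and recent context."""
    seed = secret_key + "|" + " ".join(context_tokens[-4:])
    digest = hashlib.sha256(seed.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def sample_token(probs: dict[str, float], u: float) -> str:
    """Ordinary inverse-CDF sampling: walk the cumulative distribution until it passes u."""
    cumulative = 0.0
    for token, p in probs.items():
        cumulative += p
        if u < cumulative:
            return token
    return token  # guard against floating-point rounding

probs = {"foo": 0.90, "bar": 0.09, "baz": 0.01}
context = ["the", "quick", "brown", "fox"]

plain = sample_token(probs, random.random())                           # normal sampling
marked = sample_token(probs, keyed_uniform("my-secret-key", context))  # watermarked sampling
print(plain, marked)
```

Per-token probabilities are untouched (foo still wins whenever u < 0.9), but someone who holds the key can recompute u at every position and check statistically whether the chosen tokens line up with it far more often than chance.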
Worth pointing out that while watermarking is mathematically reliable, the scammers who are selling "AI detection" don't have the weight-level access that it requires.
What they didn't put in the limitations or other sections (unless I missed it) is that it can only be applied to longer, open-ended text, not to structured or closely constrained output. For example, if you want to watermark generated code, you can't produce it as a diff against the existing file - the sampling changes will cause unwanted modifications.<p>Similarly, a task like "fix the grammar in this long text" would have to tweak random words for no reason, because the existing text can't be reproduced 100% faithfully while injecting SynthID.
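A toy way to see why (the distributions below are invented, purely to illustrate the entropy argument):

```python
import math

def token_entropy(probs: list[float]) -> float:
    """Shannon entropy (in bits) of one next-token distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Open-ended prose: several plausible continuations, so a keyed sampler has
# room to prefer one of them without hurting quality.
creative_step = [0.40, 0.30, 0.20, 0.10]

# Reproducing existing code or quoted text verbatim: the "right" token is
# near-certain, so biasing the choice either does nothing or corrupts the output.
copying_step = [0.999, 0.0005, 0.0003, 0.0002]

print(token_entropy(creative_step))  # ~1.85 bits of slack at this position
print(token_entropy(copying_step))   # ~0.01 bits, nothing to watermark with
```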
I have a question for all the LLM and LLM-detection researchers out there. Wikipedia says that the Turing test "is a test of a machine's ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human."<p>Three things seem to be in conflict here:<p>1. This definition of intelligence...i.e. "behavior indistinguishable from a human"<p>2. The idea that LLMs are artificial intelligence<p>3. The idea that we can detect if something is generated by an LLM<p>This feels to me like one of those trilemmas, where only two of the three can be true. Or, if we take #1 as an axiom, then it seems like the extent to which we can detect when things are generated by an LLM would imply that the LLM is not a "true" artificial intelligence. Can anyone deeply familiar with the space comment on my reasoning here? I'm particularly interested in thoughts from people actually working on LLM detection. Do you think that LLM-detection is technically feasible? If so, do you think that implies that they're not "true" AI (for whatever definition of "true" you think makes sense)?
After skimming the paper I can’t immediately pick out the data on how much certainty the detector has for a text of a given length, or a graph of how that certainty grows with text size. (They seem to assert that certainty grows as the token count goes up, but it’s not clear by how much.)<p>I worry (and have already read worrying things) about “cheating detection” tools that have been deployed in schools. My intuition is that there’s just too much entropy between something like an essay prompt and the essay itself. I guess it also depends on how specific the teacher’s essay prompt is.
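My naive mental model of how the certainty would scale is something like the back-of-envelope sketch below. The per-token bias value is invented and this is not the paper's actual scoring function; it just shows why evidence accumulates with length:

```python
import math
import random

def detection_z(scores: list[float]) -> float:
    """z-score of the mean per-token score against the unwatermarked null,
    where scores are Uniform(0,1): mean 0.5, std 1/sqrt(12)."""
    n = len(scores)
    mean = sum(scores) / n
    return (mean - 0.5) * math.sqrt(12 * n)

random.seed(0)
BIAS = 0.05  # invented: how strongly the keyed sampler tilts each token's score

for n_tokens in (25, 100, 400, 1600):
    scores = [min(1.0, random.random() + BIAS) for _ in range(n_tokens)]
    print(n_tokens, round(detection_z(scores), 1))

# The z-score (and hence the detection confidence) grows roughly with the
# square root of the token count, so a short, tightly constrained answer
# gives much weaker evidence than a long open-ended one.
```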
It's easy to think of non-secure watermarking methods to mark LLM-generated text for lazy students or lazy copywriters: occasional incorrect capitalization, etc.
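A toy version of the capitalization idea might look like the sketch below (the key, threshold, and heuristics are all made up, and the whole scheme is trivially stripped by anyone who normalizes the text):

```python
import hashlib

SECRET = "not-really-secret"  # the point: this scheme is easy to guess and remove

def marked(i: int, word: str) -> bool:
    """Keyed choice of which word positions carry the mark (roughly 1 in 12)."""
    return hashlib.sha256(f"{SECRET}:{i}:{word.lower()}".encode()).digest()[0] < 21

def embed(text: str) -> str:
    """Incorrectly capitalize the marked words."""
    return " ".join(w.capitalize() if marked(i, w) else w
                    for i, w in enumerate(text.split()))

def detect(text: str) -> bool:
    """Flag text where the marked positions are almost always capitalized."""
    hits = [w for i, w in enumerate(text.split()) if marked(i, w) and w[:1].isalpha()]
    return len(hits) >= 5 and sum(w[0].isupper() for w in hits) / len(hits) > 0.8
```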
Give the prompt to ChatGPT.<p>Get the answer.<p>Rewrite it in your own words.<p>Feed it back to ChatGPT to check for errors.<p>Done. Watermarking really doesn’t solve any problem a clever person can’t trivially circumvent.
Several commenters who have not read the abstract of the paper are mentioning LLM-detection tools. That is not what is being shown here.<p>Rather they are saying how to modify the design of an LLM to deliberately inject watermarks into generated text such that it will be possible to detect that the text came from a particular LLM.<p>While interesting in the abstract, I think I can definitively say that absolutely nobody wants this. People trying to pass off LLM content (whether students or content providers) as human-written are not interested in being detected. People who are using LLMs to get information for their own knowledge or amusement or as a cybernetic augmentation do not need this. LLM providers want to drive adoption, and if you can be exposed as passing off LLM slop as your own, then nobody will use their stuff.