Spotting LLMs with Binoculars: Zero-Shot Detection of Machine-Generated Text

161 points by victormustar over 1 year ago

17 comments

adaboese over 1 year ago
I can talk a lot about this, since this is a space I've spent a lot of time experimenting in. All I will say is that all these detectors (a) create a ton of false positives, and (b) are incredibly easy to bypass if you know what you are doing.

As an example, one method I found works extremely well is to simply rewrite the article section by section with instructions to mimic the writing style of an arbitrary block of human-written text.

This works a lot better than (for example) asking for a specific style. Saying something like "write in a casual style that conveys lightheartedness towards the topic" does not work as well as simply saying "rewrite this, mimicking the style in which the following text block is written: X" (where X is a block of human-written text).

There are some silly things that will (a) cause human-written text to be detected as AI and (b) allow AI text to avoid detection. For example, using a broad vocabulary tends to make detectors flag text as AI-written. So if you use Grammarly to "improve your writing", don't be surprised if it gets flagged. The inverse is true too: if you use some statistical analysis to replace less common expressions with more common ones, AI text is less likely to be detected as AI.

If anyone is interested, I can talk a lot more about the hundreds of experiments I've done by now.
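For concreteness, here is a minimal sketch of the section-by-section, style-mimicry rewrite described above, assuming the OpenAI Python client. The model name, prompt wording, and the HUMAN_SAMPLE placeholder are illustrative assumptions, not the commenter's actual setup.

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Placeholder: a block of genuinely human-written long-form text to mimic.
HUMAN_SAMPLE = "...paste a few paragraphs of your own writing here..."

def rewrite_section(section: str) -> str:
    # Ask for a rewrite that copies the *style* of the human sample rather than
    # naming an abstract style ("casual", "lighthearted", etc.).
    prompt = (
        "Rewrite the following section so that it mimics the writing style of the "
        "sample text. Preserve the meaning; copy the sample's tone, sentence "
        "length, and word choice.\n\n"
        f"SAMPLE:\n{HUMAN_SAMPLE}\n\nSECTION:\n{section}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Usage: rewritten = [rewrite_section(s) for s in article_sections]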
ISL over 1 year ago
Is a layman's interpretation of this to state: LLMs tend to perform like aggregated humanity, but any given human will differ. Since all the volume of a high-dimensional sphere is at the edge, almost nobody is like the mean, so the false-positive rate is low?

It's a clever plan, until the LLMs do some adversarial training...
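As a quick sanity check on the "volume is at the edge" claim (a standard fact, included here only for context): the fraction of a d-dimensional ball's volume lying within radius $(1-\epsilon)r$ is

$\frac{V_d((1-\epsilon)r)}{V_d(r)} = (1-\epsilon)^d \to 0$ as $d \to \infty$,

so for, say, $d = 100$ and $\epsilon = 0.05$, only $0.95^{100} \approx 0.006$ of the volume lies away from the outer shell; in high dimension almost every point is far from the mean.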
dr_dshiv over 1 year ago
1. How well can it detect if the writer edits it?

2. How well can it detect if the prompter tries to hide it?

3. How well can it detect if people start to write like ChatGPT?

I grade a lot of papers and encourage/teach ChatGPT use. It is so easy for me to detect *poor* usage; quality is still easy to distinguish. Skillful use of these tools is a meaningful skill. In fact, it usually requires the same underlying skills (close reading, purposefulness, authenticity, etc.).

I love ChatGPT because it obviates stuffy academic writing. Who needs it? Be clear and direct; that's valuable!
Imnimo over 1 year ago
According to their own demo, the Limitations section of their GitHub repo is AI-generated.

> All AI-generated text detectors aim for accuracy, but none are perfect and can have multiple failure modes (e.g., Binoculars is more proficient in detecting English language text compared to other languages). This implementation is for academic purposes only and should not be considered as a consumer product. We also strongly caution against using Binoculars (or any detector) without human supervision.
vicgalle_ over 1 year ago
> https://twitter.com/minimaxir/status/1749893683137454194

Not very promising, though.
binsquare over 1 year ago
I'm not convinced that we're on the right path in detecting AI-generated content.

We've been looking at the end result and drawing conclusions about the journey, and that will always come with a degree of uncertainty. A false positive rate of 0.01% now probably will not hold as people adapt and grow alongside AI content.

I wonder if anyone's working on software that documents the journey of the output, similar to git commits, so that we can analyze both the metadata (the journey) and the output (the end result) to determine human authenticity.
vunderba over 1 year ago
I have thousands of shorthand-style notes that I've written over the years. Recently, I've been having GPT rewrite them: I first provide GPT with approximately 8 kB of my long-form writing, and then ask it to rewrite the shorthand using similar diction and style.

Concerned about this issue, I have also run the corresponding outputs through every LLM detection program I could find (ZeroGPT, etc.). None of the outputs has *ever* been flagged as machine-generated.
TuringNYC over 1 year ago
The false positive rate would kill most use cases here. Even 1 in 10,000 false accusations of an academic-integrity violation would be too many.
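To put the quoted rate in perspective (with an assumed, illustrative volume): a 0.01% false positive rate is 1 in 10,000, so a university that runs 50,000 student essays through such a detector each year would, on average, falsely flag 50,000 × 0.0001 = 5 of them, i.e. about five wrongful accusations per year from the detector alone.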
ff317 over 1 year ago
Maybe rather than focusing so much on how to detect AI-generated content, we should instead focus on our general ability to validate the truthiness of content regardless of source. I don't really care if an AI wrote it, so long as the content is meaningful and informative. I do care if it's a load of junk, even if a human did write it.
etwigg over 1 year ago
If the text is good, and someday it will be, I don't care if an LLM wrote it. If it's bad, I don't care if a person wrote it.

The only reason to care is that the implicit proof-of-work signal has broken because LLM text is so cheap. Open forums might need to be pay-per-submission someday...
aaroninsf over 1 year ago
So much time and effort being wasted on a problem we are entirely unequipped to ever "solve."

I guess the full-employment economy demands much of us.
Jedd over 1 year ago
https://huggingface.co/spaces/tomg-group-umd/Binoculars
jjackson5324 over 1 year ago
> false positive rate of 0.01%

What would be an acceptable false positive rate for something like this to be used at schools and universities?

Like, obviously 0.01% is not acceptable, but what would be?
stainablesteel over 1 year ago
I'm amazed at how difficult this has proven to be.

Every time I ask a question to an LLM, it spits out a generic response format:

> Well, subject x has a lot of nuance, filled with even more nuance. It may be that x is true, but y could also be true. Here's a list of related sentences:
>
> 1. Subject 1 is pretty broad in scope but applies to the question.
>
> 2. Subject 2 is more niche conceptually and applies to the core of the topic without addressing every aspect of it.
>
> 3. And the list goes on.

This is the technology you can't surpass?
akasakahakada over 1 year ago
> despite not being trained on any ChatGPT data

I doubt that, since ChatGPT output has been used to train all the other LLMs.
kristopolous over 1 year ago
This was an open Kaggle prize for a while.
godelski over 1 year ago
Research papers definitely need to be more nuanced with the "zero-shot" language. Originally this term was used to describe out-of-distribution and out-of-class instances in the context of meta-learning (if you don't know it, see the history section I've left below for context). The term has been really bastardized, and that makes it difficult to differentiate works now. "Out-of-domain" is a fuzzy concept, and there are some weird usages where people would call something OOD but wouldn't call a test set OOD. OOD classically doesn't mean something not in the training data, but something not in the distribution of data your data is a proxy for. Certainly the data here is within distribution, as it comes from LLMs.

> Our approach, Binoculars, is so named as we look at inputs through the lenses of two different language models.

How is LLM-generated data out of domain for LLMs? Specifically, their GitHub demonstrates the method with the Falcon-7B and Falcon-7B-Instruct models. Instruct models are specifically tuned on their own outputs. We can even say the non-instruct models are also "trained on" LLM outputs, since those outputs enter the calculation of the cost function, meaning the models see that data and use that information, which is why:

> Unsurprisingly, LLMs tend to generate text that is unsurprising to an LLM.

Because they are trained on cross-entropy, which is directly related to perplexity. Are detector researchers really trying to use perplexity to detect LM generation? That seems odd, since that's exactly the quantity LMs are minimizing... It also seems weird because the premise of the paper is that human writing has more "surprise" than text from an LM, while we're instructing LMs to sound more human. Going about detection this way does not sound like a sustainable method (not that LLM detectors are reliable; I think we all know they frequently flag generic or standard text, which of course they do if the score is highly dependent on entropy).

=== History ===

The first example I'm aware of is the "one-shot" case from [0] (2000), whose abstract says:

> We suggest that this density over transforms may be shared by many classes, and demonstrate how using this density as "prior knowledge" can be used to develop a classifier based on only a single training example for each class.

We can think of this as taking a model and fine-tuning (often now just called training) on a single example per class, relying on prior knowledge the model has learned that generalizes to other tasks (such as training on CIFAR-10 being a good starting point for classifying lions).

Then come [1, 2] in 2008, where [1]'s title is "Importance of Semantic Representation: Dataless Classification" and [2] (from Yoshua Bengio's group) is "Zero-data Learning of New Tasks".

[1] trains on Wikipedia and then tests semantic classification on a modified 20 Newsgroups dataset (with expanded labels) and a Yahoo Answers dataset, and is about the cross-domain generalizability of the embedding mechanism. Specifically, they compared Bag of Words (BoW) to Explicit Semantic Analysis (ESA).

I'll just quote for [2]:

> We tested the ability of the models to perform zero-data generalization by testing the discrimination ability between two character classes not found in the training set.

Part of their experiments includes training on numeric character recognition and testing on alphabetical characters. They also do some low-shot experiments.

[0] https://people.cs.umass.edu/~elm/papers/cvpr2000.pdf

[1] https://citeseerx.ist.psu.edu/document?doi=ee0a332b4fc1e82a9999acd6cebceb165dc8645b

[2] https://cdn.aaai.org/AAAI/2008/AAAI08-103.pdf
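To make the perplexity point above concrete, here is a rough sketch of scoring text with two closely related LLMs, in the spirit of (but not identical to) the paper's two-model approach: the actual Binoculars score involves a cross-perplexity term, whereas this simplified version just takes the ratio of the two models' average negative log-likelihoods. The Falcon model names come from the thread; the function names and everything else are illustrative assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

OBSERVER = "tiiuae/falcon-7b"            # base model mentioned in the comment
PERFORMER = "tiiuae/falcon-7b-instruct"  # instruct-tuned sibling (shared tokenizer)

tok = AutoTokenizer.from_pretrained(OBSERVER)
observer = AutoModelForCausalLM.from_pretrained(OBSERVER, torch_dtype=torch.float16)
performer = AutoModelForCausalLM.from_pretrained(PERFORMER, torch_dtype=torch.float16)

def mean_nll(model, input_ids):
    # Average next-token negative log-likelihood of the text (log-perplexity).
    with torch.no_grad():
        logits = model(input_ids).logits[:, :-1, :]  # predictions for tokens 2..N
        targets = input_ids[:, 1:]                   # the tokens actually observed
        logp = torch.log_softmax(logits.float(), dim=-1)
        return -logp.gather(-1, targets.unsqueeze(-1)).mean().item()

def simplified_score(text):
    # Low observer "surprise" relative to the second model suggests more
    # LLM-like text; this ratio is only a stand-in for the real Binoculars score.
    ids = tok(text, return_tensors="pt").input_ids
    return mean_nll(observer, ids) / mean_nll(performer, ids)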