While somewhat off-topic: I had an interesting experience the other day that highlighted the utility of GitHub's Copilot. I ran Copilot on a piece of code that was working correctly, to see whether it would invent issues that weren't there. Surprisingly, it pinpointed an actual bug. I then asked Copilot to generate a unit test so I could understand the issue better, and when I ran the test, the program crashed exactly as Copilot had predicted. I refactored the problematic lines following Copilot's suggestions. This was the first time I'd seen Copilot be effective in a scenario like this, and it was small but real proof to me that language models can be valuable coding tools, capable of identifying and helping to resolve real bugs. They have limitations, but I believe those imperfections are temporary hurdles on the way to more robust coding assistants.
If not malicious, then this shows that there are people out there who don't quite know how much to rely on LLMs or understand the limits of their capabilities. It's distressing.
> you did not find anything worthy of reporting. You were fooled by an AI into believing that.

The author's right. Reading the report I was stunned; the person disclosing the so-called vulnerability said:

> To replicate the issue, I have searched in the Bard about this vulnerability.

Does Bard clearly warn to never rely on it for facts? I know OpenAI says "ChatGPT may give you inaccurate information" at the start of each session.
There's a second twist to this story in the Mastodon replies. It sounds like the LLM may have based this output on a CVE that hadn't yet been made public, implying that it had access to text that wasn't public. I can't quite tell if that's an accurate interpretation of what I'm reading.

>> @bagder it’s all the weirder because they aren’t even trying to report a new vulnerability. Their complaint seems to be that detailed information about a “vulnerability” is public. But that’s how public disclosure works? And open source? Like are they going to start submitting blog posts of vulnerability analysis and ask curl maintainers to somehow get the posts taken down???

>> @derekheld they reported this before that vulnerability was made public though

>> @bagder oh as in saying the embargo was broken but with LLM hallucinations as the evidence?

>> @derekheld something like that yes
> I responsibly disclosed the information as soon as I found it. I believe there is a better way to communicate to the researchers, and I hope that the curl staff can implement it for future submissions to maintain a better relationship with the researcher community. Thank you!

… yeah…
I was curious how many bogus security reports big open source projects get. If you go to https://hackerone.com/curl/hacktivity and scroll down to the ones marked "Not-applicable" you can find some additional examples. No other LLM hallucinations, but some pretty poorly thought-out "bugs".
Perhaps not useful to the conversation, but I really wish that whoever coined the term 'hallucination' for this behavior had consulted a dictionary first.

It's a delusion, not a hallucination.

Delusions are the irrational holding of false beliefs, especially after contrary evidence has been provided.

Hallucinations are false sensations or perceptions of things that do not exist.

May some influential ML person read this and start to correct the vocabulary in the field :)
So the reporter thinks they were able to get accurate info about the private details of an embargoed CVE from Bard. If that were correct, they would have found a CVE in Bard, not in curl.

In this case the curl maintainers can tell the details are made up and don't correspond to any CVE.
I'm not sure why this is interesting. An AI was asked to make up a fake vulnerability and it did. That's the sort of thing these AIs are good at, and it's not exactly new at this point.
I do reverse engineering work every now and then, and a year ago I'd have called myself a fool, but I have found multiple exploitable vulnerabilities simply by asking an LLM (Claude refuses less often than GPT-4; GPT-4 generally got better results when the request was properly phrased).

One interesting find: I wrote a GPT-4 integration for Binary Ninja, and funnily enough, when asking the LLM to rewrite a function into “its idiomatic equivalent, refactored and simplified without detail removal” and then asking it to find vulnerabilities, it cracked most of our joke hack-mes in a matter of minutes.

Interesting learning: nearly all LLMs can't really work properly with disassembled Rust binaries; I guess that's because the output doesn't resemble the original Rust code the way it does for C and C++.
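For the curious, a minimal sketch of that kind of two-step workflow, assuming Binary Ninja's headless Python API and the openai>=1.0 client; the helper names, prompts, and model string are illustrative, not the actual integration:

    # Two-step prompt workflow: (1) ask the model to rewrite the decompiled
    # function into readable code, (2) ask it to look for vulnerabilities in
    # that cleaned-up version. Illustrative sketch, not the real plugin.
    import binaryninja                # headless API; requires a suitable license
    from openai import OpenAI

    client = OpenAI()                 # expects OPENAI_API_KEY in the environment

    def ask_llm(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4",            # illustrative model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def audit(path: str) -> None:
        # binaryninja.load() is the recent entry point; older versions use
        # BinaryViewType instead.
        bv = binaryninja.load(path)
        for func in bv.functions:
            try:
                # Crude pseudo-code dump: one line per high-level IL instruction.
                pseudo = "\n".join(str(i) for i in func.hlil.instructions)
            except Exception:
                continue              # some functions may not lift to HLIL

            rewritten = ask_llm(
                "Rewrite this decompiled function into its idiomatic "
                "equivalent, refactored and simplified without detail "
                "removal:\n\n" + pseudo
            )
            findings = ask_llm(
                "List any exploitable vulnerabilities in this function, "
                "with your reasoning:\n\n" + rewritten
            )
            print(f"=== {func.name} ===\n{findings}\n")

    audit("target.bin")               # hypothetical input binary

Splitting the rewrite and the audit into two separate prompts is the point of the trick: the model reasons about clean, idiomatic code instead of raw decompiler output.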
This is confusing: the reporter claims to have "crafted the exploit" using the info they got from Bard. So the hallucinated info was actionable enough to actually build an exploit, even though the report was closed as bogus?