While somewhat off-topic: I had an interesting experience the other day that highlighted the utility of GitHub's Copilot. I ran Copilot on a piece of code that was working correctly, to see whether it would invent issues that weren't there. Surprisingly, it pinpointed an actual bug. I then asked Copilot to generate a unit test so I could understand the issue better, and when I ran the test, the program crashed exactly as Copilot had predicted. I refactored the problematic lines following Copilot's suggestions. This was the first time I'd seen Copilot be effective in a scenario like this, and it was small but real proof to me that language models can be valuable coding tools, capable of identifying and helping to resolve real bugs. They have limitations, but I believe those imperfections are temporary hurdles on the way to more robust coding assistants.
If not malicious, then this shows that there are people out there who don't quite know how much to rely on LLMs or understand the limits of their capabilities. It's distressing.
> you did not find anything worthy of reporting. You were fooled by an AI into believing that.

The author's right. Reading the report I was stunned; the person disclosing the so-called vulnerability said:

> To replicate the issue, I have searched in the Bard about this vulnerability.

Does Bard clearly warn to never rely on it for facts? I know OpenAI says "ChatGPT may give you inaccurate information" at the start of each session.
There's a second twist to this story in the Mastodon replies. It sounds like the LLM may have based this output on a CVE that hadn't yet been made public, implying that it had access to text that wasn't public. I can't quite tell if that's an accurate interpretation of what I'm reading.

>> @bagder it’s all the weirder because they aren’t even trying to report a new vulnerability. Their complaint seems to be that detailed information about a “vulnerability” is public. But that’s how public disclosure works? And open source? Like are they going to start submitting blog posts of vulnerability analysis and ask curl maintainers to somehow get the posts taken down???

>> @derekheld they reported this before that vulnerability was made public though

>> @bagder oh as in saying the embargo was broken but with LLM hallucinations as the evidence?

>> @derekheld something like that yes
> I responsibly disclosed the information as soon as I found it. I believe there is a better way to communicate to the researchers, and I hope that the curl staff can implement it for future submissions to maintain a better relationship with the researcher community. Thank you!

… yeah…
I was curious how many bogus security reports big open source projects get. If you go to https://hackerone.com/curl/hacktivity and scroll down to the ones marked "Not-applicable" you can find some additional examples. No other LLM hallucinations, but some pretty poorly thought-out "bugs".
Perhaps not useful to the conversation, but I really wish that whoever coined the term 'hallucination' for this behavior had consulted a dictionary first.

It's a delusion, not a hallucination.

Delusions are the irrational holding of false beliefs, especially after contrary evidence has been provided.

Hallucinations are false sensations or perceptions of things that do not exist.

May some influential ML person read this and start to correct the vocabulary in the field :)
So the reporter thinks they were able to get accurate info about the private details of an embargoed CVE from Bard. If that were correct, they would have found a CVE in Bard, not in curl.

In this case the curl maintainers can tell the details are made up and don't correspond to any CVE.
I'm not sure why this is interesting. An AI was asked to make up a fake vulnerability and it did. That's the sort of thing these AIs are good at, and it's not exactly new at this point.
I do reverse engineering work every now and then, and a year ago I'd have called myself a fool, but I have found multiple exploitable vulnerabilities simply by asking an LLM (Claude refuses less often than GPT-4; GPT-4 generally got better results when the request was properly phrased).

One interesting find: I wrote a GPT-4 integration for Binary Ninja, and funnily enough, when asking the LLM to rewrite a function into “its idiomatic equivalent, refactored and simplified without detail removal” and then asking it to find vulnerabilities, it cracked most of our joke hack-mes in a matter of minutes.

Interesting learning: nearly all LLMs can't really work properly with disassembled Rust binaries; I guess that's because the output doesn't resemble the original Rust code the way it does for C and C++.
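For the curious, a minimal sketch of that kind of two-step workflow, assuming Binary Ninja's headless Python API and the openai>=1.0 client; the helper names, prompts, and model string are illustrative, not the actual integration:

    # Two-step prompt workflow: (1) ask the model to rewrite the decompiled
    # function into readable code, (2) ask it to look for vulnerabilities in
    # that cleaned-up version. Illustrative sketch, not the real plugin.
    import binaryninja                # headless API; requires a suitable license
    from openai import OpenAI

    client = OpenAI()                 # expects OPENAI_API_KEY in the environment

    def ask_llm(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4",            # illustrative model name
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    def audit(path: str) -> None:
        # binaryninja.load() is the recent entry point; older versions use
        # BinaryViewType instead.
        bv = binaryninja.load(path)
        for func in bv.functions:
            try:
                # Crude pseudo-code dump: one line per high-level IL instruction.
                pseudo = "\n".join(str(i) for i in func.hlil.instructions)
            except Exception:
                continue              # some functions may not lift to HLIL

            rewritten = ask_llm(
                "Rewrite this decompiled function into its idiomatic "
                "equivalent, refactored and simplified without detail "
                "removal:\n\n" + pseudo
            )
            findings = ask_llm(
                "List any exploitable vulnerabilities in this function, "
                "with your reasoning:\n\n" + rewritten
            )
            print(f"=== {func.name} ===\n{findings}\n")

    audit("target.bin")               # hypothetical input binary

Splitting the rewrite and the audit into two separate prompts is the point of the trick: the model reasons about clean, idiomatic code instead of raw decompiler output.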
This is confusing: the reporter claims to have "crafted the exploit" using the info they got from Bard. So the hallucinated info was actionable enough to actually build an exploit, even though the report was closed as bogus?