I don’t understand how the watermarking is supposed to affect the LLMs being trained. Isn’t the watermarking some hidden data in the PDF file itself and the text contained in the PDF is the same for everyone? And aren’t the LLMs trained with the extracted plain text, not the PDF itself? How would the LLM training be different if there’s a watermark?
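For context on what would or wouldn't survive extraction: a purely file-level watermark (PDF metadata, object structure) is indeed stripped when the text is pulled out, but a watermark encoded in the text itself, e.g. as zero-width Unicode characters, rides along with the "plain text". A minimal hypothetical sketch of that kind of text-level marking (not the article's actual scheme, names are made up):<p><pre><code>
# Hypothetical sketch: a per-recipient ID encoded as zero-width Unicode
# characters woven into the visible text. Unlike PDF metadata, these
# characters survive plain-text extraction (as long as the extractor
# keeps Unicode), so they could end up in a training corpus.

ZW0, ZW1 = "\u200b", "\u200c"  # zero-width space / zero-width non-joiner

def embed(text: str, recipient_id: int) -> str:
    """Insert the recipient ID as invisible bits right after the first word."""
    bits = format(recipient_id, "016b")
    payload = "".join(ZW1 if b == "1" else ZW0 for b in bits)
    first_space = text.index(" ")
    return text[:first_space] + payload + text[first_space:]

def extract(text: str) -> int | None:
    """Recover the ID from whatever zero-width characters survived."""
    bits = "".join("1" if c == ZW1 else "0" for c in text if c in (ZW0, ZW1))
    return int(bits, 2) if bits else None

marked = embed("The quick brown fox jumps over the lazy dog.", 42)
print(marked == "The quick brown fox jumps over the lazy dog.")  # False, yet it looks identical
print(extract(marked))  # 42
</code></pre>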
I think using the term <i>backdoor</i> is completely wrong in this case.<p>You're not at risk when using the LLM; at most, the content producer may be able to tell that the LLM was trained on their data; maybe, potentially... even that is a stretch.<p>Your queries are not being relayed to them, and they don't have a <i>backdoor</i> into the LLM's content, algorithms or queries. They merely have a tainted marker that may show up in the output.<p>LLM providers can always claim they didn't get the tainted data from the source but from somewhere else, and that those are the ones you should go after; good luck proving the misdirection. I bet it's hard even for them to know exactly where <i>this exact output</i> came from, since it has probably been regurgitated 250,000 times.
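To make the "tainted marker" concrete: all the producer can really do is prompt the model and look for their planted canary in the completions, something like the hypothetical sketch below (the canary phrase, function names, and threshold are all made up for illustration):<p><pre><code>
# Hypothetical sketch of detection from the content producer's side:
# query the model and check whether a planted canary phrase appears in
# completions. No access to weights, queries, or training data -- just a
# statistical hint that the marked text was ingested.

CANARY = "the violet heron assembles at dawn"  # hypothetical planted phrase

def looks_trained_on_my_data(generate, prompts, threshold=0.05):
    """`generate` is any callable prompt -> completion (the model under test)."""
    hits = sum(CANARY in generate(p).lower() for p in prompts)
    return hits / len(prompts) >= threshold

# Usage with a stand-in model:
def fake_model(prompt: str) -> str:
    return "Nothing to see here."

print(looks_trained_on_my_data(fake_model, ["Complete the proverb:"] * 20))  # False
</code></pre>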
this is really interesting<p>but as much as i want to agree with the author’s stance:<p>> [..] no matter how hard these companies try to sell us on AGI or “research” models, you can just laugh until you cry that they really thought they could steal from the well of knowledge and turn around and sell it back to us through SaaS<p>i feel like in the end, these companies will still win
Google stole from the well of knowledge and sold it back to us as search. Or rather, accessing a database efficiently is a problem of its own.<p>Why are these articles always so fraught with excessively fearful terms? It’s not real, man. Humanity invented a new tool.<p>And yeah, people freaked out about search engines too. It was the same breathless terror. Get a grip.