> Currently, the Huggingface Hub provides model publishers the option of requiring pre-registration and/or pre-approval to download a specific model’s weights. However, downstream (e.g., finetuned) or even direct copies of these models are not required to enforce these controls, making them easy to circumvent. We would encourage Huggingface and other model distributors to enforce that such controls propagate downstream, including automated enforcement of this requirement (e.g., via automated checks of model similarity).

None of the watermarking methods I have seen work in this way. All of them require extra work at inference time. In other words, Gemini might have watermarking technology layered on top of their model, but if I could download the weights, I could simply choose not to watermark my text.

Stepping back, in section 6 the authors don’t address what I see as the main criticism: authentication via writing style is extremely weak, and none of the proposed mitigations actually work. If you want to prevent phishing attacks, I would suggest the most salient factor is the identity of the sender, not the writing style of the email itself.

Another thing that annoys me about these “safety” people is that they ignore the reality of running ML models. Getting around their “safeguards” is trivial. Maybe you think it is “unsafe” to talk about certain Tiananmen Square events. Whatever change you make to a model to mitigate this risk can be quite easily reversed using the same personalization methods the paper discusses.
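To make the watermarking point concrete, here is a minimal sketch of a typical inference-time scheme (a logit-biasing "green-list" watermark in the style of Kirchenbauer et al.). All names and the bias value are illustrative, not any vendor's actual implementation. The key observation is that the watermark is a post-processing step on the model's logits, so anyone holding the weights can simply skip it:

```python
import random

DELTA = 2.0  # bias added to "green-list" logits (illustrative value)

def green_list(prev_token: int, vocab_size: int) -> set:
    # Seed a PRNG with the previous token so a detector that sees
    # only the generated text can reconstruct the same partition.
    rng = random.Random(prev_token)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: vocab_size // 2])  # half the vocabulary is "green"

def watermarked_logits(logits: list, prev_token: int) -> list:
    """Bias sampling toward green-list tokens.

    This runs *after* the model produces its logits. A weight-holder
    who controls the inference loop can just return `logits` unchanged,
    and the watermark disappears.
    """
    green = green_list(prev_token, len(logits))
    return [x + DELTA if i in green else x for i, x in enumerate(logits)]
```

A detector then checks whether the text contains suspiciously many green-list tokens. The scheme only holds while the model is served behind an API; once weights are downloadable, the biasing step is optional by construction.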