> Currently, the Huggingface Hub provides model publishers the option of requiring pre-registration and/or pre-approval to download a specific model’s weights. However, downstream (e.g., finetuned) or even direct copies of these models are not required to enforce these controls, making them easy to circumvent. We would encourage Huggingface and other model distributors to enforce that such controls propagate downstream, including automated enforcement of this requirement (e.g., via automated checks of model similarity).

None of the watermarking methods I have seen work in this way. All of them require extra work at inference time. In other words, Gemini might have watermarking technology layered on top of their model, but if I could download the weights, I could simply choose not to watermark my text.

Stepping back, in section 6 the authors don’t address what I see as the main criticism: authentication via writing style is extremely weak, and none of the proposed mitigations actually work. If you want to prevent phishing attacks, I would suggest the most salient factor is the identity of the sender, not the writing style of the email itself.

Another thing that annoys me about these “safety” people is that they ignore the reality of running ML models. Getting around their “safeguards” is trivial. Maybe you think it is “unsafe” to talk about certain Tiananmen Square events. Whatever change you make to a model to mitigate this risk can be quite easily reversed using the same personalization methods the paper discusses.
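To make the watermarking point concrete, here is a minimal sketch of a typical inference-time scheme (a logit-biasing "green-list" watermark in the style of Kirchenbauer et al.). All names and the bias value are illustrative, not any vendor's actual implementation. The key observation is that the watermark is a post-processing step on the model's logits, so anyone holding the weights can simply skip it:

```python
import random

DELTA = 2.0  # bias added to "green-list" logits (illustrative value)

def green_list(prev_token: int, vocab_size: int) -> set:
    # Seed a PRNG with the previous token so a detector that sees
    # only the generated text can reconstruct the same partition.
    rng = random.Random(prev_token)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: vocab_size // 2])  # half the vocabulary is "green"

def watermarked_logits(logits: list, prev_token: int) -> list:
    """Bias sampling toward green-list tokens.

    This runs *after* the model produces its logits. A weight-holder
    who controls the inference loop can just return `logits` unchanged,
    and the watermark disappears.
    """
    green = green_list(prev_token, len(logits))
    return [x + DELTA if i in green else x for i, x in enumerate(logits)]
```

A detector then checks whether the text contains suspiciously many green-list tokens. The scheme only holds while the model is served behind an API; once weights are downloadable, the biasing step is optional by construction.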