I'm curious how this is done effectively, assuming no source code should be sent to a remotely hosted model. Do foundation model providers offer toggles that allow inference and embedding without the data being used for training?
All the big providers offer no-training and no-retention guarantees (either by default, as a toggle, or upon request). For many high-security environments, though, I'd expect everything to be hosted on-prem or at minimum on company-controlled instances, which does limit your model options somewhat.<p>My employer has such contracts for some use cases, but actually forbids use of code completion/generation due to IP concerns.
I’ve worked with companies that would never trust publicly hosted models but don’t have any issues with models hosted on AWS or Azure. Then again, I work in cloud consulting, so they already have to trust the cloud provider.<p>Yes, this includes GovCloud implementations that have citizenship requirements and where you can’t connect from outside the US.<p>Admittedly, I have not worked on any projects in the “secret” regions.<p><a href="https://aws.amazon.com/federal/secret-cloud/" rel="nofollow">https://aws.amazon.com/federal/secret-cloud/</a>
As someone else stated, there are enterprise services that ensure your company data isn't consumed. However, I think pretty soon we're going to see a lot of companies maintaining models locally in-house.<p>I think this is especially true given that Intel is shifting its focus toward an affordable in-house solution for training AI models locally with its upcoming GPUs.
Repeating what others have written, based on my experience at the bank I work for: the business offering will not use or save your data, and for more sensitive material we simply host it on-prem.