One of my worries is users submitting malicious prompts and getting my account banned for everyone.

How do you ensure that a prompt doesn't violate the LLM provider's Terms of Service?

Does this mean we have to call the text gen API twice? First to analyze and ask whether it violates the TOS, and then a second time to actually fulfill the user's request?

Are there any other solutions that are working for you?
> Does this mean we have to call the text gen API twice? First to analyze and ask whether it violates the TOS, and then a second time to actually fulfill the user's request?

This just means the user-generated content gets sent to the API once with different framing (risking a ban, strike, or whatever), and if it doesn't trigger your detection, it gets sent again with the normal framing (giving another chance at a provider ban, strike, etc.).

Seems like that would just accelerate your ban: you'd be sending each potentially-violating interaction twice, with slightly different context, giving more chances for a violation and possibly doubling violations for some content.

You can probably do better at reducing your risk by running a local classifier (or a comparatively small local LLM) as your trouble detector before deciding to send a request to the backend, though validating the trouble-detector setup may be problematic.
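For example, a rough sketch of that local gating step in Python, assuming a Hugging Face classifier. The model name, label check, and threshold are illustrative assumptions, and call_backend_llm is just a placeholder for whatever provider API call you actually make:

```python
from transformers import pipeline

# Local "trouble detector": a small classifier that runs before any
# request is forwarded to the paid LLM backend.
# NOTE: "unitary/toxic-bert" and the 0.5 threshold are illustrative
# assumptions, not recommendations -- use whatever model and policy
# you have actually validated against your provider's terms.
detector = pipeline("text-classification", model="unitary/toxic-bert")

def is_risky(user_prompt: str, threshold: float = 0.5) -> bool:
    # The pipeline returns something like [{"label": "toxic", "score": 0.97}];
    # treat a confident "toxic" label as a reason to refuse.
    result = detector(user_prompt, truncation=True)[0]
    return result["label"] == "toxic" and result["score"] >= threshold

def call_backend_llm(user_prompt: str) -> str:
    # Placeholder for the real provider API call.
    raise NotImplementedError

def handle_request(user_prompt: str) -> str:
    if is_risky(user_prompt):
        # Flagged content never reaches the external provider at all,
        # so there's no second chance at a strike for that prompt.
        return "Sorry, I can't help with that."
    return call_backend_llm(user_prompt)
```

The point is that the risky content only ever costs you a local inference, never an API call; the hard part (as noted above) is validating that the local detector's idea of "trouble" actually lines up with the provider's.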