One of my worries is users submitting malicious prompts and getting my account banned for everyone.

How do you ensure that a prompt doesn't violate the LLM provider's Terms of Service?

Does this mean we have to call the text gen API twice? First to analyze and ask whether it violates the TOS, and then a second time to actually fulfill the user's request?

Are there any other solutions that are working for you?
> Does this mean we have to call the text gen API twice? First to analyze and ask whether it violates the TOS, and then a second time to actually fulfill the user's request?

This just means the user-generated content gets sent to the API once with different framing (risking a ban, strike, or whatever), and if it doesn't trigger your detection, it gets sent again with the normal framing (giving another chance at a provider ban, strike, etc.).

Seems like that would just accelerate your ban: you'd be sending each potentially-violating interaction twice, with slightly different context, giving more chances for a violation and possibly doubling violations for some content.

You can probably do better at reducing your risk by running a local classifier (or a comparatively small local LLM) as your trouble detector before deciding to send a request to the backend, though validating the trouble-detector setup may be problematic.
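For example, a rough sketch of that local gating step in Python, assuming a Hugging Face classifier. The model name, label check, and threshold are illustrative assumptions, and call_backend_llm is just a placeholder for whatever provider API call you actually make:

```python
from transformers import pipeline

# Local "trouble detector": a small classifier that runs before any
# request is forwarded to the paid LLM backend.
# NOTE: "unitary/toxic-bert" and the 0.5 threshold are illustrative
# assumptions, not recommendations -- use whatever model and policy
# you have actually validated against your provider's terms.
detector = pipeline("text-classification", model="unitary/toxic-bert")

def is_risky(user_prompt: str, threshold: float = 0.5) -> bool:
    # The pipeline returns something like [{"label": "toxic", "score": 0.97}];
    # treat a confident "toxic" label as a reason to refuse.
    result = detector(user_prompt, truncation=True)[0]
    return result["label"] == "toxic" and result["score"] >= threshold

def call_backend_llm(user_prompt: str) -> str:
    # Placeholder for the real provider API call.
    raise NotImplementedError

def handle_request(user_prompt: str) -> str:
    if is_risky(user_prompt):
        # Flagged content never reaches the external provider at all,
        # so there's no second chance at a strike for that prompt.
        return "Sorry, I can't help with that."
    return call_backend_llm(user_prompt)
```

The point is that the risky content only ever costs you a local inference, never an API call; the hard part (as noted above) is validating that the local detector's idea of "trouble" actually lines up with the provider's.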