I'd like to hear about your experiences with cost across the various LLM API providers. For my use case, I don't need real-time responses. Are there any services that exploit spot instances and the like, where I can submit a set of queries and get the responses a few hours later, at a much lower cost?
The secret sauce is to use a big LLM to generate many “responses” for a set of given intents, then use Phi or something similarly cheap to do intent detection and pick one of the pre-made responses.

You can generate hundreds of responses per intent, so the user may never get the same response twice.

Of course it depends on your use case, but smoke and mirrors are your friend.
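
Roughly, the pre-made response trick could look like the sketch below. Everything in it is an assumption for illustration: the `RESPONSES` pool stands in for text a big LLM would generate offline, and the keyword-based `detect_intent` is only a placeholder for a cheap model such as Phi, not how such a model is actually called.

```python
import random
import re

# Hypothetical pool of pre-made responses, keyed by intent.
# In practice a large LLM would generate these offline, hundreds per intent.
RESPONSES = {
    "greeting": ["Hi there!", "Hello, how can I help?", "Hey! What do you need?"],
    "refund": ["I can help with a refund.", "Let's get that refund started."],
    "fallback": ["Sorry, could you rephrase that?"],
}

# Placeholder for the cheap intent model: plain keyword matching.
# A small LLM or classifier would replace this in a real setup.
KEYWORDS = {
    "greeting": {"hello", "hi", "hey"},
    "refund": {"refund", "refunds"},
}


def detect_intent(message: str) -> str:
    """Return the first intent whose keywords appear in the message."""
    tokens = set(re.findall(r"[a-z']+", message.lower()))
    for intent, keys in KEYWORDS.items():
        if tokens & keys:
            return intent
    return "fallback"


def pick_response(message: str) -> str:
    """Detect the intent, then pick one pre-made response at random."""
    return random.choice(RESPONSES[detect_intent(message)])


if __name__ == "__main__":
    print(pick_response("Hey, I'd like a refund"))
```

The expensive model only runs once, when the pool is built; each live request then costs one cheap classification plus a random pick.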