I find it very difficult to get agents to run reliably in my test environment, let alone in prod. The non-determinism of LLMs gets multiplied when you run multi-step agents or multi-agent setups. Does anyone have experience where it works well consistently?
I think the idea behind agents is that they're a bit less constrained.
If you can constrain the input/output format with better prompts, that might help.
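One way to enforce the output side, beyond prompting alone, is to validate what comes back and retry with the error fed into the prompt. A rough sketch, assuming the model is asked for a JSON object; `call_with_validation` and the `llm` callable are hypothetical names, and the "model" here is a stub so it runs offline:

```python
import json
from typing import Callable


def call_with_validation(llm: Callable[[str], str], prompt: str,
                         required_keys: set, max_retries: int = 3) -> dict:
    """Ask the model for a JSON object; retry until it has the required keys."""
    last_error = ""
    for _ in range(max_retries):
        # On retries, tell the model what was wrong with its previous answer.
        full_prompt = prompt if not last_error else (
            f"{prompt}\n\nPrevious output was invalid: {last_error}. Reply with JSON only."
        )
        raw = llm(full_prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as e:
            last_error = f"not valid JSON ({e.msg})"
            continue
        if not isinstance(data, dict):
            last_error = "expected a JSON object"
            continue
        missing = required_keys - data.keys()
        if missing:
            last_error = f"missing keys {sorted(missing)}"
            continue
        return data
    raise ValueError(f"no valid output after {max_retries} attempts: {last_error}")


# Stub model (hypothetical): chatty first answer, valid JSON on the retry.
responses = iter(["Sure! Here is the answer", '{"action": "search", "query": "weather"}'])


def fake_llm(prompt: str) -> str:
    return next(responses)


result = call_with_validation(fake_llm, "Decide the next action as JSON.", {"action", "query"})
print(result)  # {'action': 'search', 'query': 'weather'}
```

This doesn't make the model deterministic, but it does stop a malformed step from silently propagating into the next step of the agent.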
Or, if you can lower the temperature of the underlying LLM, generation becomes a bit more deterministic.
Or switch to using a chain.
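The difference with a chain is that the control flow is fixed in code rather than chosen by the model at each step. A minimal sketch under that assumption; `make_step`, `run_chain` and the `(prompt, temperature)` model signature are hypothetical, and `stub_llm` stands in for a real model call. Note that temperature 0 reduces sampling variance, but many hosted APIs still don't guarantee identical outputs across calls.

```python
from typing import Callable, List

# Hypothetical model signature: (prompt, temperature) -> completion text.
LLM = Callable[[str, float], str]
Step = Callable[[str], str]


def make_step(llm: LLM, template: str, temperature: float = 0.0) -> Step:
    """Wrap one prompt template as a chain step, defaulting to temperature 0."""
    def step(text: str) -> str:
        return llm(template.format(input=text), temperature)
    return step


def run_chain(steps: List[Step], user_input: str) -> str:
    """Run steps in a fixed order; the model never decides what runs next."""
    value = user_input
    for step in steps:
        value = step(value)
    return value


def stub_llm(prompt: str, temperature: float) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return prompt.upper()


chain = [
    make_step(stub_llm, "summarize: {input}"),
    make_step(stub_llm, "translate: {input}"),
]
print(run_chain(chain, "hello"))  # TRANSLATE: SUMMARIZE: HELLO
```

You give up the agent's ability to pick its own next action, but each step can be tested in isolation, which is usually what makes a test env reproducible.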