Almost all of this should flow from common sense. I would use what makes sense for your application and not worry about the rest. It's a toolbox, not a rulebook. The one point that comes more from experience than from common sense is to always pin your model versions. As a final tip, if, despite trying everything, you still don't like the LLM's output, just run it again!<p>Here is a summary of all points:<p>1. Focus on Prompting Techniques:<p><pre><code> 1.1. Start with n-shot prompts to provide examples demonstrating the task (see the sketch after this list).
1.2. Use Chain-of-Thought (CoT) prompting for complex tasks, making instructions specific.
 1.3. Incorporate relevant resources via Retrieval-Augmented Generation (RAG).
</code></pre>
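A minimal sketch of 1.1 and 1.2, assuming an OpenAI-style chat client; the model id and the toy summarization task are placeholders, not recommendations:<p><pre><code>  from openai import OpenAI

  client = OpenAI()

  SHOTS = [  # 1.1: a couple of examples demonstrating the task
      {"role": "user", "content": "Summarize: Revenue rose 12% in Q3."},
      {"role": "assistant", "content": "Q3 revenue grew 12%."},
  ]

  def summarize(text: str) -> str:
      messages = [
          # 1.2: CoT with specific steps, not just "think step by step"
          {"role": "system", "content": (
              "You summarize documents. First list the key facts, then "
              "verify each against the source, then write one sentence."
          )},
          *SHOTS,
          {"role": "user", "content": f"Summarize: {text}"},
      ]
      resp = client.chat.completions.create(
          model="gpt-4o-2024-08-06",  # pinned snapshot, see 9.3
          messages=messages,
          temperature=0,
      )
      return resp.choices[0].message.content
</code></pre>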
2. Structure Inputs and Outputs:<p><pre><code> 2.1. Format inputs using serialization methods like XML, JSON, or Markdown (see the sketch below).
2.2. Ensure outputs are structured to integrate seamlessly with downstream systems.
</code></pre>
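Points 2.1 and 2.2 in miniature; the tag names and the JSON schema here are illustrative, not a standard:<p><pre><code>  import json

  def build_prompt(doc: str, question: str) -> str:
      # 2.1: XML-style tags make each input's boundaries unambiguous
      return (
          f"&lt;document&gt;\n{doc}\n&lt;/document&gt;\n"
          f"&lt;question&gt;{question}&lt;/question&gt;\n"
          'Respond with JSON: {"answer": "...", "confidence": 0.0}'
      )

  def parse_response(raw: str) -> dict:
      # 2.2: structured output feeds straight into downstream code,
      # and fails loudly the moment the model drifts from the schema
      reply = json.loads(raw)
      assert {"answer", "confidence"} <= reply.keys()
      return reply
</code></pre>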
3. Simplify Prompts:<p><pre><code> 3.1. Break down complex prompts into smaller, focused ones (see the sketch below).
3.2. Iterate and evaluate each prompt individually for better performance.
</code></pre>
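A sketch of 3.1 and 3.2: one do-everything prompt split into two focused calls, each of which can be iterated on and evaluated on its own. Here `complete()` is a stand-in for whatever LLM call you use, not a real API:<p><pre><code>  def complete(prompt: str) -> str:
      return ""  # placeholder for your LLM call

  def extract_decisions(transcript: str) -> str:
      return complete(f"List every decision made in this meeting:\n{transcript}")

  def draft_summary(decisions: str) -> str:
      return complete(f"Summarize these decisions in three sentences:\n{decisions}")

  def summarize_meeting(transcript: str) -> str:
      # two small prompts, each testable in isolation (3.2), instead of
      # one prompt that extracts, filters, and summarizes all at once
      return draft_summary(extract_decisions(transcript))
</code></pre>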
4. Optimize Context Tokens:<p><pre><code> 4.1. Minimize redundant or irrelevant context in prompts (sketch below).
4.2. Structure the context clearly to emphasize relationships between parts.
</code></pre>
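One way to act on 4.1 and 4.2; the tag layout and character budget are illustrative choices, not prescriptions:<p><pre><code>  def build_context(passages: list[str], max_chars: int = 8000) -> str:
      seen, kept = set(), []
      for p in passages:
          key = p.strip().lower()
          if key and key not in seen:  # 4.1: drop exact-duplicate passages
              seen.add(key)
              kept.append(p.strip())
      # 4.2: explicit structure shows the model where one passage ends
      # and the next begins
      blocks = [f"&lt;passage id={i}&gt;\n{p}\n&lt;/passage&gt;" for i, p in enumerate(kept)]
      return "\n\n".join(blocks)[:max_chars]  # 4.1: hard cap on size
</code></pre>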
5. Leverage Information Retrieval/RAG:<p><pre><code> 5.1. Use RAG to provide the LLM with knowledge to improve output.
5.2. Ensure retrieved documents are relevant, dense, and detailed.
 5.3. Utilize hybrid search methods combining keyword and embedding-based retrieval (see the sketch below).
</code></pre>
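A sketch of 5.3 using reciprocal rank fusion to merge the two result lists; `bm25_search` and `vector_search` are placeholders for your own keyword and embedding backends, not real library calls:<p><pre><code>  def bm25_search(query: str, k: int) -> list[str]:
      return []  # placeholder: ranked doc ids from your keyword index

  def vector_search(query: str, k: int) -> list[str]:
      return []  # placeholder: ranked doc ids from your vector index

  def hybrid_search(query: str, k: int = 10, c: int = 60) -> list[str]:
      scores: dict[str, float] = {}
      for ranked in (bm25_search(query, k), vector_search(query, k)):
          for rank, doc_id in enumerate(ranked):
              # reciprocal rank fusion: a document that ranks well in
              # either list floats to the top of the merged list
              scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (c + rank + 1)
      return sorted(scores, key=scores.get, reverse=True)[:k]
</code></pre>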
6. Workflow Optimization:<p><pre><code> 6.1. Decompose tasks into multi-step workflows for better accuracy.
6.2. Prioritize deterministic execution for reliability and predictability.
 6.3. Use caching to save costs and reduce latency (see the sketch below).
</code></pre>
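Point 6.3 as a sketch; the in-memory dict stands in for a real cache (Redis, disk) and `complete()` for your LLM call:<p><pre><code>  import hashlib, json

  _cache: dict[str, str] = {}  # swap for Redis or disk in production

  def complete(model: str, prompt: str) -> str:
      return ""  # placeholder for your LLM call

  def cached_complete(model: str, prompt: str) -> str:
      # identical (model, prompt) pairs hit the cache, not the API;
      # only safe when generation is deterministic (temperature=0, 6.2)
      key = hashlib.sha256(json.dumps([model, prompt]).encode()).hexdigest()
      if key not in _cache:
          _cache[key] = complete(model, prompt)
      return _cache[key]
</code></pre>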
7. Evaluation and Monitoring:<p><pre><code> 7.1. Create assertion-based unit tests using real input/output samples (see the sketch below).
7.2. Use LLM-as-Judge for pairwise comparisons to evaluate outputs.
7.3. Regularly review LLM inputs and outputs for new patterns or issues.
</code></pre>
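Sketches of 7.1 and 7.2; the sample data, `answer_question`, and the judge prompt are all illustrative:<p><pre><code>  def answer_question(question: str) -> str:
      return ""  # the system under test (placeholder)

  # 7.1: assertion-based checks over real input/output samples (pytest style)
  SAMPLES = [("What is your refund policy?", "refund")]

  def test_answers_mention_required_terms():
      for question, must_contain in SAMPLES:
          answer = answer_question(question)
          assert must_contain in answer.lower()
          assert len(answer) < 500  # no rambling

  # 7.2: pairwise comparison is easier to judge reliably than absolute scores
  JUDGE_PROMPT = (
      "Question: {q}\nAnswer A: {a}\nAnswer B: {b}\n"
      "Which answer is better? Reply with exactly 'A' or 'B'."
  )
</code></pre>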
8. Address Hallucinations and Guardrails:<p><pre><code> 8.1. Combine prompt engineering with factual inconsistency guardrails.
 8.2. Use content moderation APIs and PII detection packages to filter outputs (see the sketch below).
</code></pre>
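A crude stand-in for 8.2: a real deployment would use a moderation API plus a dedicated PII package (e.g. Presidio); these two patterns are nowhere near complete coverage:<p><pre><code>  import re

  PII_PATTERNS = [
      re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN shape
      re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
  ]

  def redact(text: str) -> str:
      # scrub anything matching a PII pattern before the output ships
      for pattern in PII_PATTERNS:
          text = pattern.sub("[REDACTED]", text)
      return text
</code></pre>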
9. Operational Practices:<p><pre><code> 9.1. Regularly check for development-prod data skew.
9.2. Ensure data logging and review input/output samples daily.
 9.3. Pin specific model versions to maintain consistency and avoid unexpected changes (see the sketch below).
</code></pre>
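Points 9.2 and 9.3 in miniature; the model ids just show the shape of a pinned snapshot and the log path is arbitrary:<p><pre><code>  import json, time

  # 9.3: one pinned model id per task, kept in config rather than
  # scattered through the code; "gpt-4o" drifts, a dated snapshot does not
  MODELS = {
      "summarize": "gpt-4o-2024-08-06",
      "classify": "gpt-4o-mini-2024-07-18",
  }

  def log_call(task: str, prompt: str, output: str) -> None:
      # 9.2: append every input/output pair so samples can be reviewed daily
      record = {"ts": time.time(), "task": task, "model": MODELS[task],
                "prompt": prompt, "output": output}
      with open("llm_calls.jsonl", "a") as f:
          f.write(json.dumps(record) + "\n")
</code></pre>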
10. Team and Roles:<p><pre><code> 10.1. Educate and empower all team members to use AI technology.
10.2. Include designers early in the process to improve user experience and reframe user needs.
10.3. Ensure the right progression of roles and hire based on the specific phase of the project.
</code></pre>
11. Risk Management:<p><pre><code> 11.1. Calibrate risk tolerance based on the use case and audience.
11.2. Focus on internal applications first to manage risk and gain confidence before expanding to customer-facing use cases.</code></pre>