Show HN: FiddleCube – Generate Q&A to test your LLM

78 points by kaushik92, 11 months ago
Convert your vector embeddings into a set of questions and their ideal responses. Use this dataset to test your LLM and catch failures caused by prompt or RAG updates.

Get started in 3 lines of code:

```
pip3 install fiddlecube
```

```
from fiddlecube import FiddleCube

fc = FiddleCube(api_key="<api-key>")
dataset = fc.generate(
    [
        "The cat did not want to be petted.",
        "The cat was not happy with the owner's behavior.",
    ],
    10,
)
dataset
```

Generate your API key: https://dashboard.fiddlecube.ai/api-key

# Ideal QnA datasets for testing, eval and training LLMs

Testing, evaluation or training LLMs requires an ideal QnA dataset aka the golden dataset.

This dataset needs to be diverse, covering a wide range of queries with accurate responses.

Creating such a dataset takes significant manual effort.

As the prompt or RAG contexts are updated, which is nearly all the time for early applications, the dataset needs to be updated to match.

# FiddleCube generates ideal QnA from vector embeddings

- The questions cover the entire RAG knowledge corpus.
- Complex reasoning, safety alignment and 5 other question types are generated.
- Filtered for correctness, context relevance and style.
- Auto-updated with prompt and RAG updates.
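One way a golden dataset of this kind might be used to catch regressions after a prompt or RAG update is sketched below. This is a minimal illustration, not FiddleCube's method: the post does not document what `fc.generate` returns, so the record shape (`question` and `answer` keys), the `answer_fn` callable, the token-overlap F1 metric and the 0.5 threshold are all assumptions.

```python
from collections import Counter
from typing import Callable, Dict, List


def token_f1(predicted: str, expected: str) -> float:
    """Token-overlap F1 between two strings (a crude similarity proxy)."""
    pred = predicted.lower().split()
    gold = expected.lower().split()
    if not pred or not gold:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def find_regressions(
    golden: List[Dict[str, str]],
    answer_fn: Callable[[str], str],
    threshold: float = 0.5,
) -> List[Dict[str, object]]:
    """Return the golden records whose current answer scores below the threshold."""
    failures = []
    for record in golden:
        score = token_f1(answer_fn(record["question"]), record["answer"])
        if score < threshold:
            failures.append({**record, "score": score})
    return failures


# Hypothetical golden records (assumed shape, not the documented FiddleCube output).
golden = [
    {"question": "Did the cat want to be petted?",
     "answer": "No, the cat did not want to be petted."},
    {"question": "Was the cat happy with the owner?",
     "answer": "No, the cat was not happy with the owner's behavior."},
]


def fake_app(question: str) -> str:
    """Stand-in for the LLM application under test."""
    if "petted" in question:
        return "No, the cat did not want to be petted."
    return "I am not sure."


failures = find_regressions(golden, fake_app)
for failure in failures:
    print(failure["question"], round(failure["score"], 2))
```

Running this after each prompt or retrieval change gives a list of questions whose answers drifted from the golden response. Token overlap is only a rough proxy; a semantic similarity measure or an LLM judge could be swapped in behind the same interface.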

7 comments

Loic, 11 months ago
For the people wondering, the GitHub repo is only hosting a couple of lines of Python to connect to their API.

If you have your own LLM, you may have sensitive/private data "in" it from your training. You may not be allowed to use this service from a legal point of view.
mistercow, 11 months ago
The bulleted list of what constitutes “ideal” is missing one of the most important types of questions: questions that *aren’t* answered by the knowledge set, but which seem like they should/might be.

This is where RAG systems consistently fall down. The end user, by definition, doesn’t know what you’ve got in your data. They won’t ask questions carefully cherry-picked from it. They’ll ask questions they need to know the answer to, and more often than you think, those answers won’t be in your data. You absolutely must know how your system behaves when they do that.
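A check for this case could be sketched as a separate test set of questions that are deliberately absent from the corpus, with the expectation that the system declines to answer. The sketch below is an illustration under stated assumptions, not part of FiddleCube or Ragas: the refusal phrases, the `answer_fn` callable and the sample questions are all hypothetical, and substring matching is a crude stand-in for actually judging whether the system declined.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical phrases that signal the system declined to answer.
REFUSAL_MARKERS = ("i don't know", "not in the provided", "cannot find")


def is_abstention(answer: str, markers: Iterable[str] = REFUSAL_MARKERS) -> bool:
    """True if the answer contains any of the refusal phrases."""
    text = answer.lower()
    return any(marker in text for marker in markers)


def unsupported_answers(
    questions: Iterable[str],
    answer_fn: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (question, answer) pairs where the system answered instead of abstaining."""
    flagged = []
    for question in questions:
        answer = answer_fn(question)
        if not is_abstention(answer):
            flagged.append((question, answer))
    return flagged


# Questions that plausibly relate to the corpus but are not answered by it.
out_of_corpus = ["What breed is the cat?", "How old is the owner?"]


def fake_app(question: str) -> str:
    """Stand-in for the RAG application under test."""
    if "breed" in question:
        return "I don't know based on the provided documents."
    return "The owner is 42 years old."  # unsupported by the corpus


flagged = unsupported_answers(out_of_corpus, fake_app)
for question, answer in flagged:
    print(f"Answered without support: {question!r} -> {answer!r}")
```

Every pair in `flagged` is a question the system answered even though the corpus does not cover it, which is the behavior the comment warns about.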
johnsutor, 11 months ago
How does this differ from Ragas? https://docs.ragas.io/en/latest/index.html
cruxcode, 11 months ago
Can it generate HTML as part of the prompt?
praveenkumarnew, 11 months ago
Can I plug this into a Ragas pipeline?
aditikothari, 11 months ago
This is super cool!
arjun9642, 11 months ago
I want to hack