I added context data to the TruthfulQA dataset

1 point by roh26it 9 months ago

1 comment

roh26it 9 months ago
As this is one of the most downloaded datasets on Huggingface, I was a little surprised by how dirty it was. It also had very limited information and some incorrect classifications.

For an internal experiment on building a "Truthful Evaluator", we picked up this dataset and tried fine-tuning a model on its 8,000-odd examples.

We realised that it needed:
1. Cleaning up
2. Some reclassification

But most importantly, it lacked context data. It only had a link pointing to the source, and even that was absent for a few rows.

We scraped the pages behind the links in the dataset, matched each one to its question, and narrowed the result down to a small context to add to the main dataset.

We are releasing it publicly so that someone else can avoid the 2-3 days of pain of wrangling this data.
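
For readers who want to try something similar, the step of matching a scraped page to its question could look roughly like the sketch below. It is only an illustration: the word-overlap scoring, the paragraph splitting, and the names `tokenize` and `select_context` are assumptions made for this example, not the method actually used for the released dataset.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase alphanumeric tokens, dropping words of two characters or fewer."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if len(w) > 2}


def select_context(question: str, page_text: str) -> str:
    """Return the paragraph of page_text that shares the most words with question.

    Hypothetical helper: a real pipeline would likely need fuzzier matching.
    Returns an empty string when the page has no text (e.g. a missing source).
    """
    q_tokens = tokenize(question)
    # Treat blank-line-separated blocks as candidate context paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", page_text) if p.strip()]
    if not paragraphs:
        return ""
    # Score each paragraph by how many question tokens it contains.
    return max(paragraphs, key=lambda p: len(q_tokens & tokenize(p)))


if __name__ == "__main__":
    page = (
        "Watermelons are grown worldwide.\n\n"
        "Swallowed watermelon seeds pass through the digestive system unharmed.\n\n"
        "They are rich in water."
    )
    print(select_context("What happens if you eat watermelon seeds?", page))
```

Rows whose source link is missing have no page text, so in this sketch they simply get an empty context and would need to be filled in by hand.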