DeepSeekMath 7B achieved 51.7% on MATH benchmark

114 points by mdp, over 1 year ago

4 comments

godelski, over 1 year ago
Does anyone know how much spoilage there is in these datasets? Common Crawl has a lot of websites in it, including Reddit and Stack*. I'm certain there are lots of questions in those datasets, and we want to differentiate recall from problem solving (the two are often confused). I have a deep distrust of large datasets like this, given that a common one with 60 authors assumed that writing LeetCode-style programs by hand would mean they wouldn't appear in the training data (GitHub), and didn't even bother to check. It's really hard to sanitize datasets of this size, and deduplication is a much harder task than many realize.

https://arxiv.org/abs/2107.03374

https://arxiv.org/abs/2303.09540
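To make the recall-versus-problem-solving concern concrete, here is a minimal sketch of one common contamination heuristic: flagging a crawled page if it shares an exact word-level n-gram with a benchmark item. The function names, the choice of n = 10, and the sample strings are all assumptions made for illustration; nothing here is taken from the linked papers or from any particular dataset pipeline.

```python
import re


def ngrams(text, n=10):
    """Return the set of word-level n-grams in lowercased, punctuation-stripped text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # If the text has fewer than n tokens, the range is empty and so is the set.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_contaminated(document, benchmark_items, n=10):
    """True if the document shares at least one n-gram with any benchmark item."""
    doc_grams = ngrams(document, n)
    return any(doc_grams & ngrams(item, n) for item in benchmark_items)


# Hypothetical benchmark question and two hypothetical crawled pages.
benchmark = [
    "What is the sum of all positive integers less than 100 that are divisible by 7?"
]
verbatim_page = (
    "Homework help: what is the sum of all positive integers less than 100 "
    "that are divisible by 7? Thanks!"
)
paraphrased_page = "Add up every positive multiple of seven below one hundred."

print(is_contaminated(verbatim_page, benchmark))     # True
print(is_contaminated(paraphrased_page, benchmark))  # False
```

The second result illustrates why deduplication is harder than it looks: an exact-match filter catches the verbatim copy but passes a paraphrase of the same problem, so a clean n-gram check does not by itself rule out leakage.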
rgbrgb, over 1 year ago
Supports commercial use!

Interesting what's unsupported:

- In any way that violates any applicable national or international law or regulation or infringes upon the lawful rights and interests of any third party;
- For military use in any way;
- For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
- To generate or disseminate verifiably false information and/or content with the purpose of harming others;
- To generate or disseminate inappropriate content subject to applicable regulatory requirements;
- To generate or disseminate personal identifiable information without due authorization or for unreasonable use;
- To defame, disparage or otherwise harass others;
- For fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;
- For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;
- To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
- For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories.
deepseekfake, over 1 year ago
I have spoken to team members, and they all say the results of this and Coder are very, very much leakage (no surprise given the result!!)
mdp, over 1 year ago
Related paper: https://arxiv.org/pdf/2402.03300.pdf