科技回声

Show HN: ArXiv-txt, LLM-friendly ArXiv papers

22 points by jerpint, 3 months ago
Just change arxiv.org to arxiv-txt.org in the URL to get the paper info in markdown.

Example:

Original URL: https://arxiv.org/abs/1706.03762

Change to: https://arxiv-txt.org/abs/1706.03762

To fetch the raw text directly, use https://arxiv-txt.org/raw/abs/1706.03762. This will be particularly useful for APIs and agents.
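For anyone scripting this, the URL rewrite described above can be sketched in a few lines of Python. This is only an illustration built from the three URLs in the post: the helper names `to_arxiv_txt` and `fetch_raw_text` are hypothetical, and the UTF-8 decoding of the response is an assumption, not documented behavior of the service.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen

ARXIV_HOST = "arxiv.org"
ARXIV_TXT_HOST = "arxiv-txt.org"


def to_arxiv_txt(url: str, raw: bool = False) -> str:
    """Rewrite an arxiv.org URL to its arxiv-txt.org counterpart.

    With raw=True the path is prefixed with /raw, matching the raw-text
    URL pattern given in the post.
    """
    parts = urlsplit(url)
    if parts.netloc not in (ARXIV_HOST, "www." + ARXIV_HOST):
        raise ValueError(f"not an arxiv.org URL: {url}")
    path = "/raw" + parts.path if raw else parts.path
    return urlunsplit(
        (parts.scheme, ARXIV_TXT_HOST, path, parts.query, parts.fragment)
    )


def fetch_raw_text(url: str, timeout: float = 10.0) -> str:
    """Fetch the raw text for an arxiv.org URL (performs a network request).

    Assumption: the response body is UTF-8 encoded text.
    """
    with urlopen(to_arxiv_txt(url, raw=True), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `to_arxiv_txt("https://arxiv.org/abs/1706.03762")` gives the markdown page URL from the example, and passing `raw=True` gives the `/raw/abs/...` form. Parsing the URL instead of doing a plain string replace keeps an "arxiv.org" that happens to appear elsewhere in the URL from being rewritten by accident.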

6 comments

westurner, 3 months ago
If you train an LLM on only formally verified code, it should not be expected to generate formally verified code.
Similarly, if you train an LLM on only published ScholarlyArticles ['s abstracts], it should not be expected to generate publishable or true text.
Traceability for Retraction would be necessary to prevent lossy feedback.
owalerys, 3 months ago
Really clean API design, I'm a fan!
lgas, 3 months ago
It just extracts the abstracts?
sbpost, 3 months ago
The example you give doesn't seem to work - the raw text does not have authors.
jmartin2683, 3 months ago
This would be awesome wrapped in an MCP server/tool call :)
cchance, 3 months ago
Was super excited that it was going to be the actual papers. Kinda cool, but just being abstracts doesn't go very far. Good luck getting the papers working, that's gonna be pretty cool once it's working, and then to feed it all into a vector DB XD