Show HN: ArXiv-txt, LLM-friendly ArXiv papers

22 points by jerpint 3 months ago
Just change arxiv.org to arxiv-txt.org in the URL to get the paper info in markdown.

Example:

Original URL: https://arxiv.org/abs/1706.03762

Change to: https://arxiv-txt.org/abs/1706.03762

To fetch the raw text directly, use https://arxiv-txt.org/raw/abs/1706.03762. This will be particularly useful for APIs and agents.
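For anyone scripting this, the URL rewrite described above can be sketched in a few lines of Python. This is only an illustration built from the three example URLs in the post: the function names `to_arxiv_txt_url` and `fetch_paper_text` are hypothetical, and the UTF-8 decoding of the raw response is an assumption rather than documented behavior of the service.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def to_arxiv_txt_url(url: str, raw: bool = False) -> str:
    """Rewrite an arxiv.org URL to its arxiv-txt.org equivalent.

    With raw=True, the path is prefixed with /raw, following the
    raw-text URL pattern shown in the post.
    """
    parts = urlsplit(url)
    if parts.netloc not in ("arxiv.org", "www.arxiv.org"):
        raise ValueError(f"not an arxiv.org URL: {url}")
    path = "/raw" + parts.path if raw else parts.path
    return urlunsplit(
        (parts.scheme, "arxiv-txt.org", path, parts.query, parts.fragment)
    )


def fetch_paper_text(url: str, timeout: float = 10.0) -> str:
    """Fetch the raw text for an arxiv.org URL via arxiv-txt.org.

    Assumes the response body is UTF-8 text (not confirmed by the post).
    """
    with urlopen(to_arxiv_txt_url(url, raw=True), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `to_arxiv_txt_url("https://arxiv.org/abs/1706.03762", raw=True)` produces the raw-text URL from the post; `fetch_paper_text` then needs network access and returns whatever the service serves at that address.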

6 comments

westurner 3 months ago
If you train an LLM on only formally verified code, it should not be expected to generate formally verified code.
Similarly, if you train an LLM on only published ScholarlyArticles ['s abstracts], it should not be expected to generate publishable or true text.
Traceability for Retraction would be necessary to prevent lossy feedback.
owalerys 3 months ago
Really clean API design, I'm a fan!
lgas 3 months ago
It just extracts the abstracts?
sbpost 3 months ago
The example you give doesn't seem to work - the raw txt does not have authors.
jmartin2683 3 months ago
This would be awesome wrapped in an MCP server/tool call :)
cchance 3 months ago
Was super excited that it was going to be the actual papers. Kinda cool, but just being abstracts doesn't go very far. Good luck getting the papers working, that's gonna be pretty cool once working, then to feed it all into a vector db XD