Nougat: Neural Optical Understanding for Academic Documents

66 points by JohnHammersley, over 1 year ago

5 comments

yoz, over 1 year ago
The paper (and examples) as HTML: https://facebookresearch.github.io/nougat/

Repo with code, including a CLI tool for converting a PDF to Mathpix Markdown: https://github.com/facebookresearch/nougat
w-m, over 1 year ago
Trying to come to this discussion, I misclicked and ended up in OP's profile, where I learned that he's a cofounder of Overleaf (hi John, big fan of the product). Overleaf turns markup language into PDFs, whereas Nougat turns PDFs into markup language. Made me wonder how many of these inverse/symmetrical/twin products there are, where one turns foos into bars and the other turns bars back into foos.
funnym0nk3y, over 1 year ago
Having this side by side with the original scan and a highlight where the text is found in the original would be an amazing product. I hated scrolling through scans of old literature in university...
dmnsl, over 1 year ago
New training data source for LLMs just dropped
29athrowaway, over 1 year ago
tl;dr: it translates images of papers into TeX documents.