Open-source tool helps you convert PDF documents, web pages, etc., into Markdown

61 points by Moon_Y 9 months ago

2 comments

h-jones 9 months ago
Anyone know how this compares to GROBID [1]? I'm looking at alternatives to GROBID as I'm not super pleased with its outputs. GROBID has a lot of great features for journal papers (reference extraction / parsing), but I'm only interested in cleanly extracting the body. Also considering nougat [2] but I haven't tried it yet.

[1] https://github.com/kermitt2/grobid

[2] https://github.com/facebookresearch/nougat
oliverkwebb 9 months ago
Nice tool, I've been using html2md [1] and such. It's written in python and in beta so it's probably not the best for processing static sites and such. But still useful.

[1]: https://github.com/suntong/html2md