Ask HN: What's the best document parsing tool/SDK that you've heard of?

1 point by voiceclonr, over 6 years ago
I am looking to parse various documents (docx, ppt, pdf, pst, etc.) and extract metadata, text, etc. for search. I'm looking into Apache Tika, but my gut tells me a native Windows tool may be better long term. Can anyone recommend tools/SDKs they've used or heard of being successful?

1 comment

mindcrime, over 6 years ago
Tika is what we use. It's not perfect, but it works pretty well for our purposes.
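For a rough idea of the per-format work a toolkit like Tika wraps, here is a sketch for one of the formats in the question. A .docx file is an Office Open XML package, i.e. a ZIP archive. Core metadata (title, author, dates) lives in `docProps/core.xml` and the body text lives in `word/document.xml`. The helper name `extract_docx` is hypothetical, and the sketch uses only the Python standard library. It ignores headers, footers, footnotes, comments and embedded objects, which sit in other parts of the package. It also does not apply to legacy .ppt (an OLE2 compound file) or .pst (Outlook's own binary format), which need entirely different parsers.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by OOXML core properties and WordprocessingML.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
}

# Core-property elements we copy into the metadata dict (an illustrative subset).
CORE_FIELDS = ("dc:title", "dc:creator", "dc:subject", "dcterms:created", "dcterms:modified")


def extract_docx(source):
    """Hypothetical helper: return (metadata dict, plain text) for a .docx.

    `source` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())

        # Metadata: direct children of <cp:coreProperties>, if the part exists.
        metadata = {}
        if "docProps/core.xml" in names:
            core = ET.fromstring(zf.read("docProps/core.xml"))
            for tag in CORE_FIELDS:
                el = core.find(tag, NS)
                if el is not None and el.text:
                    metadata[tag.split(":", 1)[1]] = el.text

        # Text: concatenate the <w:t> runs of each <w:p> paragraph.
        body = ET.fromstring(zf.read("word/document.xml"))
        w = "{" + NS["w"] + "}"
        paragraphs = []
        for p in body.iter(w + "p"):
            paragraphs.append("".join(t.text or "" for t in p.iter(w + "t")))

    return metadata, "\n".join(paragraphs)
```

Usage would be along the lines of `meta, text = extract_docx("report.docx")`, after which `text` can be fed to a search indexer and `meta` stored alongside it. Hand-rolling this for every format (and every format's edge cases) is the work a library such as Tika does for you.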