TechEcho

LLM powered PDF to Word conversion with layout preservation

1 point by aymaneSennoussi, 2 months ago

2 comments

aymaneSennoussi, 2 months ago

Hi HN,

I wanted to share something I’ve been working on over the past month and would love to hear your thoughts.

It’s a simple tool that converts PDFs to text while preserving layout elements like tables and headings, powered by LLMs.

You can try it with 500 free credits (no signup required) and get 1000 more if you sign up.

Would really appreciate any feedback, especially from those who’ve struggled with messy PDF extractions before!

Thanks for checking it out.

aymaneSennoussi, 2 months ago

More context on how I did it: I used Tesseract OCR for the text extraction and layout analysis, then passed the results as text to the GPT-4 API so it could fix likely misreadings based on context. For example, if the OCR output says "the couples were walking across the shiver", the LLM corrects it to "across the river".
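
For anyone curious what that two-step flow might look like, here is a minimal sketch, not the tool's actual code. It assumes the OCR step yields word-level rows in the shape `pytesseract.image_to_data(..., output_type=Output.DICT)` returns (`block_num`, `par_num`, `line_num`, `text`), and that the LLM is reachable through some `complete(prompt) -> str` callable. The names `group_lines`, `correct_lines`, `PROMPT`, and `fake_complete` are made up for illustration, and the stand-in below only mimics a model so the example runs offline.

```python
from typing import Callable, Dict, List, Tuple

# Assumed input shape: the dict that pytesseract.image_to_data(img,
# output_type=Output.DICT) returns, with one list entry per detected element.
OcrData = Dict[str, list]

# Hypothetical prompt; the original post does not say what prompt was used.
PROMPT = (
    "Fix obvious OCR misreadings in the line below. "
    "Keep the wording otherwise unchanged and return only the corrected line.\n"
)


def group_lines(data: OcrData) -> List[str]:
    """Rebuild text lines from word-level OCR rows, in layout order."""
    lines: Dict[Tuple[int, int, int], List[str]] = {}
    for block, par, line, text in zip(
        data["block_num"], data["par_num"], data["line_num"], data["text"]
    ):
        if text.strip():  # structural rows (block/paragraph/line) carry empty text
            lines.setdefault((block, par, line), []).append(text)
    return [" ".join(words) for _, words in sorted(lines.items())]


def correct_lines(lines: List[str], complete: Callable[[str], str]) -> List[str]:
    """Send each OCR line to the LLM callable and collect the corrected lines."""
    return [complete(PROMPT + line).strip() for line in lines]


# Sample OCR rows, hand-written for this sketch (not real Tesseract output).
sample: OcrData = {
    "block_num": [1, 1, 1, 1, 1, 1, 1, 1],
    "par_num": [1, 1, 1, 1, 1, 1, 1, 1],
    "line_num": [0, 1, 1, 1, 1, 2, 2, 2],
    "text": ["", "the", "couples", "were", "walking", "across", "the", "shiver"],
}


def fake_complete(prompt: str) -> str:
    """Stand-in for a real LLM call: 'fixes' one known misreading."""
    return prompt.rsplit("\n", 1)[-1].replace("shiver", "river")


corrected = correct_lines(group_lines(sample), fake_complete)
```

In a real setup, `complete` would wrap a call to a chat completion API, and correcting line by line (rather than the whole page at once) is just one way to keep the model from rearranging the layout that the OCR step recovered.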