Hi HN,<p>I wanted to share something I’ve been working on over the past month and would love to hear your thoughts.<p>It’s a simple tool that converts PDFs to text while preserving layout elements like tables and headings, powered by LLMs.<p>You can try it with 500 free credits (no signup required) and get 1000 more if you sign up.<p>Would really appreciate any feedback especially from those who’ve struggled with messy PDF extractions before!<p>Thanks for checking it out.
More context on how I did it:
I used tesseractOCR to get the extraction and the layout analysis, then I passed the results as text to GPT4 api to possibly fix any misreadings based on completion.
For example, if the text from OCR says "the couples were walking across the shiver", the LLM fixes it and makes it "across the river".