Hi guys,<p>I need a tool that converts PDF files to HTML. Since I work with public documents, the PDF layout can sometimes be pretty nasty (I've attached an example at the end of this post).<p>We have an in-house solution forked several years ago from Apache PDFBox. After a while we realized that forking an open source project isn't the best approach, but we kept going because it worked.<p>Does anyone have suggestions? We are willing to contribute to the open source project we choose :)<p>Many thanks!<p>https://www.evernote.com/shard/s226/sh/17b87c1f-8f18-4b23-96ac-a9fbc2ac8502/ea5618043f3a9c818071bd93df9f74c3
I've had good luck with the tools that come with xpdf:<p><a href="http://www.foolabs.com/xpdf/about.html" rel="nofollow">http://www.foolabs.com/xpdf/about.html</a><p>Part of that, though, is because the source I was pulling text from didn't change its document format much from month to month.<p>I think xpdf is also the library underneath the tool in jeffmould's link.
I have used the following with some success:<p><a href="http://pdftohtml.sourceforge.net/" rel="nofollow">http://pdftohtml.sourceforge.net/</a><p>I'm not sure how well maintained it still is, but it did a good job of converting basic PDF files to HTML.<p>There is also a Google Code project for going the other way, from HTML to PDF, which works pretty well.