Pdf2htmlEX – Convert PDF to HTML without losing text or format

161 points by coolwanglu about 12 years ago

11 comments

chill1 about 12 years ago
I've been using this to convert large PDF files to HTML for display in the browser. It's for work, so I don't feel comfortable posting a link to the demo instance here.

It is definitely the best solution I've found so far. The output HTML/CSS/images look almost identical to the source PDF. That said, a few issues remain:

* One gigantic (600 KB) CSS file from a single PDF
* Hundreds of individual fonts
* HTML semantics are non-existent

I believe these are all relatively easy to fix; I've found my own post-processing solutions for most of them.

Kudos to you, coolwanglu. I'd also like to get in touch with you about lending a hand with some of the issues I've encountered.

Thanks for a cool piece of software!
ComputerGuru about 12 years ago
Can anyone recommend an equally good tool for the opposite direction (HTML to PDF)?

wkhtmltopdf [0] is probably the most popular, but it's also ridiculously buggy.

0: https://code.google.com/p/wkhtmltopdf/
AndreasFrom about 12 years ago
This works and displays correctly, but it is unbearably slow on an iPad 2, whereas the PDF loads instantly. What is the point, then? Or does it work a lot better in desktop browsers?
crazygringo about 12 years ago
Interesting. So it converts all vector graphics to a background image per page, but keeps all text as browser-rendered text on top of it.

I guess I don't really see much practical purpose for it; most browsers these days seem perfectly fine opening PDF files natively, after all. But it's a very cool technological demonstration.

Maybe this could be some kind of bridge tool for generating sites with fancy typographical layout? You could use Adobe Illustrator etc. to do fancy column work, drop caps, hyphenation, all that jazz, and then "render" it into HTML. It would be about as anti-"responsive" as you can get, but it could certainly produce advanced typography much faster than you could by hand with HTML/CSS.
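The layering described above (one background image per page, with real text positioned on top) can be sketched in a few lines. This is a minimal illustration, assuming a page has already been split into a rasterized background image and a list of positioned text runs; `TextRun`, `render_page`, and `page1.png` are hypothetical names, and this is not pdf2htmlEX's actual output format.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class TextRun:
    """Hypothetical: a run of text with its position (CSS px) on the page."""
    text: str
    x: float
    y: float
    font_size: float


def render_page(width, height, background_src, runs):
    """Build one page: an image holding the vector graphics as pixels,
    with browser-rendered text spans absolutely positioned on top."""
    parts = [
        f'<div class="page" style="position:relative;width:{width}px;height:{height}px;">',
        # The background image fills the whole page box.
        f'  <img src="{escape(background_src)}" alt="" '
        'style="position:absolute;left:0;top:0;width:100%;height:100%;">',
    ]
    for run in runs:
        # Each text run is real DOM text, placed at its original coordinates.
        parts.append(
            f'  <span style="position:absolute;left:{run.x}px;top:{run.y}px;'
            f'font-size:{run.font_size}px;">{escape(run.text)}</span>'
        )
    parts.append("</div>")
    return "\n".join(parts)


if __name__ == "__main__":
    page = render_page(612, 792, "page1.png", [TextRun("Hello, PDF", 72, 100, 12)])
    print(page)
```

Because the spans are actual text rather than pixels, they stay selectable and searchable, while the image carries whatever the browser cannot draw natively.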
dannyrough about 12 years ago
I do this almost daily. I use a PDF converter driver I found on the internet. Once installed, it becomes a selectable converter option, so you can convert PDFs to many formats from any program at all, including Adobe Acrobat. Just open a PDF, select Convert, and choose the format you want; the task finishes in a few seconds. If you haven't found a good option yet, give it a try. Best wishes. http://www.rasteredge.com/how-to/csharp-imaging/pdf-convert-html/
_DiskError about 12 years ago
Question: does your public folder periodically delete files? I accidentally uploaded something confidential and it seems to be gone. I was wondering whether it was deleted manually or simply expired, since files uploaded around the same time are still there.
alcuadrado about 12 years ago
Can't Mozilla's pdf.js be used to get the same result? Great results anyway!
chucknelson about 12 years ago
Promising start. Hopefully performance improves with each release.
Dnguyen about 12 years ago
I didn't see any mention of tables in the docs. Does that mean they're outside the "good enough" range? Table extraction would be a great feature.
rcfox about 12 years ago
How did you manage to get Mediafire to host your demo?
est about 12 years ago
Just passing by to pay my respects to the master.