Hi HN,<p>I've been racking my brain over this and haven't found a reliable way to programmatically convert documents (doc, docx, pdf, etc.) to HTML. The only option seems to be running OpenOffice as a server, but it keeps crashing (at least once a day). I would like something that can process thousands of docs per day without crashing. Has anyone here faced this problem or found a solution?<p>[ PS: In case you're wondering why, we run a web app for recruiting ( recruiterbox.com ) which requires converting resumes to HTML ]
I've never used it, but the Google Docs API might fit your requirements:<p><a href="http://code.google.com/apis/documents/" rel="nofollow">http://code.google.com/apis/documents/</a><p>It accepts doc, docx, and pdf and can export to HTML. I'm not sure what the API rate limit is, though. The FAQs suggest that it can be raised by using a Premier account.
Document conversion is a tricky space for a startup. All the rules are defined by companies who would very much like to see you fail, and code-wise it's almost the most boring task I can think of.
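If the long-running OpenOffice server from the original post is what keeps crashing, one workaround worth trying is to run each conversion as its own short-lived process with a timeout and a retry, so a single crash or hang only costs one document. This is a rough sketch, not something tested at thousands of docs per day. The helper names (convert_with_retry, soffice_cmd) are made up for illustration, and it assumes a LibreOffice-style soffice binary on the PATH that accepts --headless, --convert-to and --outdir (older OpenOffice builds may differ). It also won't help much with PDF input.

```python
import subprocess
import time


def convert_with_retry(cmd, timeout_s=60, retries=2, backoff_s=1.0):
    """Run one conversion command in its own process.

    Retries when the process exits non-zero or exceeds the timeout.
    Returns True on success, False if every attempt failed.
    """
    for attempt in range(retries + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout_s, capture_output=True)
            if result.returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising this.
            pass
        except OSError:
            # Binary missing or not executable: retrying will not help.
            return False
        if attempt < retries:
            time.sleep(backoff_s * (attempt + 1))
    return False


def soffice_cmd(src, outdir):
    """Build the converter command line (hypothetical helper).

    Assumption: a LibreOffice-style `soffice` is installed and on the PATH.
    """
    return ["soffice", "--headless", "--convert-to", "html",
            "--outdir", str(outdir), str(src)]
```

Usage would be something like convert_with_retry(soffice_cmd("resume.docx", "out")), called from a worker that pulls files off a queue. Anything that still fails after the retries can be set aside for manual review instead of blocking the rest of the batch.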