TechEcho

9 comments

afiler, about 14 years ago

Prompted by downloading a .doc file from Qwest only to find that it contained a monospaced text file, I set up a small, nearly UI-free site for doing document conversions. http://doc.mar.cx/<url> gives an HTML or other sensible rendering of a URL (e.g. http://doc.mar.cx/http://www.itu.int/dms_pub/itu-t/oth/02/02/T02020000010001MSWE.doc), and http://doc.mar.cx/<extension>/<url> attempts to convert the URL into the format with the given extension (e.g. http://doc.mar.cx/txt/http://www.itu.int/dms_pub/itu-t/oth/02/02/T02020000010001MSWE.doc).

I use wvHtml for doc->html and wvPDF for doc->pdf, but antiword for doc->txt. To convert .docx, .xls, .xlsx, and WordPerfect files to HTML, I use OpenOffice, by way of jodconverter. For ODF files, I use OdfConverter. Conversion of Excel files to .csv files uses xls2csv. For PowerPoint files, I use ppthtml to convert to HTML and catppt to convert to text. For Lotus 1-2-3 files (I added this after downloading some historical telecom data from the FCC!), I use ssconvert.

For any conversion that results in an HTML file (e.g. doc or pdf to html), I bundle all the images into a single file using the data: URL scheme. To do this, I wrote a utility called pagecan: http://afiler.com/pagecan/
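For readers unfamiliar with the data: URL trick mentioned above, here is a minimal Python sketch of the general idea. It is not pagecan's code, whose internals are not described in this thread. The function names `to_data_url` and `inline_images` are hypothetical, and the regex-based matching of `<img src=...>` is a simplifying assumption (a real tool would more likely use an HTML parser).

```python
import base64
import mimetypes
import re
from pathlib import Path

# Matches the src attribute of an <img> tag (simplified; assumes quoted values).
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=)(["\'])(.*?)\2', re.IGNORECASE)


def to_data_url(path: Path) -> str:
    """Encode a local file as a base64 data: URL."""
    mime, _ = mimetypes.guess_type(path.name)
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{payload}"


def inline_images(html: str, base_dir: Path) -> str:
    """Replace local image references with data: URLs.

    Remote URLs, existing data: URLs, and missing files are left untouched.
    """
    def replace(match: re.Match) -> str:
        prefix, quote, src = match.groups()
        if src.startswith(("data:", "http:", "https:")):
            return match.group(0)
        target = base_dir / src
        if not target.is_file():
            return match.group(0)
        return f"{prefix}{quote}{to_data_url(target)}{quote}"

    return IMG_SRC.sub(replace, html)
```

Calling `inline_images(html_text, Path("converted_output"))` on a converter's HTML output (the directory name here is only an example) would return a single self-contained document. The trade-off is size: base64 inflates each image by roughly a third.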

sushi, about 14 years ago

UX suggestion: Please hyperlink the "Blog" text beside the Recruiterbox logo. It's underlined, so users expect it to be a link.

bravura, about 14 years ago

You should also consider 'pandoc', written in Haskell, for converting between markup formats: http://johnmacfarlane.net/pandoc/

I am curious for more details about why Tika wasn't good enough. Please explain.

kalmi10, about 14 years ago

Based on the title, I expected some HTML5 magic for converting binary files into HTML in the browser.

tucosan, about 14 years ago

How about trying out calibre (http://calibre-ebook.com)? It can convert between a number of formats, it is quite reliable, and it can be run headless.

dpapathanasiou, about 14 years ago

How would you compare abiword for doc/docx conversion versus antiword (http://www.winfield.demon.nl/)?

Also, what are the limitations of abiword for doc/docx files?

jamesshamenski, about 14 years ago

Million-dollar question: How could you additionally parse the information to extract structured data? For example: names of candidates, addresses, previous employers, job titles held.

Jakob, about 14 years ago

Please add a candidate delete function. I sent an email for one candidate with multiple attachments, and Recruiterbox created multiple candidates by mistake.

nopal, about 14 years ago

There's really not much here. Could we see some code or a demo?

HTML preview for doc, docx, pdf & rtf