I've tried many different methods, including Calibre, k2pdfopt, and even OCR'ing the pages, but none of them is really straightforward. They all take considerable time, and line breaks and paragraphs are my personal hell.<p>What's the easiest way to go about doing this? Even converting it to a simple text file with proper line breaks and paragraphs would go a long way toward solving the problem.<p>Assume you have a PDF file with margins, a common font, the name of the book/chapter in the top/bottom margins, and the page number. Parsing the table of contents would be great, but it is not necessary.