Saving a webpage to a PDF is literally one command line away:<p>chromium --headless --disable-gpu --print-to-pdf=google.pdf <a href="http://google.com/" rel="nofollow">http://google.com/</a><p>What does Apify add in this case?
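For anyone who wants to script that one-liner, here is a minimal Python sketch. It only wraps the flags quoted in the comment above. The function names `build_print_command` and `save_pdf` are made up for illustration, and the binary name `chromium` is an assumption (it may be `chromium-browser` or `google-chrome` on your system).

```python
import shutil
import subprocess
from typing import List


def build_print_command(url: str, output_pdf: str, binary: str = "chromium") -> List[str]:
    """Return the argv list for a headless print-to-PDF run (hypothetical helper)."""
    return [
        binary,                          # assumed binary name; adjust for your install
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={output_pdf}",
        url,
    ]


def save_pdf(url: str, output_pdf: str, binary: str = "chromium") -> None:
    """Run the command, failing early if the browser binary is not on PATH."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(f"{binary!r} not found on PATH")
    subprocess.run(build_print_command(url, output_pdf, binary), check=True)
```

Calling `save_pdf("http://google.com/", "google.pdf")` should be equivalent to the command in the comment, assuming the binary is installed.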
I've been working on something similar: <a href="https://www.prerender.cloud/docs/api" rel="nofollow">https://www.prerender.cloud/docs/api</a><p><pre><code> // URL to screenshot
service.prerender.cloud/screenshot/https://www.google.com/
// URL to pdf
service.prerender.cloud/pdf/https://www.google.com/
// URL to html (prerender)
service.prerender.cloud/https://www.google.com/</code></pre>
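A small sketch of how those three URL patterns could be built in code. It is based only on the patterns listed above: the `https` scheme, the helper name `prerender_url`, and passing the target URL through unencoded are all assumptions, and it does not cover authentication or any other part of the API.

```python
SERVICE_HOST = "service.prerender.cloud"

# Path prefix per output type, taken from the three patterns in the comment above.
_PREFIXES = {
    "html": "",
    "screenshot": "screenshot/",
    "pdf": "pdf/",
}


def prerender_url(target: str, mode: str = "html", scheme: str = "https") -> str:
    """Build a service URL for the given target page (hypothetical helper).

    The target is appended as-is, mirroring the examples; the scheme is an assumption.
    """
    if mode not in _PREFIXES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(_PREFIXES)}")
    return f"{scheme}://{SERVICE_HOST}/{_PREFIXES[mode]}{target}"
```

For example, `prerender_url("https://www.google.com/", "pdf")` yields the PDF pattern shown above with an `https://` prefix.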
By the way, is there an opposite service that converts PDFs into plain HTML for reading? I know about <a href="https://www.arxiv-vanity.com/papers/" rel="nofollow">https://www.arxiv-vanity.com/papers/</a>, but it only works on arXiv PDFs.
Off-topic, but Apify as a service looks really good. I was spinning up a dedicated VM on AWS with Docker installed just to get a simple web scraper running. Apify solves this elegantly and removes a significant pain in my workflow.
I built Screen.rip, which also supports PDF generation. <a href="https://screen.rip/#pdf" rel="nofollow">https://screen.rip/#pdf</a><p>Screen.rip gives you more control over the generated PDF beyond Puppeteer's options (for example, it can wait for certain elements to appear, inject CSS, or switch to the screen stylesheet instead of the print stylesheet).
I love this service! I think that, for ease of adoption, you could allow pre-made scripts to be shared so that non-technical users can easily set up workflows that go right into their email. For the technical folks, I think it would be great to have examples of things you can do with Apify that are a hassle to do with local headless Chrome.<p>Great job!
If you're interested in running your own personal Wayback Machine that uses headless Chrome for archiving (among other methods), check out Bookmark Archiver.<p><a href="https://github.com/pirate/bookmark-archiver" rel="nofollow">https://github.com/pirate/bookmark-archiver</a>
We are not too happy with our EvoPDF license, so in principle this is a good option. However, I do not think this allows adding headers, footers, page numbers, etc.
Is there a similar API around that accepts HTML instead of a URL? I’ve built one for my project, but I would prefer to delegate this to an external service.
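Not an external service, but one local workaround is to write the HTML to a temporary file and point headless Chrome at its `file://` URL. A rough sketch, assuming the same command-line flags quoted at the top of the thread. The helper names and the `chromium` binary name are assumptions, and this is not how any of the services mentioned here work internally.

```python
import tempfile
from pathlib import Path
from typing import List


def html_to_file_url(html: str) -> str:
    """Write the HTML to a temp file and return its file:// URL (hypothetical helper).

    The file is not deleted automatically; the caller should clean it up.
    """
    tmp = tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    )
    with tmp:
        tmp.write(html)
    return Path(tmp.name).resolve().as_uri()


def html_print_command(html: str, output_pdf: str, binary: str = "chromium") -> List[str]:
    """Return an argv list that prints the given HTML string to a PDF."""
    return [
        binary,                          # assumed binary name
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={output_pdf}",
        html_to_file_url(html),
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)`. Relative links and assets in the HTML will resolve against the temp directory, so inline or absolute URLs are the safer choice.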
As long as GPU support is not functional in headless mode, "any web page" is a misnomer. Enough sites rely on GPU acceleration that headless mode is useless for them. This needs to be addressed by the Chrome team.
Does this work if the page is behind a password/SSO wall?<p>And is it possible to print multiple Chrome tabs?<p>Printing pages to PDF is pretty straightforward. It's the above two issues where I've run into problems. Anyone know of a good solution to the second one?