Show HN: CLI for generating PDFs for offline reading

161 points by dvcoolarun, over 1 year ago
I've always thought that extensive reading was best suited for the realm of paper. As a result, I've created a command-line interface (CLI) tailored for my own use and decided to make it open source. I welcome any feedback you may have.

[Edit] Sample PDF: https://drive.google.com/file/d/1n7M1TKOptSsYiibrbvV_Yojx53TK3k5E/view

20 comments

ComputerGuru, over 1 year ago
I feel like if you are claiming "beautiful" output then it's obligatory to have, at the very least, screenshots of said output PDFs (or better yet, a sample for the same link shown in the CLI screenshot, so people can see how the text flows, what quality images are captured at, how text can be selected, etc.).
jackconsidine, over 1 year ago
This is cool! I have a HN pipeline where I upvote things that I want to drill into, and a script I wrote generates PDFs and sends them to my Kindle for offline reading (great for my pipeline). That uses Playwright's "to PDF" method, which goes through the browser and is slow. I might look into replacing it with this.

If there's any interest I might OSS the pipeline.
nacho2sweet, over 1 year ago
At my work we just use headless Chrome with a sort of wrapper script to do this, with a bunch of settings close to the actual size of paper. It allows me to test all of our reports with media->print in dev tools, then print->pdf with Chrome, and only have to design to that spec. Then in our reports we provide a "save as pdf" button instead of encouraging printing from all the other possible browsers, which would make the task insane and cause me to possibly quit.
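
A wrapper of the kind described above could look roughly like the following. This is a minimal sketch, not the commenter's actual script: the binary names in `CHROME_CANDIDATES` are assumptions that vary by platform, and `--no-pdf-header-footer` is the flag name used by recent Chrome releases (older builds used a differently named flag). Paper-size settings are not covered here.

```python
import shutil
import subprocess

# Candidate binary names are assumptions; they differ by OS and install method.
CHROME_CANDIDATES = ("google-chrome", "chromium", "chromium-browser", "chrome")


def build_print_command(url, output_pdf, chrome="google-chrome"):
    """Return the argv list that prints `url` to `output_pdf` with headless Chrome."""
    return [
        chrome,
        "--headless",
        "--disable-gpu",
        "--no-pdf-header-footer",        # omit the date/URL header and footer
        f"--print-to-pdf={output_pdf}",
        url,
    ]


def find_chrome():
    """Return the path of the first Chrome-like binary on PATH, or None."""
    for name in CHROME_CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


def print_to_pdf(url, output_pdf):
    """Run headless Chrome once; raises if no binary is found or Chrome fails."""
    chrome = find_chrome()
    if chrome is None:
        raise FileNotFoundError("no Chrome/Chromium binary found on PATH")
    subprocess.run(build_print_command(url, output_pdf, chrome), check=True)
```

Building the argument list separately from running it keeps the command easy to inspect and test without launching a browser.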
dvcoolarun, over 1 year ago
Apologies for the oversight; I forgot to include the screenshot of the sample PDF. Here it is for your reference: https://drive.google.com/file/d/1n7M1TKOptSsYiibrbvV_Yojx53TK3k5E/view
dvcoolarun, over 1 year ago
Arr, this blew up! I think people are, to some extent, missing the context of the script. It's a plug-and-play script where you can change the PDF quality using CSS/Python. Even fonts are loaded from Google in Python. "Beautiful" is contextual. You can create your own version and share it with the community.

I'm on mobile, so I can't add a Google Drive file screenshot to the readme, and iframes are not supported.
pavs, over 1 year ago
Like this:

    sudo apt install pandoc wkhtmltopdf
    npm install -g readability-cli
    pandoc -s https://www.paulgraham.com/avg.html -o output.html && readable output.html -o readable.html && wkhtmltopdf readable.html output.pdf && open output.pdf

Going even further, using a bash script to prompt for the URL:

    #!/bin/bash
    # Prompt the user for a URL
    read -r -p "Enter the URL: " URL
    # Use the URL in the pandoc command (quoted; the name must match the variable read above)
    pandoc -s "$URL" -o output.html && readable output.html -o readable.html && wkhtmltopdf readable.html output.pdf && open output.pdf

Then, from the shell:

    chmod +x web2pdf.sh
    # add an alias to bashrc
    alias web2pdf='/path/to/your/web2pdf.sh'
    source ~/.bashrc
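
For comparison, the same three-step chain can be driven from Python. This is only a sketch of the pipeline in this comment, assuming `pandoc`, `readable` (from readability-cli), and `wkhtmltopdf` are on the PATH; the function names are made up for illustration.

```python
import subprocess


def build_pipeline(url, pdf_path="output.pdf"):
    """Return the three commands of the one-liner as argv lists."""
    return [
        ["pandoc", "-s", url, "-o", "output.html"],         # fetch and normalise the page
        ["readable", "output.html", "-o", "readable.html"],  # extract the article body
        ["wkhtmltopdf", "readable.html", pdf_path],          # render the result to PDF
    ]


def run_pipeline(url, pdf_path="output.pdf"):
    """Run each step in order, stopping at the first failure."""
    for argv in build_pipeline(url, pdf_path):
        # check=True raises on a non-zero exit code, mirroring `&&` in the shell version
        subprocess.run(argv, check=True)
```

Passing each command as a list means the URL is never re-parsed by a shell, so characters such as `&` or `?` in a URL need no quoting.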
seabass-labrax, over 1 year ago
Very interesting! One piece of feedback: it would probably be more useful to have a screenshot of the PDF in your README rather than one of the CLI. Also, do you intend to release this as FOSS?
adrian_b, over 1 year ago
Both Chrome and Firefox have absolutely horrible "Print" (to PDF) commands, which render Web pages differently from what they show on the screen. The result is that large parts of the page are obscured by ads, menus, headers, etc., or parts of the Web page fall outside the rendered area and go missing, or the content is compressed into a small part of the output pages.

It would be really nice if there existed a utility able to produce a PDF file where the Web pages are rendered as well as the browsers render them on the screen, without becoming confused even by complex scripts loaded by the page.

The alternatives to "Print" (producing a PDF) are even worse. A screenshot has limited resolution and it loses the text. In the past "Save as ..." was the normal solution, but now even if you save a "complete" page, it will still frequently include scripts that will no longer work offline. What I want to save are the pages perfectly rendered as they were at that instant, without any scripts that could make them appear differently in the future.
Someone, over 1 year ago
FTA: “Then you can use the tool as follows

    pipenv shell
    pipenv install
    python main.py https://www.paulgraham.com/avg.html, https://www.paulgraham.com/determination.html

Just add the webpage URLs separated by commas”

What’s the rationale for “separated by commas”? The convention for CLI arguments is to use one argument per input file.
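
The one-argument-per-input convention mentioned here can be sketched with `argparse`. This is an illustration only, not the project's actual `main.py`; `parse_urls` is a hypothetical helper, and tolerating stray commas is an assumption made so the README's older invocation keeps working.

```python
import argparse


def parse_urls(argv):
    """Parse one URL per argument, tolerating the comma-separated style as well."""
    parser = argparse.ArgumentParser(description="Convert web pages to PDF")
    parser.add_argument("urls", nargs="+", metavar="URL",
                        help="one or more page URLs, one per argument")
    args = parser.parse_args(argv)

    urls = []
    for arg in args.urls:
        # Caveat: a URL may legitimately contain a comma; drop this split
        # if backward compatibility with the comma style is not needed.
        urls.extend(part.strip() for part in arg.split(",") if part.strip())
    return urls
```

With `nargs="+"` the shell does the splitting, so `python main.py URL1 URL2` works with tab completion, globbing, and `xargs` in the usual way.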
jll29, over 1 year ago

    % python main.py https://www.paulgraham.com/avg.html
    Traceback (most recent call last):
      File "/Users/bill/web2pdf/main.py", line 7, in <module>
        from readability import Document
    ImportError: cannot import name 'Document' from 'readability' (/Users/bill/.local/share/virtualenvs/web2pdf-gXeVRXKg/lib/python3.9/site-packages/readability/__init__.py)

But according to your Pipfile.lock, the readability module needed is 0.3.1:

    "readability": {
        "hashes": [
            "sha256:f9030df8bc31aad45baffa9a2d9ce1fdd8051833e5b5bda3027df32fdec00fad"
        ],
        "index": "pypi",
        "version": "==0.3.1"
    },

Version 0.3.1 of the module "readability" exists, but does not appear to have a class "Document".
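
A mismatch like this can be checked before the import fails. The sketch below is a generic diagnostic, with `module_provides` as a hypothetical helper name. One possible cause, not confirmed anywhere in this thread, is that the `Document` class comes from the separate `readability-lxml` distribution, which installs under the same import name `readability`.

```python
import importlib


def module_provides(module_name, attr):
    """Return True if `module_name` can be imported and exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)
```

For example, `module_provides("readability", "Document")` returning `False` inside the project's virtualenv would indicate that the installed distribution is not the one `main.py` expects.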
OhMeadhbh, over 1 year ago
Apropos of nothing, I added this function so I don't have to leave the command line to see the PDF.

    # Usage: pdfpage FILE.pdf PAGE_INDEX
    pdfpage() {
      convert -resize 0x1000^ "${1}[${2}]" -background white -flatten sixel:-
    }

You can probably deduce that it assumes you have ImageMagick installed and you're in a terminal with sixel support.
fishywang, over 1 year ago
Somewhat similarly, I wrote a web app to generate epub (instead of pdf) out of URLs and send them to e-ink reader(s) directly (via a Telegram bot) so I can read them. Currently it supports sending the epub by email (for Kindle) or uploading it to Dropbox (for Kobo, etc.). It originally also supported reMarkable cloud, but we can no longer make reMarkable cloud actually work. There's also a REST API to generate an epub to be downloaded directly: https://github.com/fishy/url2epub/blob/main/REST.md

For e-ink readers, epubs are generally better than PDFs for URLs anyway, as epubs are basically packaged HTML, and the flowing text works better on smaller screens.
Throw73747, over 1 year ago
Perhaps add support for uBlock filters? I use them to strip any unwanted elements from the page before printing. On Hacker News discussions they remove forms, reply links, headers and footers...
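
The idea of stripping elements before printing can be illustrated with the standard library alone. This is a simplified sketch that removes whole elements by tag name; it does not parse real uBlock filter syntax, and the `BLOCKED_TAGS` list is a hypothetical default rather than anything the tool ships.

```python
from html.parser import HTMLParser

# Hypothetical block list, loosely matching the elements named in the comment.
BLOCKED_TAGS = frozenset({"form", "header", "footer", "nav", "script"})


class ElementStripper(HTMLParser):
    """Re-emit HTML while dropping blocked elements and everything inside them."""

    def __init__(self, blocked=BLOCKED_TAGS):
        super().__init__(convert_charrefs=False)  # keep entities exactly as written
        self.blocked = set(blocked)
        self.depth = 0   # > 0 while inside a blocked element
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.blocked:
            self.depth += 1
        elif self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth.
        if tag not in self.blocked and self.depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.blocked:
            self.depth = max(0, self.depth - 1)
        elif self.depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.depth == 0:
            self.out.append(f"&#{name};")


def strip_elements(html, blocked=BLOCKED_TAGS):
    """Return `html` with every blocked element removed."""
    parser = ElementStripper(blocked)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Comments and doctype declarations are dropped by this sketch, and selector-based rules (classes, ids, attributes) would need a real HTML library or an actual filter-list parser.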
rahimnathwani, over 1 year ago
For print or PDF, I like multi-column newspaper style, as created by this extension: https://chromewebstore.google.com/detail/simple-print/nalmbmopkipfhijmcncelapgbkgoligf

One benefit of using a Chrome extension (vs. CLI) is that it's easy to 'print' things that require authentication.
jll29, over 1 year ago
Have you compared it with a conversion by pandoc (https://pandoc.org/)?
sn0n, over 1 year ago
Does it run headless Chrome for pixel-perfect formatting, laid out as a webpage and written to PDF in that form, ignoring the page's print CSS rules? Cuz that would be a nice start. And an option for the size to be based on pixel width for ideal layout... Because I won't be printing, I will be viewing on my phone, so one overly large page would be perfect.
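
The pixel-based sizing asked for here comes down to a unit conversion: CSS defines 96 px per inch and PDF page sizes are expressed in points at 72 per inch. The sketch below shows only that arithmetic; the function names are hypothetical, and it is not a description of what the tool does.

```python
CSS_PX_PER_INCH = 96   # CSS reference pixel
PDF_PT_PER_INCH = 72   # PDF point


def px_to_pt(px):
    """Convert a length in CSS pixels to PDF points."""
    return px * PDF_PT_PER_INCH / CSS_PX_PER_INCH


def single_page_size(width_px, content_height_px):
    """Return (width_pt, height_pt) for one tall page that holds the whole layout."""
    return px_to_pt(width_px), px_to_pt(content_height_px)
```

A 390 px wide phone layout that is 8000 px tall would map to a single 292.5 pt by 6000 pt page. Some PDF viewers cap page dimensions, so extremely long pages may still need to be split.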
harry8, over 1 year ago
Web browser opens URL -> print -> save as/to PDF?

I'm sure I'm missing something; what is a CLI buying me here?
K2h, over 1 year ago
Very cool! In README.md, is that an extra 'p' in Webp2pdf?
codeonline, over 1 year ago
Can you add comparison PDFs generated by pandoc and Gotenberg?
skanga, over 1 year ago
Found some potential bugs. Please check the GitHub issues page.