Show HN: Pup – A command-line HTML parser

126 points by ericchiang, over 10 years ago

12 comments

nikital, over 10 years ago
While reading the examples, I was surprised by the placement of the input redirection:

    $ pup < robots.html title

For some reason I thought that it must come last. Turns out that you can place it anywhere in the command! All these are equivalent in bash:

    $ pup title < robots.html
    $ pup < robots.html title
    $ < robots.html pup title
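The equivalence can be checked without pup installed. The sketch below is only an illustration: it swaps in `grep` as a stand-in for pup, and the temporary file and its contents are made up for the demo.

```shell
# Stand-in for robots.html; the content is invented for this demo.
tmp=$(mktemp)
printf '<title>robots</title>\n<p>hello</p>\n' > "$tmp"

# The same command with the redirection at the end, in the middle, and at the start.
a=$(grep title < "$tmp")
b=$(grep < "$tmp" title)
c=$(< "$tmp" grep title)

# Print the three results so they can be compared by eye.
printf '%s\n' "$a" "$b" "$c"
rm -f "$tmp"
```

The shell strips redirections from a simple command before it runs the program, so the program sees the same arguments in all three forms.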
aw3c2, over 10 years ago
"I bet it's node or ruby..." Sees .go file extension. "Oh nice, I never used a Go program before!" But then I am supposed to '$ go get github.com/ericchiang/pup' to install it.

Why does everything nowadays have to come with its own package manager? I like the separation between my home directory and the "system packages". I don't want to have to care for and update and separately back up ~/go, ~/.npm and so on and so forth.

This looks super nice, I especially like the detailed list of examples. Sorry for the rant.

edit: There are binaries in the "dist" directory, the readme just did not mention them. Thanks!
jkbr, over 10 years ago
Happy to see this. Pup will be a nice companion to HTTPie[1] as it also works with standard streams:

    $ http example.org | pup h1 text{} | http httpbin.org/post

[1] http://httpie.org/
ushi, over 10 years ago
So getting the front page links is now as easy as:

    curl https://news.ycombinator.com | pup td.title a attr{href}

Well done and thx for sharing.
grannyg00se, over 10 years ago
Also see w3's html-xml-utils. For example hxextract: http://www.w3.org/Tools/HTML-XML-utils/man1/hxextract.html
artursapek, over 10 years ago
Really great seeing more and more CLI tools being built in Go. :-)
mbesto, over 10 years ago
Wait, what's the difference between this and using a Ruby/Python/etc REPL? In other words, normally to achieve this same result I would do:

irb -> require 'nokogiri' and require 'open-uri' -> doc = Nokogiri::HTML(open('http://www.google.com/'))

and no need to store the HTML via wget on my machine. Am I missing something?
Gys, over 10 years ago
Did you know of goquery (github.com/PuerkitoBio/goquery)?
morenoh149, over 10 years ago
Very nice. Could replace a bunch of awk and sed one-off scripts floating around on people's hard drives.
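For context, here is a rough sketch of the kind of sed one-off this comment probably has in mind. It is not taken from the thread; the sample HTML and variable names are made up, and the pattern assumes the opening and closing title tags sit on the same line.

```shell
# Made-up sample page for the demo.
html='<html><head><title>Example Domain</title></head><body><h1>Hi</h1></body></html>'

# A typical one-off: print only what sits between <title> and </title>.
# This assumes both tags are on one line, which real pages do not guarantee.
title=$(printf '%s\n' "$html" | sed -n 's:.*<title>\(.*\)</title>.*:\1:p')
echo "$title"
```

A regex like this breaks as soon as the markup wraps or the tag gains attributes, which is the gap a selector-based tool is meant to fill.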
illesim, over 10 years ago
Is there any way to use pseudo-selectors, like :last-child?
mholt, over 10 years ago
cat and pup play well together.
WorldWideWayne, over 10 years ago
Looks great! Thank you so much for making a Windows build.