Confusing Grep Mistakes I've Made

92 points by r4um, over 4 years ago

20 comments

throwaway373438, over 4 years ago
Others have commented that many of these are general shell or terminal quoting problems.

Something that stood out for me is that the author did not mention ^V, which is very useful in quoting metacharacters. Take the tab example: The author seems to imply that PCRE is needed to match a tab because there is no \t escape sequence in BRE/ERE. Presumably he cannot just type in a tab because he's using a shell like bash, where tab has a special interpretation and cannot be typed in as a string literal.

The way around this is to use ^V as a *terminal* escape sequence, followed by simply pressing the tab key. This technique can be used to insert other control characters as string literals in arguments. Want to grep for EOF? "grep ^V^D" will get you there.

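For scripts, where pressing ^V is not an option, one way to get the same effect is to let printf produce the tab and hand it to grep in double quotes. This is only a minimal sketch; the sample text and variable names are invented for illustration.

```shell
# Build a literal tab with printf. Command substitution strips only
# trailing newlines, so the tab character survives.
tab=$(printf '\t')

# Two sample lines: the first contains a tab, the second only a space.
sample=$(printf 'left\tright\nleft right\n')

# Plain BRE grep, no PCRE needed: the pattern is the tab character itself.
tab_lines=$(printf '%s\n' "$sample" | grep "$tab")
printf '%s\n' "$tab_lines"
```

In bash, ANSI-C quoting (`grep $'\t'`) should do the same job without the helper variable.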
justinsaccount, over 4 years ago
My favorite grep mistake actually comes from the related tools pgrep and pkill.

  pgrep foo -> finds things running matching foo
  pkill foo -> kills things running matching foo

except every year or so, I do something like this

  $ pkill foo
  $ echo nothing happened?
  $ pkill -9 foo
  $ echo nothing happened still? huh?
  $ echo ok, let's run this in verbose mode..
  $ pkill -9 foo -v

but.. -v isn't verbose. Since pkill is part of pgrep, and pgrep is like grep, -v is 'Reverse' (invert the match).

colanderman, over 4 years ago
On many systems,

  grep '[A-Z]'

will match 'y' but not 'z' (note the case). This is due to the collation order of the system's locale, which intersperses upper- and lowercase letters.

Usually what you want instead is

  LC_ALL=C grep '[A-Z]'

(to match ASCII uppercase letters), or

  grep '[[:upper:]]'

(to match your locale's uppercase letters).

FWIW I cannot reproduce this on my system any longer; it seems to vary by distribution. See e.g. [1].

[1] https://unix.stackexchange.com/questions/15980/does-should-lc-collate-affect-character-ranges#comment171877_16066

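The two suggested forms can be tried side by side. A small sketch with made-up sample input; whether the bare '[A-Z]' misbehaves at all depends on the system's locale, so only the two safe forms are shown.

```shell
# Sample input: one lowercase line, one uppercase line.
sample=$(printf 'y\nZ\n')

# Force the C locale so that [A-Z] means the ASCII range A through Z.
ascii_upper=$(printf '%s\n' "$sample" | LC_ALL=C grep '[A-Z]')

# The POSIX character class asks for "uppercase in the current locale".
class_upper=$(printf '%s\n' "$sample" | grep '[[:upper:]]')

printf 'C locale range: %s\n' "$ascii_upper"
printf 'upper class:    %s\n' "$class_upper"
```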
dehrmann, over 4 years ago
> 3) Confusing '.' with '\.'

Because of how different languages handle escaping within strings and not wanting to have to think about it, I've started using [.] to get a literal dot because it always means what I want. I still don't like it.

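A quick illustration of the difference, using invented sample lines: the bare dot matches any character, while the one-character bracket expression matches only a real dot and needs no backslash that a shell or host language could swallow.

```shell
# Two sample lines: only the first contains a literal dot.
sample=$(printf 'a.b\naxb\n')

# An unescaped dot matches any single character, so both lines count.
any_char_count=$(printf '%s\n' "$sample" | grep -c 'a.b')

# A bracket expression holding just a dot matches a literal dot only.
literal_dot=$(printf '%s\n' "$sample" | grep 'a[.]b')

printf 'lines matching a.b:   %s\n' "$any_char_count"
printf 'lines matching a[.]b: %s\n' "$literal_dot"
```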
schoen, over 4 years ago
Another I've run into:

If you meant to search in *, but somehow completely forgot to type the * at the end of the command line, you might do something like

  grep foo

and then wait for a while as grep searches your standard input, instead of files on disk, until you notice your mistake.

(I don't find this conceptually confusing -- I expect many Unix tools, including grep, to act on their standard input -- but I've still sometimes simply forgotten the * and not noticed right away.)

disown, over 4 years ago
If you are interested in where the name "grep" came from:

g/re/p

g: global

re: regular expression

p: print.

https://tldp.org/LDP/abs/html/textproc.html

bowmessage, over 4 years ago
Often my biggest mistake is using grep instead of rg.

MaxBarraclough, over 4 years ago
Tripping over the escaping rules is a continuing pain. GNU sed does things differently, using \| for the regex *or* operator. [0] It's fiddly enough to baffle the occasional StackOverflow answerer. [1]

[0] https://www.gnu.org/software/sed/manual/sed.html#BRE-syntax

[1] https://stackoverflow.com/a/6388042/

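To make the difference concrete, here is a rough sketch that assumes GNU sed. The \| form is a GNU extension to basic regular expressions, and the sample words are made up.

```shell
# GNU extension: in basic regular expressions (the default), alternation
# is spelled \| with a backslash.
bre_result=$(printf 'cat\n' | sed 's/cat\|dog/pet/')

# With -E (extended regular expressions), a bare | is the alternation
# operator.
ere_result=$(printf 'dog\n' | sed -E 's/cat|dog/pet/')

# The trap: under -E the backslashed form means a literal pipe character.
literal_pipe=$(printf 'a|b\n' | sed -E 's/a\|b/X/')

printf '%s\n' "$bre_result" "$ere_result" "$literal_pipe"
```

The same split between the two syntaxes applies to `grep` and `grep -E`.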
sillysaurusx, over 4 years ago
Just use egrep instead of grep. Much easier to remember / less surprising behavior, and it's supported on every system out of the box (unlike rg).

rattray, over 4 years ago
Related, the ack website has a comparison chart of various grep competitors:

https://beyondgrep.com/feature-comparison/

gnagatomo, over 4 years ago
Most of the mistakes are not exclusive or even related to grep; they are actually shell mistakes.

sethammons, over 4 years ago
I like `grep -F` - it treats the search as a literal; no more escaping regex when you really want a literal ".".

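A short sketch of what -F changes, with an invented IP-like sample: the same pattern string matches two lines as a regex but only the exact text as a fixed string.

```shell
# Sample input: a dotted address and a look-alike with other separators.
sample=$(printf '192.168.1.1\n192x168y1z1\n')

# As a regex, each dot matches any character, so both lines match.
regex_count=$(printf '%s\n' "$sample" | grep -c '192.168.1.1')

# With -F the pattern is a fixed string, so only the real address matches.
fixed_match=$(printf '%s\n' "$sample" | grep -F '192.168.1.1')

printf 'regex matches: %s\n' "$regex_count"
printf 'fixed match:   %s\n' "$fixed_match"
```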
asicsp, over 4 years ago
I have a list of gotchas and tricks here: https://learnbyexample.github.io/learn_gnugrep_ripgrep/gotchas-and-tricks.html

As pointed out in other comments, many of the issues in the post are due to the shell, not specific to grep. Especially quoting. Always use single quotes to specify the search pattern, unless other forms of shell quoting are needed. Otherwise, you'll face issues with commands like

  grep ; ip.txt

Another example is searching for a pattern that starts with a hyphen, which causes issues even with quoting

  $ echo '5*3-2=13' | grep '-2'
  Usage: grep [OPTION]... PATTERN [FILE]...
  Try 'grep --help' for more information.

You'll need to either escape the hyphen or use -- before the search pattern to prevent it from being treated as a command option. This is needed if a filename starts with a hyphen too.

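Two ways around the hyphen problem, sketched with the same sample line as the failing command above. The -e variant is not mentioned in the comment; it is added here as another standard option that has the same effect.

```shell
# `--` marks the end of options, so '-2' is read as the pattern.
with_dashdash=$(echo '5*3-2=13' | grep -- '-2')

# -e takes its argument as the pattern, whatever character it starts with.
with_e=$(echo '5*3-2=13' | grep -e '-2')

printf '%s\n' "$with_dashdash" "$with_e"
```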
jrockway, over 4 years ago
Quoting and parsing continue to surprise people, and I don't blame them -- you're embedding one programming language inside another (regex inside bash), and they use some of the same reserved symbols and have slightly different quoting rules. And every language is "inspired" by the others but has its own special rules, so the more you learn, the less sure of anything you'll ever be. (For example, '(' matches a literal parenthesis in Emacs Lisp regexes, and '\(' starts a capture group!)

For matching literal periods, I personally have gotten into the habit of using "[.]" instead of "\.". Less to go wrong in this double-embed scenario, and I have never ever regretted adding the extra byte to my regexp. (Of course, character classes have their own weirdness. Your editor that matches bracket pairs will love the syntax for matching a literal '['.)

zmmmmm, over 4 years ago
Another basic problem I run into is that grep reports whether it found a match through its exit status. This means if you, for example, run a script with the bash option

  set -e

then the session will exit unceremoniously on any grep that doesn't match. This often catches you out when you, say, develop a script without -e and use it for a while, and then one day someone deploys the same thing with -e enabled because they think it will be more robust if the script terminates when a command fails - and boom, now suddenly your script is randomly broken depending on text matches in the files it is processing. It is even worse if you are sourcing the script somehow from within an existing session and it terminates your interactive shell!

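One possible way to keep such a script alive is to capture grep's status explicitly and tell "no match" (1) apart from a real failure (2 or higher). This is a sketch only: `has_match` is a hypothetical helper name, and the sample file exists just for the demonstration.

```shell
# Hypothetical helper: report whether a pattern occurs in a file without
# letting grep's "no match" exit status (1) abort a `set -e` script.
has_match() {
    status=0
    # A command on the left of || is exempt from `set -e`, so a non-zero
    # status is recorded instead of ending the script.
    grep -q -- "$1" "$2" || status=$?
    case $status in
        0) echo yes ;;
        1) echo no ;;
        *) echo "grep failed with status $status" >&2; return "$status" ;;
    esac
}

# Demonstrate inside a subshell with `set -e` switched on.
demo_file=$(mktemp)
printf 'alpha\nbeta\n' > "$demo_file"
demo_output=$(
    set -e
    has_match beta "$demo_file"
    has_match gamma "$demo_file"
    echo "still running"
)
rm -f "$demo_file"
printf '%s\n' "$demo_output"
```

Testing grep directly in an `if` condition gives the same protection, since commands used as conditions are also exempt from `set -e`.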
raffraffraff, over 4 years ago
Most of these things boil down to a few very simple rules (because I rarely run into these problems and I certainly don't have more than a few simple rules).

1. If your pattern contains variable expansion, wrap it in double quotes (but watch out for shell variable expansion characters like '$`')

2. Else wrap your search string in single quotes.

3. If you're gonna do any sort of regex at all just use the '-E' flag so it behaves like a proper regex, and learn how to do basic regex.

4. Know your shell. Some of these gotchas come from the way the *shell* interprets the command. Again, most shell gotchas boil down to a few basic rules too (eg the single or double quote thing). For example, in bash I always surround variables in curly braces: ${THIS}. It avoids accidental bash variable expansion, or confusion about the precise name of the variable when it is concatenated with other strings in the pattern.

One trick I like is how to use grep as a highlighter.

  grep --color -E '(^|My text)'

This matches every line because of the ^ but has nothing to color except your string (since the start of line character is not visible).

Also, the -A and -B flags are useful for grabbing lines after/before the pattern. And while -C doesn't make as much sense as those, its meaning logically follows A and B: grab the lines before and after.

Lastly, if you want to search a gigantic directory structure for files containing an expression, but do not want to hit every single file (eg: restrict it to '.c' files), you can use this:

  find /path/to/dir -name "*.c" -exec grep -H "pattern" {} \;

The -H forces grep to show the file name. This gets activated by default when you grep multiple files, but the find command executes a separate grep on each individual file, so you lose the filename. -H explicitly adds it.

Once you know the -H parameter, it's easy to remember that -h turns the filename *off*. For example, to make a playlist containing all Bob Marley songs that are used in other .m3u playlists:

  grep -r -h "Bob Marley" "/home/user/My Music/playlists" > marley.m3u

This simple approach avoids having to pipe the output into another command like sed or awk to strip away the path, which may not be as simple as it sounds because the file paths may contain spaces and all sorts of other junk.

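The find, -H, -h and highlighter tips can be tried against a throwaway directory. This is a sketch under a few assumptions: every file and directory name is invented, -r/-H/-h are the GNU/BSD grep options, and `{} +` is used in place of `\;` so that find batches files into fewer grep runs.

```shell
# Throwaway directory tree for the sketch; all names here are made up.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src dir"
printf 'int main(void) { return 0; }\n' > "$demo_dir/src dir/main.c"
printf 'main is mentioned here too\n' > "$demo_dir/notes.txt"

# Only .c files reach grep. -H prints the file name even when grep ends
# up receiving a single file.
with_names=$(find "$demo_dir" -name '*.c' -exec grep -H 'main' {} +)

# -h suppresses the file names that a recursive search would print.
without_names=$(grep -r -h 'main' "$demo_dir/src dir")

# Highlighter pattern: '^' matches every line, so nothing is filtered out.
# Add --color when running this interactively.
all_lines=$(printf 'one\ntwo\n' | grep -E '(^|two)')

printf '%s\n' "$with_names" "$without_names" "$all_lines"
rm -rf "$demo_dir"
```

With `{} +` grep usually sees several files at once and would print names anyway; -H keeps the output consistent when only one file happens to match the `-name` filter.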
ineedasername, over 4 years ago
Well, not so confusing, but I have accidentally printed negative matches. Once it locked up my shell instance, and I had to ssh in again to kill it.

rattray, over 4 years ago
How many of these gotchas exist with ripgrep?

maest, over 4 years ago
What's a reasonable grep alias I should add to my .bashrc?

m000, over 4 years ago
Better title: grep newbie mistakes