Why not parse `ls` and what to do instead

170 points by nomilk, 11 months ago

27 comments

hawski, 11 months ago
I think that when someone uses ls instead of a glob it means they most probably don't understand shell. I don't see any advantage of parsing ls output when glob is available. Shell is finicky enough to not invite more trouble. Same with word splitting, one of the reasons to use shell functions, because then you have "$@" which makes sense and any other way to do it is something I can't comprehend.

Maybe I also don't understand shell, but as it was said before: when in doubt switch to a better defined language. Thank heavens for awk.
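
A minimal sketch of the difference described above, assuming Bash and a scratch directory. The helper names `count_args` and `forward_all` and the file names are made up for illustration; they are not from the linked article.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper: it reports how many arguments it got.
count_args() {
  printf '%s\n' "$#"
}

# Scratch directory with two made-up file names, one containing a space.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/with space.txt"

# Glob: the shell passes one argument per file, however the name is spelled.
glob_count=$(count_args "$demo_dir"/*.txt)

# Parsed ls: the unquoted substitution is word-split, so "with space.txt"
# arrives as two separate arguments.
ls_count=$(count_args $(ls "$demo_dir"))

# Inside a function, "$@" forwards the arguments unchanged.
forward_all() {
  count_args "$@"
}
forwarded_count=$(forward_all "$demo_dir"/*.txt)

printf 'glob=%s ls=%s forwarded=%s\n' "$glob_count" "$ls_count" "$forwarded_count"
rm -rf "$demo_dir"
```

The glob and the `"$@"` forwarding both see two files, while the parsed `ls` output sees three words, which is the kind of trouble the comment is warning about.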
Aerbil313, 11 months ago
What to do instead: Use Nushell.

I finally started really using my shell after switching to it. I casually write multiple scripts and small functions per day to automate my stuff. I'm writing scripts in nu that I'd otherwise write in Python. All because the data needs no parsing. I'm not even annotating my data with types even though Nushell supports it, because it turns out structured data with inferred types is more than you need day-to-day. I'm not even talking about all the other nice features other shells simply don't have. See this custom command definition:

    # A greeting command that can greet the caller
    def greet [
      name: string      # The name of the person to greet
      --age (-a): int   # The age of the person
    ] {
      [$name $age]
    }

Here's the auto-generated output when you run `help greet`:

    A greeting command that can greet the caller

    Usage:
      > greet <name> {flags}

    Parameters:
      <name> The name of the person to greet

    Flags:
      -h, --help: Display this help message
      -a, --age <integer>: The age of the person

It's one of those pieces of software that only empowers you, immediately, without a single downside. Except the time spent learning it, but that was about a week for me. Bash or fish is there if I ever need it to paste some shell commands.
noobermin, 11 months ago
Posts like these are like the main character threads on twitter where someone says, "men don't do x" or "women aren't like y." It just feels like people outside of you who have no understanding of your context seem intent on making up rules for how you should code things.

Perhaps it would help to translate this into something more like, "what pitfalls do you run into if you parse `ls`" but it's hard to get past the initial language.
probably_wrong, 11 months ago
I think there's a middle point where you want to do something that's complex enough that a glob won't cut it but simple enough that switching languages is not worth it.

I think the example of "exclude these two types of files" is a good case. I often have to write stuff like `ls P* | grep -Ev "wav|draft"` which doesn't solve a problem I don't have (such as filenames with newlines in them) but does solve the one I do (keeping a subset of files that would be tricky to glob properly).

In my experience 95% of those scripts are going to be discarded in a week, and bringing Python into it means I need to deal with `os.path` and `subprocess.run`. My rule of thumb: if it's not going to be version controlled then Bash is fine.
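
For comparison, the same "names starting with P, minus wav and draft" filter can be sketched with a glob and a `case` statement, staying in Bash. The function name `keep_matches` and the global array `kept` are hypothetical, and this is only one way to do it, not a claim that the one-liner above is wrong for throwaway use.

```shell
#!/usr/bin/env bash
# keep_matches is a hypothetical helper. It fills the global array "kept"
# with entries in the given directory whose names start with "P" and do
# not contain "wav" or "draft".
keep_matches() {
  local dir=$1 f
  kept=()
  for f in "$dir"/P*; do
    [ -e "$f" ] || continue          # the pattern matched nothing
    case ${f##*/} in
      *wav*|*draft*) continue ;;     # same exclusions as grep -Ev "wav|draft"
    esac
    kept+=("$f")
  done
}
```

A call such as `keep_matches .` followed by `printf '%s\n' "${kept[@]}"` prints the surviving names, and each array element stays intact even if a name contains spaces.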
fellerts, 11 months ago
The title omits the final '?' which is important, because the rant and its replies didn't settle the matter.

Shellcheck's page on parsing ls links to the article the author is nitpicking on, but it also links to the answer to "what to do instead": use find(1), unless you really can't. https://mywiki.wooledge.org/BashFAQ/020
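
A minimal sketch of the find(1) route, assuming Bash and a `find` that supports `-print0` (GNU and BSD find both do). The function name `collect_files` and the global array `found` are hypothetical.

```shell
#!/usr/bin/env bash
# collect_files is a hypothetical helper. It fills the global array "found"
# with every regular file below the given directory.
collect_files() {
  local dir=$1 path
  found=()
  # -print0 ends each path with a NUL byte, the one byte that cannot occur
  # in a path, and read -d '' splits on exactly that byte.
  while IFS= read -r -d '' path; do
    found+=("$path")
  done < <(find "$dir" -type f -print0)
}
```

After `collect_files ~/Music`, a loop over `"${found[@]}"` sees one element per file, including names with spaces or newlines.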
badsectoracula, 11 months ago
I guess this is for shell scripts that need to work with "unsafe" filenames?

I've been using Linux since 1999 and i never came across a filename with newlines. On the other hand, pretty much all "ls parsing" i've done was on the command-line to pipe it to other stuff in files i was 100% sure would be fine.
geophile, 11 months ago
I wrote a pipe-objects-instead-of-strings shell: https://marceltheshell.org.

Not piping strings avoids this issue completely. Marcel's ls produces a stream of File objects, which can be processed without worrying about whitespace, EOL, etc.

In general, this approach avoids parsing the output of any command. You always get a stream of Python values.
jcalvinowens, 11 months ago
Not sure how portable it is, but GNU ls has a flag to solve this problem trivially:

    --zero  end each output line with NUL, not newline
billpg, 11 months ago
Why do you want to put LF bytes into filenames?

Using magic, I've renamed any files you have to remove control characters in the name and made it impossible to make any new ones. (You can thank me later.)

What can't you do now?
7bit, 11 months ago
Or use PowerShell where LS returns a bunch of objects, and say goodbye to string parsing forever.
mcc1ane, 11 months ago
https://mywiki.wooledge.org/BashPitfalls#for_f_in_.24.28ls_.2A.mp3.29
waffletower, 11 months ago
Borkdude has a wonderful Clojure/Babashka solution in this space: https://github.com/babashka/fs
g15jv2dp, 11 months ago
What to do instead: use pwsh to completely obviate all these issues.
teddyh, 11 months ago
Many people turn to globbing to save them, which is usually better, but has some problems in case of no matches. But, for Bash, you can do this to fix it:

    shopt -s failglob
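
A small sketch of what the no-match case looks like, assuming Bash and an empty scratch directory. The `nullglob` variant is not mentioned in the comment above; it is included here only as a related Bash option for contrast. Variable names are made up for illustration.

```shell
#!/usr/bin/env bash
empty_dir=$(mktemp -d)   # scratch directory with no files in it

# Default: an unmatched pattern is left as a literal string, so the loop
# body runs once with a name that does not exist.
default_iterations=0
for f in "$empty_dir"/*.mp3; do
  default_iterations=$((default_iterations + 1))
done

# nullglob: an unmatched pattern expands to nothing, so the loop is skipped.
# The subshell keeps the option from leaking into the rest of the script.
nullglob_iterations=$(
  shopt -s nullglob
  n=0
  for f in "$empty_dir"/*.mp3; do n=$((n + 1)); done
  printf '%s\n' "$n"
)

# failglob: an unmatched pattern is reported as an error and the command
# is not run, which shows up here as a non-zero exit status.
(
  shopt -s failglob
  printf '%s\n' "$empty_dir"/*.mp3
) >/dev/null 2>&1
failglob_status=$?

printf 'default=%s nullglob=%s failglob_status=%s\n' \
  "$default_iterations" "$nullglob_iterations" "$failglob_status"
rmdir "$empty_dir"
```

Which option fits depends on whether "no files" should be a silent no-op (`nullglob`) or a loud failure (`failglob`).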
cess11, 11 months ago
I don't know, this seems like a lot of words to avoid coming to the conclusion that there are many ways to skin a directory.

Most of the time it's fine to just suck in ls and split it on \n and iterate away, which I do a lot because it's just a nice and simple way forward when names are well-formed. Sometimes it's nicer to figure out a 'find at-place thing -exec do-the-stuff {} \;'. And sometimes one needs some other tool that scours the file system directly and doesn't choke on absolutely bizarre file names and gives a representation that doesn't explode in the subsequent context, whatever that may be, which is quite rare.

A more common issue than file names consisting of line breaks is unclean encodings, non-UTF-8 text that seeps in from lesser operating systems. Renaming makes the problem go away, so one should absolutely do that and then crude techniques are likely very viable again.
tmtvl, 11 months ago
Today I learned how neat find is:

    find ~/Music -iname 'p*' -not -iname '*age*' -not -iname '*etto*'
    find ~/Music -iname 'p*' -not -iregex '.*\(age\|etto\).*'
    find ~/Music -regextype posix-extended -iname 'p*' -not -iregex '.*(age|etto).*'

Not that I'm likely to ever use any of that in anger, but it's good to know if ever I do wind up needing it.
zokier, 11 months ago
I wonder if anyone has implemented a kernel module or something to limit filenames to a sane set. Just ensuring that they are valid UTF-8 and do not contain any non-printables would be a huge improvement. Sure, some niche applications might break, so it's not something that can be made the default, but I still think it would help on systems I control.
Nimitz14, 11 months ago
These sorts of pedantic exchanges are so pointless to me. We are programmers. We can control what characters are used in filenames. Then you can use the simplest tool for the job and move on with your life to focus on the stuff that actually matters. Fix the root cause instead of creating workarounds for the symptom.
amelius, 11 months ago
I feel like Unix utilities should provide a standardized way to generate machine-readable output, perhaps using JSON.
midjji, 11 months ago
The Bash code which creates the C file which gets the list of null-terminated files in a directory, compiles it, and runs it, is easier to write and understand. Bash is a lousy language to do anything in, Python is almost always available, and if not, then CC is.
Tempest1981, 11 months ago
Recent discussion about the original "don't parse" page being referenced:

https://news.ycombinator.com/item?id=40692698 (10 days ago, 83 comments)
InsideOutSanta, 11 months ago
Files and directories, once a reference to them is obtained, should not be identified by their path. This causes all kinds of problems, like the reference breaking when the user moves or renames things, and issues like the ones described in the article, where some "edge case" (and I'm using that term very loosely, because it includes common situations like a space in a file name) causes problems down the line.

You might say that people don't move or rename things while files are open, but they absolutely do, and it absolutely breaks things. Even something as simple as starting to copy a directory in Explorer to a different drive, and then moving it while the copy is ongoing, doesn't work. That's pathetic! There is no technical reason this should not be possible.

And who can forget the case where an Apple installer deleted people's hard disk contents when they had two drives, one with a space character, and another one whose name was the string before the first drive's space character?

Files and directories need to have a unique ID, and references to files need to be that ID, not their path, in almost all cases. MFS got that right in 1984, it's insane that we have failed to properly replicate this simple concept ever since, and actually gone backwards in systems like Mac OS X, which used to work correctly, and now no longer consistently do.
lostmsu, 11 months ago
This is a problem I faced recently on Linux. You can use ip addr to see the list of your IPv6 addresses and their types (temporary or not, etc). But doing it programmatically from a non-C codebase is way more involved.
tremon, 11 months ago
Most of the time I avoid parsing ls, but I haven't found a reliable way to do this one:

    latest="$(ls -1 $pattern | sort --reverse --version-sort | head -1)"

Anyone got a better solution?
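
One possible answer, offered as a sketch rather than the definitive fix: let the shell expand the glob and pass the matches NUL-delimited through the same sort. This assumes GNU coreutils (`sort -z`, `head -z`), and `latest_match` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# latest_match is a hypothetical helper. It takes the already-expanded glob
# matches as arguments and prints the one that sorts highest by version.
# Assumes GNU sort and head, which accept -z for NUL-delimited records.
latest_match() {
  [ -e "$1" ] || return 1    # no arguments, or the glob matched nothing
  printf '%s\0' "$@" |
    sort -z --reverse --version-sort |
    head -z -n 1 |
    tr -d '\0'               # drop the trailing NUL before capturing
}
```

Usage would look like `latest=$(latest_match ./release-*)`. Command substitution still strips trailing newlines, so a name that ends in a newline would not survive the capture.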
renewiltord, 11 months ago
I just solve this by not having files like that on my computer. No spaces. No null chars.
bandie91, 11 months ago
i searched through the page and have not found `find ... -printf "%M %n %u %g %s ...\0"` mentioned. this way you get ls(1)-like output, yet machine-parseable.
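
A minimal sketch of reading such records back, assuming GNU find (`-printf` is a GNU extension) and Bash. To keep it short, the format string here is reduced to size and path only. The function name `list_sizes` and the arrays `sizes` and `paths` are hypothetical.

```shell
#!/usr/bin/env bash
# list_sizes is a hypothetical helper. It fills two parallel global arrays
# with the size in bytes and the path of each regular file directly inside
# the given directory. Assumes GNU find for -printf and -maxdepth.
list_sizes() {
  local dir=$1 record
  sizes=()
  paths=()
  # Each record is "<size> <path>" terminated by a NUL byte, so a path may
  # contain spaces or newlines without confusing the reader.
  while IFS= read -r -d '' record; do
    sizes+=("${record%% *}")   # text before the first space
    paths+=("${record#* }")    # everything after the first space
  done < <(find "$dir" -maxdepth 1 -type f -printf '%s %p\0')
}
```

Putting the fixed-format fields first and the path last is the design choice that makes the split safe, since only the path can contain arbitrary characters.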
TacticalCoder, 11 months ago
Now of course having scripts and pre-commit hooks enforcing simple rules so that files *must* only use a subset of Unicode is a thing and does help.

Do you really think that, say, all music streaming services are storing their songs with names allowing Unicode HANGUL fillers and control characters that modify the direction of characters?

Or... Maybe just maybe that Unicode characters belong to metadata and that a strict rule of "only visible ASCII chars are allowed and nothing else or you're fired" does make sense.

I'm not saying you always have control over every single filename you'll ever encounter. But when you've got power over that and can enforce saner rules, sometimes it's a good idea to use it.

You'll thank me later.