The biggest problem with this "not the true text" issue shows up when coders encounter Unicode.<p>A lot of coders, especially those who have worked primarily in English-speaking countries, treat ASCII and UTF-8 as the same thing, because for ASCII-only text the two encodings are byte-for-byte identical and the difference is invisible. They can go decades oblivious to topics like encodings, mappings and display.<p>So it can be surprising when they start dealing with European characters for the first time. They view the text in one place (like an editor that treats the file as UTF-8) and then in another (their program, which treats the text as ASCII), and the two disagree.<p>It's hard to explain to them that "when I look at it" isn't a universal truth; it also matters how the "look at it" program chooses to interpret and display the text.
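To make the "look at it" point concrete, here is a minimal Python sketch that decodes the same bytes under three different assumptions. The sample word is my own choice, not something from the article.

```python
# The same four characters, stored once as UTF-8 bytes.
text = "caf\u00e9"                      # "cafe" with an acute accent on the e
raw = text.encode("utf-8")              # b'caf\xc3\xa9'

# Three "look at it" programs, each assuming a different encoding.
as_utf8 = raw.decode("utf-8")                      # the intended text
as_latin1 = raw.decode("latin-1")                  # two wrong characters instead of one
as_ascii = raw.decode("ascii", errors="replace")   # two U+FFFD replacement characters

print(repr(as_utf8), repr(as_latin1), repr(as_ascii))

# Pure-ASCII text is byte-identical in both encodings, which is why
# the difference can stay invisible for years.
plain = "cafe"
print(plain.encode("ascii") == plain.encode("utf-8"))  # True
```

None of the three decoders is "wrong" about the bytes; they just disagree about what the bytes mean.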
<p><pre><code> In the end, what truly matters is whether the codebase is consistent—either using tabs or spaces throughout
</code></pre>
I use tabs for code indentation, but spaces for non-code indentation (e.g. for ASCII diagrams within comments).<p>Anyone who has converted a lot of code, from different projects, from spaces to tabs will have noticed: the vast majority of space-indented code contains a few screwups where a line or two in a 4-spaced file is actually indented with 3 spaces.<p>Why that happens, despite editors automatically converting tabs to spaces, is beyond me, but it is a ubiquitous phenomenon. I suspect this is the real reason some people, myself included, prefer tabs.
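For anyone who wants to hunt for those stray 3-space lines, a rough sketch in Python follows. The function name `find_odd_indents` and the sample source are made up for illustration, and the check is naive: it will also flag continuation lines that are deliberately aligned with spaces.

```python
def find_odd_indents(source: str, width: int = 4) -> list[tuple[int, int]]:
    """Return (line_number, leading_spaces) for every line whose
    leading-space count is not a multiple of `width`."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        stripped = line.lstrip(" ")
        if not stripped:              # skip blank or spaces-only lines
            continue
        indent = len(line) - len(stripped)
        if indent % width != 0:
            hits.append((lineno, indent))
    return hits


# Hypothetical 4-spaced file with one 3-space slip on line 3.
sample = "def f():\n    a = 1\n   b = 2\n    return a + b\n"
print(find_odd_indents(sample))  # [(3, 3)]
```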
Greppability is an interesting idea, and a good one, but I'm going to disagree with the recommendation<p>> Stop hard-wrapping and just use soft-wrapping,<p>Grep for some pattern in soft-wrapped text and you get a lot of extraneous material, because every match returns the whole paragraph.<p>You also can't grep for things "at the beginning of the line", which is often an important indicator. When I did a lot of plain C programming, I would put function names at the start of a line, below their return type, to make it easy to grep for a function definition rather than just its uses.<p>Soft-wrapping also hurts diffability, a complement to greppability. You might correct a single letter in a misspelled word in a soft-wrapped paragraph. Do a "git diff" or equivalent and you'll get back a huge block of "changed" text. Useless. Short, hard-wrapped lines make it easy to see diffs.
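The diff point can be sketched with Python's `difflib`, which compares line by line much like a plain "git diff" does. The paragraph, the typo and the helper names `diff_size` and `fix` are invented for the example; the 60-column wrap width is arbitrary.

```python
import difflib
import textwrap

# Hypothetical paragraph with a one-letter typo ("paragreph").
paragraph = (
    "Short, hard-wrapped lines make it easy to see diffs. Fix one letter "
    "in a soft-wrapped paragreph and a line-based diff reports the whole "
    "thing as changed, which buries the actual correction."
)


def diff_size(before: str, after: str) -> int:
    """Count the characters on the -/+ lines of a unified diff."""
    diff = difflib.unified_diff(
        before.splitlines(), after.splitlines(), lineterm="", n=0
    )
    return sum(
        len(line) - 1
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    )


def fix(text: str) -> str:
    """Apply the one-letter correction."""
    return text.replace("paragreph", "paragraph")


soft = paragraph                      # one long line, wrapped only on display
hard = textwrap.fill(paragraph, 60)   # newlines stored in the file

soft_cost = diff_size(soft, fix(soft))
hard_cost = diff_size(hard, fix(hard))
print(soft_cost, hard_cost)
```

With the soft-wrapped version the diff contains the entire paragraph twice; with the hard-wrapped version it contains only the one line that changed.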
> Soft Wrapping vs Hard Wrapping<p>This is actually one of HTML's most underrated features - there is no distinction between hard and soft wrapping. Any whitespace, of any form and quantity, between any two words is just converted to a single space in the rendered output.<p>Thus the developer, in a code editor, is free to hard wrap and indent the text in whatever way makes the most visual sense. Meanwhile in the rendered output the actual wrapping that occurs (if any) is controlled by the stylesheet.<p>I wish more programming languages had multiline string syntax that could do this (automatically remove all newlines and indentation). It turns out to be quite useful in a variety of domains.
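In languages without such a syntax you can approximate it with a small helper. Here is a Python sketch; `collapse` is a made-up name, and it is only an approximation of HTML's rules (for instance, `str.split()` also treats non-breaking spaces as whitespace, which HTML rendering does not collapse).

```python
def collapse(text: str) -> str:
    """Join a hard-wrapped, indented string into a single line by
    turning every run of whitespace into one space."""
    return " ".join(text.split())


# Wrapped and indented freely in the source, stored as one line.
message = collapse("""
    This string is hard-wrapped and indented
    in the source for readability, but it is
    stored as a single line.
""")
print(message)
```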
The niche for hard-wrapping is straightforward. Sending patches via email.<p>In these types of communities there is no formal markup. So what is code and what is text? You can’t tell. Some might use “code fence”. Some might use four-space indents. Some might just dump code in between prose. And when you comment on a patch you comment directly on the diff.[1]<p>You can’t just let the email reader go to town on the text. That’s fine for prose but annoying for code where every line break is either intentional or machine formatted.<p>The author mentions the downside of browsing on a mobile device. Yeah I sometimes do that. But the primary mode for this kind of browsing might just be on a laptop/desktop. Certainly if you plan on doing some coding. (not just browsing the email archive for discussions that happened eight years ago... not that I would ever do that)<p>[1] Maybe diffs are easy to parse out of a message since each line starts with `+`, `-` or a space. After you have peeled away the quoting.
> I don’t know whether this is just due to first-mover advantage or not but it also looks like more projects use spaces over tabs. So what’s the point of going against the tide where there does not seem to be a very powerful advantage anyway?<p>Sure, <i>now</i>. But there was a time when I was a young man in college (circa 1997) when professors and the industry pushed tabs as the standard. Shortly after, the tide turned and we were all using spaces.<p>> Stop hard-wrapping and just use soft-wrapping<p>Who cares about grep? I mean, aside from the OP and probably many on here. Wrapping is a task that should be left up to the viewing device/software. It can be made <i>responsive</i>, which hard-wrapping cannot be.<p>> newline<p>This really should be a solved issue by now, both for users and in software.
Those points really depend on context:<p>- if it’s code, you should be using an automatic code formatter and that’s it.
- if you write prose, sure, soft wrap.<p>If you grep, \s doesn’t care about spaces vs tabs.<p>Sadly, elastic tabs never caught on as the default [0].<p>Maybe we need something like a "semantic alignment" marker instead of using spaces for aligning things. Like beginning of function name, beginning of function argument, etc.<p>[0]: <a href="https://nick-gravgaard.com/elastic-tabstops/" rel="nofollow">https://nick-gravgaard.com/elastic-tabstops/</a>
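A quick way to check the \s point, sketched with Python's `re` module rather than grep itself (the sample lines are invented):

```python
import re

# The same statement indented with a tab, with four spaces, and not at all.
lines = ["\tint total = 0;", "    int total = 0;", "int total = 0;"]

# \s matches both tabs and spaces, so one pattern covers either style.
pattern = re.compile(r"^\s+int total")
matches = [bool(pattern.search(line)) for line in lines]
print(matches)  # [True, True, False]
```

With grep, `\s` is a GNU extension as far as I know; `[[:space:]]` is the portable spelling.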
Biggest thing about plain text to me is that it's not really a thing. Markup languages generally are not considered plain text, and I would not consider code plain text either (it's literally called <i>code</i>). What does that leave? Prose and similar writings, maybe. But do you include control codes in plain text? I don't think something like the ASCII bell can be thought of as plain text. The various whitespace characters are a tricky question... they arguably are formatting codes, so if we think of plain text as unformatted text, then things like newlines don't really fit the picture; instead we should have semantic markers like "paragraph separator". On the other hand, if we allow plain text to include formatted text, then it opens a whole can of worms about the different ways to represent formatting.
In my typical coding style I always start a new line for function arguments, so the tabs vs spaces argument has never mattered to me.<p>The other way leads to a bunch of random indentation levels all over the file, which has always looked ugly to me.
Another fun one: you can encode accented characters in two different ways: as a character plus its accent, or as a dedicated Unicode character. Software and operating systems can have their own standard for this.<p>I learned it the hard way with my static site generator and the pages of my website that use Umlauts. It introduced subtle problems where Syncthing would replace one standard with another, and nginx would suddenly 404 on URLs that looked fine to me.
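The two encodings can be reproduced with Python's standard `unicodedata` module. A minimal sketch, using "ü" as the example and a made-up helper name `same_text`:

```python
import unicodedata

composed = "\u00fc"       # u-umlaut as one dedicated code point (NFC form)
decomposed = "u\u0308"    # plain "u" followed by COMBINING DIAERESIS (NFD form)

# They render the same but are different strings and different bytes.
print(composed == decomposed)        # False
print(composed.encode("utf-8"))      # b'\xc3\xbc'
print(decomposed.encode("utf-8"))    # b'u\xcc\x88'


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text(composed, decomposed))  # True
```

A URL or filename lookup that compares raw bytes, without normalizing both sides first, would treat these as two different names, which fits the 404s described above.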
I love everything about this post. Thank you. What a great way to start the day. (One nit: it would be cool to have a link to the source code for the post).<p>Here's my user test: <a href="https://news.pub/?try=https://www.youtube.com/embed/rx7nv6R5Rww?si=tuR6lhGHbNW6BYUa" rel="nofollow">https://news.pub/?try=https://www.youtube.com/embed/rx7nv6R5...</a>
> This is also partly the reason why I use spaces most of the time. If you still end up adjusting the tab width to match others’ preferences, what’s the purpose of using tabs in the first place?<p>Or, if you used elastic tabstops, the purpose of using tabs would be that this alignment happens automatically on edits instead of you having to adjust the number of spaces manually.
Question about hard wrapping: if you have a piece of hard-wrapped text, how do you easily add text in the middle? Or do you just have to re-wrap all the following lines by hand?
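One possible answer is to let a tool re-flow the paragraph after the edit. A minimal sketch with Python's `textwrap`, where `rewrap`, the sample text and the 40-column width are my own assumptions:

```python
import textwrap


def rewrap(paragraph: str, width: int = 72) -> str:
    """Re-flow a hard-wrapped paragraph to `width` columns."""
    # Collapse the old line breaks first, then wrap again.
    return textwrap.fill(" ".join(paragraph.split()), width=width)


# Hypothetical paragraph after inserting words into the second line,
# which left the line lengths ragged.
edited = (
    "If you have a piece of hard-wrapped text\n"
    "and insert a few extra words somewhere in the middle of it,\n"
    "the line lengths go ragged."
)
print(rewrap(edited, width=40))
```

Most editors have a command that does the same thing in place, for example `gq` in Vim or `M-q` (fill-paragraph) in Emacs.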