> How many spaces is a tab? GCC seems to say “8”, for some reason<p>On old mechanical typewriters without adjustable tab stops, the tab key (on models that had one) slid the carriage left 8 spaces. (The stops were on the carriage, not on the part you typed at, BTW; that arrangement skeuomorphically survives in Word’s tab-stop interface.) You could often push the carriage back a bit if you wanted. This carried over to TTYs, which were grossly electromechanical devices, and from there to glass teletypes and terminals.<p>So it’s the proper default.<p>Typewriters were pretty direct. Often they omitted the 1 and 0 digits (you just used l and O). Of course there was no VT or LF: you stretched out your right hand and turned the platen. For FF you gripped the paper and pulled. On some models, to backspace you just pushed the carriage. And to delete (called rubout on some old terminals) you painted or XXXed out the offending text, or used a pen. Even on legal documents (which is why they still both write out and use digits to specify numbers).
> Dare I say and ask: How many spaces is a tab?<p>I just checked (because I had a hazy memory of this doing something weird): tab characters printed raw to a PTY and looked at in a terminal emulator/tmux/etc., <i>aren't</i> any fixed number of spaces wide.<p>Rather, a tab, at least when rendered onto a PTY, advances the cursor to <i>the next tabstop</i> — i.e. to the next virtual column that is a multiple of 8.<p>Which has the dreadful implication that, to handle column prediction when there could ever be tabs <i>in the input</i>, you actually need to fully model the rendering behavior of a PTY.<p>And don't get me started on predicting the width of text printed raw containing ANSI escape codes!<p>The fact that libncurses works at all, while dealing with <i>all</i> of this, is a never-ending source of amazement to me.
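The tabstop rule itself is simple to model in isolation; a minimal sketch in Python (assuming the classic fixed tabstop of 8, and ignoring wide characters, escape codes, and line wrapping):

```python
def advance(col, text, tabstop=8):
    """Predict the cursor column after printing `text` raw, starting at 0-based `col`."""
    for ch in text:
        if ch == "\t":
            # A tab moves to the next multiple of the tabstop; it never stays put.
            col = (col // tabstop + 1) * tabstop
        else:
            col += 1  # simplification: every other character is one cell wide
    return col
```

So `advance(0, "\tx")` and `advance(7, "\tx")` both give 9: the width of a tab depends on where the cursor already is, which is exactly why you cannot treat it as a fixed number of spaces.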
> grapheme clusters [...] portable<p>If only. Boundary locations depend on the Unicode version. If your terminal uses one version and your compiler another: boom.<p><pre><code> * * *
</code></pre>
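There is no general fix for that version skew, but you can at least inspect which Unicode version a given runtime bundles. In Python, for example, the stdlib exposes it (shown purely to illustrate where the disagreement comes from):

```python
import unicodedata

# The Unicode Character Database version bundled with this Python build.
# A terminal, a shaping library, and a compiler may each bundle a different
# one, so grapheme boundaries computed by each can disagree.
print(unicodedata.unidata_version)
```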
In my C compiler I distinguish 'acol' and 'vcol'. The former stands for 'actual column', the latter for 'visual column'. The actual column is a byte offset, which can be used to identify the offending location in the source, while the visual column represents a physical offset in glyph widths.<p>The issue raised in TFA, of tab widths being different, can be resolved by making the compiler expand the tabs itself. This does nothing for e.g. emoji, if there is disagreement about the version of Unicode in use.<p>Vim seems to do something similar. Given 'set ruler', if I type a tab followed by an em dash, I am told that I am in column '5-10'. 5 is 3 bytes from the em dash (in UTF-8) plus 1 byte from the tab, plus 1. 10 is 8 glyph widths from the tab plus 1 glyph width from the em dash, plus 1.<p><pre><code> * * *
</code></pre>
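The acol/vcol distinction can be sketched as follows (a hypothetical Python rendition, not the compiler's actual code; it assumes a tabstop of 8 and treats every non-tab character as one glyph wide, so wide CJK glyphs and emoji are out of scope):

```python
def acol_vcol(prefix, tabstop=8):
    """Given the text before the cursor on a line, return (acol, vcol), both 1-based."""
    # acol: byte offset into the line's UTF-8 encoding, plus 1
    acol = len(prefix.encode("utf-8")) + 1
    # vcol: visual column, expanding tabs to the next tabstop
    vcol = 0
    for ch in prefix:
        if ch == "\t":
            vcol = (vcol // tabstop + 1) * tabstop
        else:
            vcol += 1  # simplification: one glyph width per character
    return acol, vcol + 1
```

For a tab followed by an em dash, `acol_vcol("\t\u2014")` returns `(5, 10)`, matching the Vim ruler example above.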
However, my approach to error handling should perhaps not be taken as representative. E.g.:<p>> But nobody looks at an error message and manually navigates to the location using the column information!<p>I do. And I also dislike rustc's error messages, which apparently receive universal acclaim.
Two things:<p>1) I recently implemented truncating long lines for a CLI tool and went with a hybrid approach using both graphemes and virtual columns -- I'd only truncate a line between graphemes, but when counting how much space was used up I would use virtual columns. In the case of something like a multi-code-point emoji, this means things tend to err on the "safe" side of truncating a line too early if the virtual column approach counts it as 4 columns wide rather than 2.<p>2) I wanted to test something with the scientist emoji and managed to crash Ruby's repl, irb, simply by pasting it into the repl and then backspacing over it. (It was clearly confused about the position of the cursor, and the stacktrace pointed to an error in a line_editor.rb file.) I was on Ruby 2.7.1, but it looks like it's been fixed in 3.0.0!
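A stripped-down version of that hybrid idea, in Python (a sketch, not the actual tool: it breaks between code points rather than true grapheme clusters, and uses a crude width model where East Asian Wide/Fullwidth characters count as 2 cells):

```python
import unicodedata

def cell_width(ch):
    # Crude width model: Wide and Fullwidth characters take two cells,
    # everything else one. Real terminals also special-case zero-width
    # joiners, combining marks, ambiguous-width characters, and more.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

def truncate(line, max_cols):
    out, used = [], 0
    for ch in line:
        w = cell_width(ch)
        if used + w > max_cols:
            break  # stop before overflowing the column budget
        out.append(ch)
        used += w
    return "".join(out)
```

Overestimating a glyph's width only makes the truncation kick in early, which is the "safe" failure mode described above; underestimating it would overflow the line.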
> Most Chinese, Japanese or Korean characters are rendered twice as wide as most other characters, even in a monospace font.<p>Not just CJK characters, but also a lot of non-Latin characters and symbols (a canonical example being ↑). In the East Asian Width standard [1] they are classified as "ambiguous", which can mean half-width or full-width depending on the user's choice. (By the way, thank you very much for pointing this out; this is super non-obvious to non-CJK developers and consequently affects CJK developers!)<p>[1] <a href="https://unicode.org/reports/tr11/" rel="nofollow">https://unicode.org/reports/tr11/</a>
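Python's stdlib exposes these East Asian Width classifications directly, which makes the "ambiguous" class easy to see (just an illustration; the terminal still decides for itself how wide to render class A):

```python
import unicodedata

# East Asian Width classes per UAX #11: Na = narrow, W = wide, A = ambiguous
for ch in ("A", "\u6f22", "\u2191"):  # Latin A, a CJK ideograph, and the up arrow
    print(ch, unicodedata.east_asian_width(ch))
```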
If you're measuring text for terminal display, you might like my "widecharwidth" library. It tries to be what wcwidth should have been.<p><a href="http://github.com/ridiculousfish/widecharwidth" rel="nofollow">http://github.com/ridiculousfish/widecharwidth</a>
> Emojis easily combine many many code points into one symbol. It begins with flag emojis such as 🇪🇺, which is actually a special “E” followed by “U”, continues with emojis such as 🧑‍🔬 (scientist), which is 🧑 (person) glued together with 🔬 (microscope) using a special joiner code point, and ends at the absolute pinnacle of code point combinations - the family emoji 👨‍👩‍👧‍👦. How do you make a family? Easy, you take a person (with optional skin tone and gender modifier) and glue it with another person, as well as their children. That way you can easily end up with a single “character” consisting of ten or more code points!<p>Why do we create unnecessary complexities and then refuse to dismantle them when we start having problems with them? Are we so unable to admit mistakes?
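The combination schemes quoted above are easy to see from any language that exposes code points; a quick Python illustration (the scientist is person + zero-width joiner + microscope, the EU flag is two regional-indicator letters, and the four-person family chains three joiners):

```python
scientist = "\U0001F9D1\u200D\U0001F52C"  # person + ZWJ + microscope
eu_flag = "\U0001F1EA\U0001F1FA"          # regional indicators E + U
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467\u200D\U0001F466"  # man+ZWJ+woman+ZWJ+girl+ZWJ+boy

# One visible "character" each, but many code points:
print(len(scientist), len(eu_flag), len(family))  # 3 2 7
```

And since skin-tone and gender modifiers add further code points per person, the "ten or more" figure for a family is easy to reach.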