When a Space Is Not a Space

19 points by erickhill, almost 4 years ago

10 comments

dspillett, almost 4 years ago
It isn't just the usual non-breaking space. I've seen other space types like "En Space" creep into content copied into HTML entry forms and cause issues further down the line (usually a ? appearing when an app tries to convert to a non-Unicode encoding and doesn't understand these characters).

There are a few spaces in the Unicode standard: https://www.compart.com/en/unicode/category/Zs

And the list linked above doesn't include things like zero-width-joiner. See https://en.wikipedia.org/wiki/Whitespace_character for a fuller list.

If you want to be evil to a programmer, as well as being tricksy with white-space, replace some semicolons (U+003B - ;) with Greek question marks (U+037E - ;) and see how long it takes them to work out why their compiler or linter isn't happy…

The takeaway from all this is that you should never assume plain text is simple.
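As a rough illustration of the point above (not part of the original comment), here is a minimal Python sketch that flags the kinds of characters mentioned: space separators other than U+0020, invisible format characters such as the zero-width joiner, and the Greek question mark. The function name `find_suspicious_chars` and the sample string are made up for this example, and the choice of categories to flag is an assumption, not an exhaustive rule.

```python
import unicodedata

def find_suspicious_chars(text):
    """Return (index, codepoint, name) for characters that look like
    plain ASCII but are not: non-ASCII space separators (category Zs),
    format characters (category Cf), and the Greek question mark."""
    hits = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        is_odd_space = category == "Zs" and ch != " "
        is_format = category == "Cf"      # e.g. ZERO WIDTH JOINER
        is_greek_qm = ch == "\u037e"      # renders like a semicolon
        if is_odd_space or is_format or is_greek_qm:
            name = unicodedata.name(ch, "UNKNOWN")
            hits.append((i, f"U+{ord(ch):04X}", name))
    return hits

# Hypothetical sample: NBSP, EN SPACE, ZWJ, and a Greek question mark.
sample = "a\u00a0b\u2002c\u200dd\u037e"
for hit in find_suspicious_chars(sample):
    print(hit)
```

Running a check like this over pasted form input or source files is one way to catch such characters before they reach a non-Unicode conversion step or a compiler.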
smitty1e, almost 4 years ago
And then there are those curly quotes. *shakes fist*
nneonneo, almost 4 years ago
In fact, every entity sequence (&...;) in HTML encodes a character from Unicode (one or more codepoints). So, for example, &nbsp; refers to the Unicode character U+00A0 NO-BREAK SPACE. (Note that the bytes C2 A0 are simply the UTF-8 encoding of this character.)
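The mapping described above can be checked with a short Python sketch (an illustration added here, not the commenter's code). It uses the standard library's `html.unescape` to decode the entity; the variable names are arbitrary.

```python
import html

# Decode the HTML entity into the character it stands for.
nbsp = html.unescape("&nbsp;")

print(f"U+{ord(nbsp):04X}")            # U+00A0
print(nbsp.encode("utf-8").hex(" "))   # c2 a0
print(nbsp == " ")                     # False: not the ordinary U+0020 space
```

The last line is the crux of the thread: the character displays like a space but does not compare equal to one.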
johnzim, almost 4 years ago
This is DEFINITELY a feature.

I bet users are copying and pasting values from the WordPress editor into a word processor and HATE seeing &nbsp; so they probably asked for this solution.
wruza, almost 4 years ago
> html-based tools and/or source views failed

These have no control over how your text is copied, because your browser does that (which was the cause of the whole issue in the first place). Maybe they could convert A0 to &nbsp;, but then you couldn't easily copy that text into non-HTML environments. Escaping is unavoidable for &lt;, &gt;, and &amp;, but should they also convert e.g. dashes and other HTML-safe entities?
lazulicurio, almost 4 years ago
If I had to spitball, I'd guess that the issue was caused by the interplay between HTML's whitespace collapse algorithm and whatever method WP uses for source editing. My guess is that something is converting whitespace characters to U+0020 for collapse and display, and that conversion breaks copy-and-paste. Whitespace issues in HTML can get quite annoying at times.
homami, almost 4 years ago
Non-breaking space is very important for languages like Arabic and Farsi. Changing a non-breaking space to a normal space in some sequences of characters may change the meaning of the word in these languages.
tkambler, almost 4 years ago
“I’m not sure what happened, but I blame WordPress.”
jeffwass, almost 4 years ago
"This is like the 21st century version of confusing a zero with a capital letter O, yet worse."

I like the analogy, but why is it worse?
hagen1778, almost 4 years ago
Nice article!