Wow. Considering that PHP is used primarily for generating HTML, that UTF-8 is the de facto standard encoding for Unicode on the web, and that PHP files are usually ASCII or UTF-8, going for UTF-16 seems like a phenomenally bad idea. I can see how UTF-16 looked promising back when it was still UCS-2 and there were no surrogate pairs. These days though, UTF-8 and maybe UTF-32 seem to be the realistic choices when starting from scratch; UTF-32's advantage in some areas (every code point has the same width) is probably too weak to make it a real contender unless your strings are literally linked lists rather than codepoint arrays, i.e. you don't care that it uses 2-4 times as much memory or storage.
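To make the size trade-off above concrete, here is a minimal Python sketch that encodes a few sample strings in each encoding and counts bytes. The helper names (`encoded_sizes`, `utf16_code_units`) and the sample strings are my own illustrations, not anything from PHP or the article.

```python
def encoded_sizes(text):
    """Return the byte length of `text` in each encoding (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


def utf16_code_units(text):
    """Number of 16-bit code units UTF-16 needs for `text`."""
    return len(text.encode("utf-16-le")) // 2


# Hypothetical samples: ASCII-heavy markup, Latin-1 text, CJK, and an emoji.
samples = {
    "markup": "<html>",
    "latin": "caf\u00e9",            # "café" with a precomposed e-acute
    "cjk": "\u65e5\u672c\u8a9e",     # three CJK characters in the BMP
    "emoji": "\U0001F600",           # one code point outside the BMP
}

for name, text in samples.items():
    print(name, len(text), "code points", encoded_sizes(text))

# A single non-BMP code point needs a surrogate pair in UTF-16,
# so UTF-16 is not fixed-width the way UCS-2 was.
print(utf16_code_units(samples["emoji"]))
```

For ASCII-heavy markup, UTF-16 doubles the size and UTF-32 quadruples it. UTF-16 only comes out ahead of UTF-8 on text like the CJK sample, and the emoji shows it still needs variable-width handling.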
I think the author has missed the larger problem. The PHP development community is completely dysfunctional. I don't think that a project of the magnitude of PHP 6 is possible without fixing that fundamental problem.<p>Why is it dysfunctional?<p>- every discussion leads to bikeshedding (and almost none of the bikeshedders actually commit code to the Zend engine)<p>- there are 'rules', but they don't apply to most people (i.e. the 5.4 thing in the article)<p>- no firm hand to guide them (Rasmus has deliberately not provided this)<p>- the mailing list has a complete lack of civility<p>- the highest ratio of poisonous to non-poisonous people that I have ever seen<p>- votes for everything<p>- patches are not discussed, either pre or post commit, so the code is bad, and people won't work on it.<p>I was so glad to get the hell out of there.
One of the most common bugs in web apps is assuming that characters can just "flow through" your application, which the article claims is a common case. Sure, if everything is UTF-8, it might work. But the fun comes when some of your data is US-ASCII, some is ISO-8859-1, and some is UTF-8. Now treating your data like binary is going to result in a garbled web page. So don't do it; decode data from octets to characters when it comes into your program, manipulate it internally as character strings, and encode characters to octets when you output your data. Text is not binary!<p>And if I were Zed Shaw, this is the part where I'd threaten to kill you if you don't meet my demands.
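The decode-on-input, encode-on-output discipline described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names `decode_input` and `encode_output` are hypothetical, and it assumes you actually know each source's encoding (from a header, a config value, and so on).

```python
def decode_input(raw, declared_encoding):
    """Octets -> characters, done once at the program boundary."""
    return raw.decode(declared_encoding)


def encode_output(text, encoding="utf-8"):
    """Characters -> octets, done once when writing the response."""
    return text.encode(encoding)


# Hypothetical inputs from three sources, each in a different encoding.
ascii_bytes = b"cafe"
latin1_bytes = "caf\u00e9".encode("iso-8859-1")  # b'caf\xe9'
utf8_bytes = "caf\u00e9".encode("utf-8")         # b'caf\xc3\xa9'

# "Flow through" approach: glue the raw octets together and call it UTF-8.
mixed = ascii_bytes + b" " + latin1_bytes + b" " + utf8_bytes
try:
    mixed.decode("utf-8")
    passthrough_ok = True
except UnicodeDecodeError:
    # The lone 0xE9 byte from the ISO-8859-1 source is not valid UTF-8.
    passthrough_ok = False

# Boundary approach: decode each input with its own encoding,
# work with character strings, encode once on the way out.
texts = [
    decode_input(ascii_bytes, "us-ascii"),
    decode_input(latin1_bytes, "iso-8859-1"),
    decode_input(utf8_bytes, "utf-8"),
]
page = " ".join(texts)
output = encode_output(page)

print(passthrough_ok)
print(page)
```

The raw concatenation is not valid UTF-8, so a browser told to expect UTF-8 would show garbage for the ISO-8859-1 part, while the decoded-then-encoded page is consistent no matter where each piece came from.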
<i>PHP was slated to gain a goto keyword</i><p>How on earth did they decide this was a good idea? The points about getting rid of register_globals and safe_mode are great, but why add a <i>feature</i> to a programming language that is highly likely just to result in lots of awful code?