TechEcho

11 comments

ReactiveJelly almost 4 years ago

> On August 3rd, the WebAssembly CG will poll on whether JavaScript string semantics/encoding are out of scope of the Interface Types proposal. This decision will likely be backed by Google (C++), Mozilla (Rust) and the Bytecode Alliance (WASI), who appear to have a common interest to exclusively promote C++, Rust respectively non-Web semantics and concepts in WebAssembly.

> If the poll passes, which is likely, AssemblyScript will be severely impacted as the tools it has developed must be deprecated due to unresolvable correctness and security problems the decision imposes upon languages utilizing JavaScript-like 16-bit string semantics and its users.

So, the problem is that AssemblyScript wants to keep using UTF-16? I'm not sure I understand.

Is AssemblyScript the thing that lets you hand-write WebAsm?

syrusakbary almost 4 years ago

I'm not going to enter the discussion regarding UTF-8 vs WTF-16 for representing strings, as I lack the context to determine which one is the right approach if everything has to fit the same model. However, an approach that allows multiple serialization/deserialization mechanisms depending on the host/guest language seems like a good way to move forward.

If you want to chime in or get more context, here are some relevant issues:

* https://github.com/WebAssembly/interface-types/issues/135
* https://github.com/WebAssembly/interface-types/issues/136
* https://github.com/WebAssembly/design/issues/1419

duped almost 4 years ago

Can the authors expound on the reasons why they can't compile their language's string semantics into whatever representation will be used by WASI? Both C++ and Rust support numerous string representations, C++ even more so than Rust.

conrad-watt almost 4 years ago

Full disclosure: I am an active participant in WebAssembly standardisation; my GitHub is here (https://github.com/conrad-watt). What follows is purely my personal opinion.

This announcement is deliberately phrased to scare people who do not have sufficient context. I don't know why some AssemblyScript maintainers have decided to act in this extreme way over what is quite a niche issue. The vote that this announcement is sounding the alarm over is _not_ a vote on whether UTF-16 should be supported.

There has been a longstanding debate as part of the Wasm interface types proposal regarding whether UTF-8 should be privileged as a canonical string representation. Recently, we have moved in the direction of supporting both UTF-8 and UTF-16, although a vote to confirm this is still pending (I personally believe it would pass uncontroversially).

However, JavaScript strings are not always well-formed UTF-16: in particular, some validation is deferred for performance reasons, meaning that strings can contain invalid code points called isolated surrogates. Again, the referenced vote is _not_ a vote on whether UTF-16 should be supported, but is in fact a vote on whether we should require that invalid code points be sanitised when strings are copied across component boundaries. Some AS maintainers have developed a strong opinion that such sanitisation would somehow be a webcompat/security hazard and have campaigned stridently against it. However, sanitising strings in this way is actually a recommended security practice (https://websec.github.io/unicode-security-guide/character-transformations/), so they haven't gained the traction they were hoping for with their objections.

The announcement is worded to obscure this point, talking about "JavaScript-like 16-bit string semantics" (i.e. where isolated surrogates are not sanitised) as opposed to merely "UTF-16", which forbids isolated surrogates by definition, while inviting the conflation of the two.

AS does not need to radically alter its string representation: if we were to support UTF-16 with sanitisation, they could simply document that their potentially invalid UTF-16 strings will be sanitised when passed between components. Note that the component model is actually still being specified, so this design choice doesn't even affect any currently existing AS code. I interpret the announcement's threat of radical change as some maintainers holding AS hostage over the (again, very niche) string sanitisation issue, which is frankly pretty poor behaviour.
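
To make the sanitisation under vote concrete, here is a minimal TypeScript sketch. `sanitizeUtf16` is a hypothetical helper written for illustration; it is not an API of the Interface Types proposal or of AssemblyScript. It assumes "sanitisation" means replacing each isolated surrogate with U+FFFD, the Unicode replacement character; the proposal could in principle specify a different policy.

```typescript
// Hypothetical helper, for illustration only: not part of the Interface
// Types proposal or AssemblyScript's standard library.
// Assumes "sanitisation" means replacing each isolated surrogate with U+FFFD.

const REPLACEMENT_CHAR = "\uFFFD";

function isHighSurrogate(unit: number): boolean {
  return unit >= 0xd800 && unit <= 0xdbff;
}

function isLowSurrogate(unit: number): boolean {
  return unit >= 0xdc00 && unit <= 0xdfff;
}

// Walk the string one UTF-16 code unit at a time. A high surrogate that is
// immediately followed by a low surrogate forms a valid pair and is copied
// through; any other surrogate code unit is isolated and becomes U+FFFD.
function sanitizeUtf16(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (
      isHighSurrogate(unit) &&
      i + 1 < s.length &&
      isLowSurrogate(s.charCodeAt(i + 1))
    ) {
      out += s.charAt(i) + s.charAt(i + 1); // well-formed surrogate pair
      i++; // skip the low half we just consumed
    } else if (isHighSurrogate(unit) || isLowSurrogate(unit)) {
      out += REPLACEMENT_CHAR; // isolated surrogate
    } else {
      out += s.charAt(i); // ordinary BMP code unit
    }
  }
  return out;
}
```

For example, `sanitizeUtf16("a\uD800b")` returns `"a\uFFFDb"`, while a well-formed pair such as `"\uD83D\uDE00"` (an emoji) passes through unchanged. A string that is already valid UTF-16 is returned as-is, so only strings carrying isolated surrogates are affected.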

qalmakka almost 4 years ago

This is an unfortunate consequence of the poor choice of keeping UCS-2 alive as UTF-16 for way too long. The plug on 16-bit encodings should have been pulled a long time ago, but some people were, and still are, so focused on backwards compatibility that they didn't see they were just pushing the issue to another decade. UTF-8 has won, completely. UTF-16 is basically a zombie nobody wants anymore, kept artificially alive by big 90s frameworks' fear of a clean break with the past.

We must get rid of legacy encodings no matter the cost. I'm tired of seeing Java and Qt apps wasting millions of CPU cycles mindlessly converting stuff back and forth from UTF-16. It's plain madness, and sometimes you just need the courage to destroy everything and start again.
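
The back-and-forth conversion complained about above can be made concrete with a minimal TypeScript sketch of a UTF-16 to UTF-8 transcoder. `utf16ToUtf8` is a hypothetical helper written for illustration. It also shows why isolated surrogates are the sticking point in this thread: well-formed UTF-8 has no encoding for them, so the sketch assumes they are replaced with U+FFFD.

```typescript
// Hypothetical helper, for illustration only: encodes a JS (UTF-16) string
// as UTF-8 bytes. Isolated surrogates cannot appear in well-formed UTF-8,
// so this sketch assumes they are replaced with U+FFFD.
function utf16ToUtf8(s: string): Uint8Array {
  const bytes: number[] = [];
  for (let i = 0; i < s.length; i++) {
    let cp = s.charCodeAt(i);
    // Combine a high + low surrogate pair into one supplementary code point.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < s.length) {
      const lo = s.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (lo - 0xdc00);
        i++;
      }
    }
    // Anything still in the surrogate range was unpaired.
    if (cp >= 0xd800 && cp <= 0xdfff) cp = 0xfffd;

    // Emit 1 to 4 bytes depending on the code point's magnitude.
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f),
      );
    }
  }
  return Uint8Array.from(bytes);
}
```

For example, `utf16ToUtf8("\u20AC")` yields the bytes `E2 82 AC`, and a lone `"\uD800"` comes out as `EF BF BD` (the UTF-8 encoding of U+FFFD). In real code you would use the built-in `TextEncoder`, which likewise replaces lone surrogates with U+FFFD; the hand-rolled loop just makes the per-character work visible.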

AndrewDucker almost 4 years ago

This seems to be the discussion thread related to this: https://github.com/WebAssembly/interface-types/issues/13

felipellrocha almost 4 years ago

Can someone explain the issue at hand? I'm not sure I have enough context to understand the problem.

TeaVMFan almost 4 years ago

This seems to also impact Java and TeaVM; see this post: https://groups.google.com/g/teavm/c/gpy0JoKYqbU

amluto almost 4 years ago

Is there a link to the actual poll or its content?

jalino23 almost 4 years ago

This sucks. Poor AssemblyScript, but I really like the Rust way.

xvilka almost 4 years ago

UTF-16 was always a mistake [1]. Good riddance. Time to get it out of the LSP specification [2] as well.

[1] http://utf8everywhere.org/

[2] https://github.com/microsoft/language-server-protocol/issues/376

An Urgent Notice from AssemblyScript
