I've spent quite a lot of time working with the Web Audio API, and I strongly agree with the author.<p>I got pretty deep into building a modular synthesis environment using it (<a href="https://github.com/rsimmons/plinth" rel="nofollow">https://github.com/rsimmons/plinth</a>) before deciding that working within the constraints of the built-in nodes was ultimately futile.<p>Even building a well-behaved envelope generator (e.g. that handles retriggering correctly) is extremely tricky with what the API provides. How could such a basic use case have been overlooked? I made a library (<a href="https://github.com/rsimmons/fastidious-envelope-generator" rel="nofollow">https://github.com/rsimmons/fastidious-envelope-generator</a>) to solve that problem, but it's silly to have to work around the API for basic use cases.<p>Ultimately we have to hold out for the AudioWorklet API (which itself seems potentially over-complicated) to finally get the ability to do "raw" output.
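<p>To make the envelope point concrete, here's a hedged sketch (not the library above; the timings are made up) of how the retriggering problem shows up: a naive attack/decay on a GainNode clicks when retriggered mid-envelope, because after cancelScheduledValues() there's no reliable way to know the param's value at that exact audio time, so the next setValueAtTime() jumps.<p><pre><code> const ctx = new AudioContext();
const amp = ctx.createGain();
amp.gain.value = 0;
amp.connect(ctx.destination);

function naiveTrigger(time) {
  const g = amp.gain;
  g.cancelScheduledValues(time);
  // If the previous envelope is still mid-flight, this is an audible jump:
  g.setValueAtTime(0, time);
  g.linearRampToValueAtTime(1, time + 0.01);   // attack
  g.linearRampToValueAtTime(0.6, time + 0.2);  // decay to sustain
}
</code></pre>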
My first lesson in this was the Roland MPU-401 MIDI interface. It had a "smart mode" which accepted timestamped buffers. It was great... if you wanted a sequencer with exactly the features it supported, like say only 8 tracks. It was well-intentioned, because PCs of that era were slow.<p>The MPU-401 also had a "dumb" a.k.a. "UART" mode. You had to do everything yourself... and therefore could do anything. It turned out that early PCs were fast enough -- especially because you could install raw interrupt service routines and DOS didn't get in the way. :)<p>As a sequencer/DAW creator, you really want the system to give you raw hardware buffers and zero latency -- or as close to that as it can -- and let you build what you need on top.<p>If a system is far from that, it's understandable and well-meaning to try to compensate with some pre-baked engine/framework. It might even meet some folks' needs. But....
Here's my take on the history:
<a href="http://robert.ocallahan.org/2017/09/some-opinions-on-history-of-web-audio.html" rel="nofollow">http://robert.ocallahan.org/2017/09/some-opinions-on-history...</a>
From the beginning it was obvious that JS sample processing was important, and I tried hard in the WG to make the Web Audio API focus on that, but I failed.
I have to cast a vote in opposition here.<p>I've been heavily into procedural audio for a year or two, and have had no big issues with using Web Audio. There are solid libraries that abstract it away (Tone.js and Tuna, e.g.), and since I outgrew them, working directly with audio nodes and params has been fine too.<p>The big caveat is, when I first started I set myself the rule that I would not use script processor nodes. Obviously it would be nice to do everything manually, but for all the reasons in the article they're not good enough, so I set them aside, and everything's been smooth since.<p>So I feel like the answer to the article's headline is, <i>today as of this moment</i> the Web Audio API is made for anyone who doesn't need script nodes. If you can live within that constraint it'll suit you fine; if not it won't.<p>(Hopefully audio worklets will change this and it'll be for everyone, but I haven't followed them and don't know how they're shaping up.)
Stopped reading at: "Something like the DynamicsCompressorNode is practically a joke: basic features from a real compressor are basically missing, and the behavior that is there is underspecified such that I can’t even trust it to sound correct between browsers. "<p>Then if you look into it:<p><pre><code> dictionary DynamicsCompressorOptions : AudioNodeOptions {
float attack = 0.003;
float knee = 30;
float ratio = 12;
float release = 0.25;
float threshold = -24;
</code></pre>
Which are indeed the basics that you need and totally enough for most use cases.<p>Check out a vintage compressor that has a dozen implementations as VST plugins:<p><a href="http://media.uaudio.com/assetlibrary/t/e/teletronix_la2a_carousel_1_1.jpg" rel="nofollow">http://media.uaudio.com/assetlibrary/t/e/teletronix_la2a_car...</a>
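<p>For what it's worth, those options map directly onto the node. A minimal sketch (the oscillator is just a stand-in source to keep it self-contained):<p><pre><code> const ctx = new AudioContext();
const comp = ctx.createDynamicsCompressor();
comp.threshold.value = -24; // dB
comp.knee.value = 30;       // dB
comp.ratio.value = 12;
comp.attack.value = 0.003;  // seconds
comp.release.value = 0.25;  // seconds

// any source will do; an oscillator keeps the example self-contained
const osc = ctx.createOscillator();
osc.connect(comp);
comp.connect(ctx.destination);
osc.start();
</code></pre>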
Mozilla had a competing api that just worked with sample buffers. Unfortunately it didn't win the standardization battle.<p><a href="https://wiki.mozilla.org/Audio_Data_API" rel="nofollow">https://wiki.mozilla.org/Audio_Data_API</a>
I tried making a simple Morse code trainer using the Web Audio API, which seemed perfectly suited to the task, but I ran into two major problems:<p>1. Firefox always clicks when starting and stopping each tone. I think that's due to a longstanding Firefox bug and not the Web Audio API. I could <i>mostly</i> eliminate the clicks by ramping the gain, but the threshold was different for each computer.<p>2. This was the deal-breaker. Every mobile device I tested had such terrible timing in JavaScript (off by tens of milliseconds) that it was impossible to produce reasonably correct-sounding Morse code faster than about 5-8 WPM.<p>I found these <i>implementation</i> problems more frustrating than the API itself. At this point I'm pretty sure the only way to reliably generate Morse code is to record and play audio samples of each character, which wastes bandwidth and can be done more easily without using the Web Audio API at all.
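<p>For reference, the gain-ramp workaround from point 1 looks roughly like this (the 5 ms ramps are a guess; as noted, the usable value differed per machine):<p><pre><code> const ctx = new AudioContext();
const osc = ctx.createOscillator();
const gain = ctx.createGain();
osc.frequency.value = 600;  // typical Morse sidetone pitch
gain.gain.value = 0;
osc.connect(gain);
gain.connect(ctx.destination);
osc.start();

// key one element (dit/dah) starting at `when`, lasting `duration` seconds
function key(when, duration) {
  const ramp = 0.005; // short ramps instead of hard on/off edges
  gain.gain.setValueAtTime(0, when);
  gain.gain.linearRampToValueAtTime(1, when + ramp);
  gain.gain.setValueAtTime(1, when + duration - ramp);
  gain.gain.linearRampToValueAtTime(0, when + duration);
}

key(ctx.currentTime + 0.1, 0.06); // one dit at ~20 WPM
</code></pre>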
This article focuses on emscripten examples and for good reason! The effort to resolve the differences between OpenAL and Web Audio has been ongoing, exacerbated by Web Audio's API churn, deprecations and poor support.<p>That said, this current pull request on emscripten is a fantastic step forward and I'm very excited to see its completion: <a href="https://github.com/kripken/emscripten/pull/5367" rel="nofollow">https://github.com/kripken/emscripten/pull/5367</a>
Plus one for sure.<p>I put some weekends into trying to build a higher-level abstraction framework of sorts for my own sound art projects on top of Web Audio, and it was full of headaches for similar reasons to those mentioned.<p>The thing I put the most work into is mentioned in the article: the lack of proper native support for tightly (but potentially dynamically) scripted events, with sample accuracy to prevent glitching.<p>Through digging and prior work I came to the de facto standard solution of using two layers of timers: one in WebAudio (which gives you sample accuracy but no hook to e.g. cancel or reschedule events), and one using coarse but flexible JS timers (rough sketch at the end of this comment). Fugly, but it worked. But why is this necessary...!?<p>There's a ton of potential here, and someone like myself looking to implement interactive "art" or play spaces is desperate for a robust cross-platform web solution; it'd truly be a game-changer...<p>...so far Web Audio isn't there. :/<p>Other areas I wrestled with:
• buffer management, especially with CORS issues and having to write my own stream support (preloading then freeing buffers in series, to get seamless playback of large resources...)
• lack of direction on memory management, particularly, what the application is obligated to do, to release resources and prevent memory leaks
• the "disposable buffer" model makes perfect sense from an implementation view but could have easily been made a non-issue for clients. This isn't GL; do us some solids yo.<p>Will keep watching, and likely, wrestling...
I had a discussion on Twitter recently about a possible use case for WebAudio - sound filters, in pretty much the same way as Instagram popularised image filters for mass consumption.<p>One thing that really irks me at the moment is the huge variation in volume across the growing plethora of videos in my social media feed. If there were some way to apply real-time WebAudio manipulation in the browser to equalise the volume of all these home-made videos, so much the better. Not just volume up/down, but things like real-time compression to make vocals stand out a little.<p>Add delay and reverb to talk tracks etc. for podcasts.<p>EQ filters to reduce white noise on outdoor videos would also help. People with hearing difficulties in particular ranges, or who suffer from tinnitus etc., would be able to reduce certain frequencies via parametric equalisation.<p>It would be intriguing to see a podcast service or SoundCloud etc. offer real-time audio manipulation, or let you add post-processing mastering effects to your audio productions before releasing them into the wild.
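<p>Much of that is already expressible with the built-in nodes. A hedged sketch of the compressor + presence-EQ idea (the element selection and settings are made up):<p><pre><code> const ctx = new AudioContext();
const video = document.querySelector('video');   // some feed video
const source = ctx.createMediaElementSource(video);

const comp = ctx.createDynamicsCompressor();     // tame the level swings
comp.threshold.value = -30;
comp.ratio.value = 6;

const presence = ctx.createBiquadFilter();       // lift the vocal range a bit
presence.type = 'peaking';
presence.frequency.value = 3000;
presence.gain.value = 4;

source.connect(comp);
comp.connect(presence);
presence.connect(ctx.destination);
</code></pre>
One practical snag: a MediaElementAudioSourceNode fed cross-origin media without CORS headers outputs silence, so doing this to arbitrary third-party feeds wouldn't fly.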
Curiously, reading through the Web Audio API bug tracker turns up items such as <a href="https://github.com/WebAudio/web-audio-api/issues/1305" rel="nofollow">https://github.com/WebAudio/web-audio-api/issues/1305</a> and <a href="https://github.com/WebAudio/web-audio-api/issues/938" rel="nofollow">https://github.com/WebAudio/web-audio-api/issues/938</a>, which echo the article's point quite well. Oh dear..
I'm running a SaaS built on the back of the Web Audio + WebRTC apis. While it isn't perfect at all, it is still pretty impressive what progress has been made in the last few years allowing you to do all kinds of audio synthesis and processing right in the browser. It seems to me that it is a pretty general purpose api in intent. The approach seems to be to do the easy low hanging fruit first and then get to the more complicated things. This doesn't satisfy any single use case quickly but progress is steady. No doubt it would be nice if it was totally capable out of the gate but I'm simply happy that even the existing capabilities are there. Be patient, it will improve vastly over time.<p>EDIT: I should also add that the teams behind the apis are quite responsive. You can make an impact in the direction of development simply by making your needs/desires known.
I worked on a (now abandoned) project a while back using the Web Audio API, but it was NOT for audio at all - in fact, it was to build a cross-platform MIDI controller for a guitar effects unit.<p>As someone mentioned elsewhere in this thread, Android suffered from a crappy audio/MIDI library. iOS's CoreMIDI was great, but not transportable outside of iOS/OSX. Web Audio API's MIDI control seemed a great way to go - just build a cross-platform interface as an Electron app and use the underlying WebAudio to fire off MIDI messages.<p>Unfortunately, at the time of developing the project, WebAudio's MIDI SYSEX spec was still too fluid or not completely defined, so I had trouble sending/reading SYSEX messages via the API, and thus shelved the project for another day.
> "16 bits is enough for everybody"<p>Not really, the full range of human hearing is over 120db. Getting to 120db within 16 bits requires tricks like noise shaping. Otherwise, simple rounding at 16 bits gives about 80db and horrible sounding artifacts around quiet parts.<p>It's even more complicated in audio production, where 16 bits just doesn't provide enough room for post-production editing.<p>This is why the API is floating-point. Things like noise shaping need to be encapsulated within the API, or handled at the DAC if it's a high-quality one. (Edit) There's nothing wrong with consumer-grade DACs that are limited to about 80-90db of dynamic range; but the API shouldn't force that limitation on the entire world.
Isn't Web Audio based off of macOS's audio API?<p>I think the whole point is that JavaScript used to be slow, and using the CPU as a DSP to process samples prevents acceleration. Seems to me what is needed is something like "audio shaders", equivalent to compute/pixel shaders, that you farm off to an OpenAL-like API which can be compiled to run on native HW.<p>Even if you grant that emscripten produces reasonable code, it's still bloated, and less efficient on mobile devices than leveraging OS-level DSP capability.
How to play a sine wave:<p><pre><code> const audioContext = new AudioContext();
const osc = audioContext.createOscillator(); // defaults to a sine wave
osc.frequency.value = 440;                   // A4
osc.connect(audioContext.destination);
osc.start();
</code></pre>
"BufferSourceNode" is intended to play back samples like a sampler would. The method the author proposes of creating buffers one after the other is a bizarre solution.
Just from skimming the spec, the AudioWorklet interface looks very close to what is needed to build sensible, performant frameworks for audio profs and game designers.<p>So the most important question is: why isn't this interface implemented in any browser yet?<p>That a BufferSourceNode cannot be abused to generate precision oscillators isn't very enlightening.
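<p>For reference, the shape the spec describes looks roughly like this - a trivial white-noise processor, sketched from the spec (and untestable for now, since as noted nothing ships it yet):<p><pre><code> // processor.js
class NoiseProcessor extends AudioWorkletProcessor {
  process(inputs, outputs) {
    for (const channel of outputs[0]) {
      for (let i = 0; i < channel.length; i++) {
        channel[i] = Math.random() * 2 - 1;
      }
    }
    return true; // keep the node alive
  }
}
registerProcessor('noise-processor', NoiseProcessor);

// main thread:
// await ctx.audioWorklet.addModule('processor.js');
// const node = new AudioWorkletNode(ctx, 'noise-processor');
// node.connect(ctx.destination);
</code></pre>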
>> Can the ridiculous overeagerness of Web Audio be reversed? Can we bring back a simple “play audio” API<p>To be frank, the graphics world has had some kind of standard (OpenGL) for a long time, alongside DirectX, so WebGL had a good example to follow. In the audio world, however, we haven't seen a cross-platform quasi-standard spec covering Mac, Linux and Windows. So IMHO, non-web audio also lacks common standards for mixing, sound engineering and music-making. That's why web audio appears to lack a use case. IMHO, that smells like opportunity.<p>I use Web Audio in canvas/WebGL-based games where music-making is needed. I understand the issues - we definitely need more than "play" functionality.
I think things will get a lot better when the underlying enabling technology is in good shape. The audio engine needs to be running in a real-time thread, with all communication with the rest of the world in nonblocking IO. There are lots of ways to do this, but one appealing path is to expose threading and atomics in wasm; then the techniques can be used for lots of things, not just audio. Another possibility is to implement Worker.postMessage() in a nonblocking way. None of this is easy, and will take time.<p>If we <i>had</i> gone with the Audio Data API, it wouldn't have been satisfying, because the web platform's compute engine simply could not meet the requirement of reliably delivering audio samples on schedule. Fortunately, that is in the process of changing.<p>Given these constraints, the complexity of building a signal processing graph (with the signal path happening entirely in native code) is justified, if those signal processing units are actually useful. I don't think we've seen the evidence for that.<p>I'd personally be happy with a much simpler approach based on running wasm in a real-time thread, and removing (or at least deprecating) the in-built behavior. It's very hard to specify the behavior of something like DynamicsCompressorNode precisely enough that people can count on consistent behavior across browsers. To me, that's a sign perhaps it shouldn't be in the spec.<p>Disclaimer: I've worked on some of this stuff, and have been playing with a port of my DX7 emulator to emscripten. Opinions are my own and not that of my employer.
For reference: <a href="https://www.audiotool.com/app" rel="nofollow">https://www.audiotool.com/app</a> (using Flash).<p>You just can't do that with the same rhythmic tightness on low-end hardware with web tech today. Flash was bad, yet it also opened up insane possibilities for multimedia applications on the web that web tech still can't match. asm.js might fill the gap, but I haven't seen any equivalent yet.
I briefly tried Web Audio to implement a Karplus-Strong synthesizer (about the simplest thing in audio synthesis, I guess?).<p>Without using ScriptProcessorNode, there was no way to tune the synthesizer, because of the limitation that any loop in the audio graph incurs a delay of at least 128 samples.<p>Maybe a more "compilation-oriented" handling of the audio graph (at the user's choice) could help overcome this?
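<p>For context, the naive built-in-node version looks something like the sketch below (values arbitrary). The catch is the delay/feedback cycle: as described above, the loop can't get below 128 samples of delay (~2.9 ms at 44.1 kHz, i.e. a fundamental of at most ~344 Hz), which is what breaks the tuning.<p><pre><code> const ctx = new AudioContext();

function pluck(freq) {
  // short noise burst to excite the "string"
  const noise = ctx.createBuffer(1, ctx.sampleRate * 0.02, ctx.sampleRate);
  const data = noise.getChannelData(0);
  for (let i = 0; i < data.length; i++) data[i] = Math.random() * 2 - 1;
  const burst = ctx.createBufferSource();
  burst.buffer = noise;

  const delay = ctx.createDelay(1);
  delay.delayTime.value = 1 / freq; // naive tuning; the loop's block processing throws this off

  const damp = ctx.createBiquadFilter(); // loop filter: damps the string over time
  damp.type = 'lowpass';
  damp.frequency.value = 4000;

  const feedback = ctx.createGain();
  feedback.gain.value = 0.97;

  burst.connect(delay);
  delay.connect(damp);
  damp.connect(feedback);
  feedback.connect(delay);          // the feedback cycle
  delay.connect(ctx.destination);
  burst.start();
}

pluck(220);
</code></pre>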
Now step back and honestly think about which web API is actually powerful and nice to use and gives the impression that it was carefully crafted by a domain expert.

I cannot think of one.
Question: is the "point" of Web Audio to expose the native hardware-accelerated functionality of the underlying audio controller, through a combination of the OS audio driver + shims? Or is it more an attempt to implement everything in userspace, in a way equivalent to any random C++ DSP graph library? I've always thought it was the former.
Web api standardization for VR/AR is <i>currently</i> a work in progress. And it's been... less than pretty.<p>So if you've been wanting to try some intervention to make web standards less poor, or just want to observe how they end up the way they do, here's an opportunity.
>you can’t directly draw DOM elements to a canvas without awkwardly porting it to an SVG<p>This is not a wart, this is a security feature. Of course, it wouldn't be a necessary limitation if the web wasn't so complicated, but the web is complicated.
Kinda tangential to the thread, but what's the best book for an introduction to audio programming for an experienced, language-agnostic coder (Java, C, C++, Obj-C, etc.)?
><i>It [WebGL] gives you raw access to the GPU ...</i><p>Not to be pedantic, but that's technically incorrect. Indeed, if WebGL were to be supplanted by a lower-level graphics API, that would make a lot of people happy.[0]<p>As far as the author's thesis concerning the Web Audio API: I agree that it's a total piece of shit.<p>[0] <a href="https://news.ycombinator.com/item?id=14930824" rel="nofollow">https://news.ycombinator.com/item?id=14930824</a>
One word: W3C.<p>I've said it before, I'll say it again: it exists in a vacuum, and is run by people who have never done any significant work on the web, with titles like "Senior Specifications Specialist". Huge chunks of their work are purely theoretical (note: not academic, just theoretical) and have no bearing on the real world.
I disagree with a lot of the assertions in this blog. You have to suspend some of your expectations since this is all JS: you can't have a JS loop feeding single samples to a buffer. JS isn't deterministic to that level of granularity, but overall it's fast enough to generate procedural audio in chunks if you manage the timing.<p>If you check out some of the three.js 3D audio demos you can see some pretty cool stuff being done with all those crazy nodes the author is decrying. Hell, I wrote a Tron game and did the audio using audio node chains, and managed to get something really close to the real Tron cycle sounds without resorting to sample-level tweaking - and with > 16 bikes emitting procedural audio.<p>I think more focus on the strengths than the weaknesses is in order. And if you really want to peg your CPU, you can still use emscripten/WebAssembly or similar to generate buffers, if that's your thing.
The major problem with this API is that they couldn't just copy something designed by people with actual domain knowledge, as happened with WebGL. Instead it was design-by-committee: it does so much that the application should handle itself, yet its core capabilities are so deficient that no application can work around them.
The Web Audio API is designed for web developers who want to integrate sound into their web apps - notifications, etc.<p>Think of the pre-browser era where we had sounds for everything: minimize window, user logged in, logged out, all that crap.<p>Also, the API has good support for visualization - spectrum analysis, for example. That makes it pretty good material for an introductory course on sound processing.<p>I wouldn't use it for anything serious like a DAW.
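<p>The spectrum-analysis part, at least, is genuinely easy to get going. A minimal sketch (the oscillator is just a stand-in for whatever you're inspecting):<p><pre><code> const ctx = new AudioContext();
const analyser = ctx.createAnalyser();
analyser.fftSize = 2048;

const source = ctx.createOscillator(); // stand-in source
source.connect(analyser);
analyser.connect(ctx.destination);
source.start();

const bins = new Uint8Array(analyser.frequencyBinCount);
function draw() {
  analyser.getByteFrequencyData(bins); // current magnitude spectrum, 0-255 per bin
  // ...paint `bins` to a canvas here...
  requestAnimationFrame(draw);
}
draw();
</code></pre>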