To me, the key innovation here is the tight integration between network conditions and codec frame size. Standard codecs are configured with a specific bandwidth target and produce encoded frames that 'average' around that size. You <i>could</i> just re-initialize a codec at a lower bandwidth on the fly, but you would have to send an I-frame (a large, full frame) to kick off the new series of frames (as most video frames are just updates of a previous frame). Having a codec accept a bandwidth target per frame is a really good idea.
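To make that concrete, here is a minimal toy sketch (my own illustration, not Salsify's actual API) of the difference between a bitrate fixed at initialization time and a per-frame byte budget handed in by the transport:

```python
# Toy illustration (my own, not Salsify's actual API) of a per-frame byte
# budget: the transport picks the size target for *this* frame from its
# current view of the network, instead of the encoder being locked to one
# bitrate chosen when it was initialized.

class PerFrameEncoder:
    def __init__(self):
        self.quality = 0.8  # inter-frame state carried across calls

    def encode(self, raw_frame: bytes, target_bytes: int) -> bytes:
        # Stand-in for real rate control: back off quality until the
        # compressed frame fits the budget supplied by the transport.
        compressed = raw_frame[: int(len(raw_frame) * self.quality)]
        while len(compressed) > target_bytes and self.quality > 0.1:
            self.quality -= 0.1
            compressed = raw_frame[: int(len(raw_frame) * self.quality)]
        return compressed


encoder = PerFrameEncoder()
raw = bytes(100_000)                             # pretend raw video frame
sent = encoder.encode(raw, target_bytes=20_000)  # budget shrank mid-stream, no I-frame needed
print(len(sent))
```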
I'm fairly sure that Ben Orenstein and a friend are forming a company to commercialise this as a Screenhero replacement. Discussed on this podcast: <a href="http://artofproductpodcast.com/episode-39" rel="nofollow">http://artofproductpodcast.com/episode-39</a><p>Very interested to see what they cook up (and kinda envious I didn't have the idea / don't have the space in my life to have a crack at it myself---it sounds very interesting).
I've been taking the Financial Markets course by Robert Shiller, and when talking about inventions and new ideas he continually makes the point that "it's crazy to me that this didn't exist before". It's usually the sign of a really good invention when you have that thought. And that's the thought I'm having looking at how this combines the codec and transport protocol: "Why hasn't this been done before?" == "This is awesome!"
A bigger frustration I experience is that some streaming seems to just "give up": stalling and never resuming. I know the connection and server are okay because I can usually force it to resume manually, e.g. by doing a page refresh, so is it just bad server architecture or a codec problem?
Um ... from the paper ...<p>"6.1 Limitations of Salsify<p>No audio. Salsify does not encode or transmit audio."<p>Claiming that you beat a bunch of codecs that have synchronized audio (even though they disable it) is kind of misleading ...
Slightly tangential...<p><i>Salsify is led by Sadjad Fouladi, a doctoral student in computer science at Stanford University, along with fellow Stanford students John Emmons, Emre Orbay, and Riad S. Wahby, as well as Catherine Wu, a junior at Saratoga High School in Saratoga, California. The project is advised by Keith Winstein, an assistant professor of computer science.<p>Salsify was funded by the National Science Foundation and the Defense Advanced Research Projects Agency (DARPA). Salsify has also received support from Google, Huawei, VMware, Dropbox, Facebook, and the Stanford Platform Lab.</i><p>Financially supported by the government and tech juggernauts, and executed by top-tier doctoral students + a high school student + a top-tier university professor.<p>Assuming this could be game-changing innovation to further advance worldwide communication, it's refreshing to see the positive externalities of a combination of capitalistic (F500 tech co's) and socialistic (university, government) systems executed by a seemingly diverse set of actors.
(Disclaimer: This comment is my personal opinion, not that of my employer.)<p>Really exciting work.<p>Encoding multiple versions of a video and picking a smaller one in response to congestion already happens for video-on-demand (think YouTube and Netflix videos) in DASH. That said, with VOD you can encode the video slower than real-time.<p>I can't imagine this ever making it into Skype/FaceTime/Hangouts/Duo. The big corps will probably continue to focus on "more internet" (fiber optic, zero rating, wi-fi hotspots, and internet traffic management practices).
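For contrast, the VOD approach looks roughly like this (illustrative numbers and names, not any real player's API): the ladder of bitrates is encoded offline, and the client just picks a rung per segment from its throughput estimate:

```python
# Rough sketch of DASH-style adaptive bitrate for VOD (illustrative names,
# not a real player's API): representations are encoded ahead of time, and
# the client picks the highest one that fits its measured throughput.

BITRATE_LADDER_KBPS = [250, 500, 1000, 2500, 5000]  # pre-encoded versions

def pick_representation(estimated_throughput_kbps: float,
                        safety_factor: float = 0.8) -> int:
    """Highest pre-encoded bitrate that still leaves some headroom."""
    budget = estimated_throughput_kbps * safety_factor
    candidates = [b for b in BITRATE_LADDER_KBPS if b <= budget]
    return candidates[-1] if candidates else BITRATE_LADDER_KBPS[0]

print(pick_representation(3500))  # -> 2500
```

That works because the encodes already exist on disk; a real-time call has no such luxury, which is where Salsify's per-frame approach comes in.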
It's 2018 and I still have many dropped calls and other weird stuff when I talk with people on my mobile. FaceTime Audio is often a good alternative but still not perfect. So, I really hope the audio version of this will be commercialized soon.
Unfortunately this would only apply to one-on-one, low-latency video chats. For streaming to an audience, which generally puts a distribution network between the viewers and the video source to help handle load and geographic distribution, the CDN itself has no influence on video encoding. The CDN would need to jump in and do this back-and-forth negotiation and delivery of lower-quality frames, which it is not currently suited for. I'd love to see it come about, but it's not just the codecs we need to look at for adoption beyond point-to-point video calls.
There's a vegetable named "salsify", very yummy. <a href="https://duckduckgo.com/?q=salsify+vegetable&t=ffab&ia=recipes" rel="nofollow">https://duckduckgo.com/?q=salsify+vegetable&t=ffab&ia=recipe...</a>
Barely related to this, but looking at the results (section 5.2) I'm amazed at how much worse T-Mobile is for latency. AT&T and Verizon both give about 2 s of delay for Hangouts, while T-Mobile gives 7 s of delay.
> What would you say to tomorrow’s codec implementers?<p>> Standardize an interface to export and import the encoder’s and decoder’s internal state between frames!<p>Can't this be achieved using sandboxing/emulation/VM techniques?
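For reference, here's roughly what such an interface would enable (a hypothetical sketch with made-up names; the paper only calls for the capability). A VM or process snapshot would capture the same information, just along with a lot of state you don't care about:

```python
# Hypothetical sketch (made-up names) of an encoder whose inter-frame state
# can be exported and re-imported. With that, a sender can try encoding the
# same frame at two quality levels from the same starting state and transmit
# only the one that suits the network.

import copy

class StatefulEncoder:
    def __init__(self):
        self.state = {"reference_frame": None}

    def export_state(self) -> dict:
        return copy.deepcopy(self.state)

    def import_state(self, state: dict) -> None:
        self.state = copy.deepcopy(state)

    def encode(self, frame: bytes, quality: float) -> bytes:
        self.state["reference_frame"] = frame      # update inter-frame state
        return frame[: int(len(frame) * quality)]  # stand-in for compression


enc = StatefulEncoder()
saved = enc.export_state()
high = enc.encode(b"x" * 1000, quality=0.9)  # first attempt, maybe too big
enc.import_state(saved)                      # roll back to the saved state
low = enc.encode(b"x" * 1000, quality=0.4)   # re-encode the same frame smaller
```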
Another recent discussion was <a href="https://news.ycombinator.com/item?id=16802079" rel="nofollow">https://news.ycombinator.com/item?id=16802079</a>.
Kudos for making things accessible. However, joint source-channel coding is not news, especially at the level of scalable video coding (probably 20-year-old research by this point). In academia this isn't as exciting as it sounds to industry.
"Is this a startup company?<p>No.<p>Are you sure? Your website looks like a startup company’s.<p>It's just the HTML template! They all look like this. [...]"<p>Brilliant