Always love a good Bellard ship.<p>This is definitely better than some of the others out there. I threw together some comparisons at 7 kb/s for mp3/opus/aac: <a href="https://non.io/TSAC-Comparisons" rel="nofollow">https://non.io/TSAC-Comparisons</a><p>Happy to add other comparisons if anyone wants them.<p>Overall, it's FAR better at these low bit rates, but that doesn't mean it's necessarily good. One issue I see off the bat is that the output volume is fairly inconsistent with TSAC, which makes stereo in particular quite hard to listen to, with the volume "flickering" in each channel independently.
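One way to sanity-check comparisons like these is to compute the effective bitrate a compressed file actually implies, rather than trusting the encoder's target setting. A minimal sketch, assuming the compressed file and an uncompressed reference WAV are on disk (the function name and paths are hypothetical):

```python
import os
import wave

def effective_bitrate_kbps(compressed_path: str, reference_wav: str) -> float:
    """Bitrate implied by a compressed file's size, measured against the
    duration of the uncompressed reference WAV."""
    with wave.open(reference_wav, "rb") as w:
        duration_s = w.getnframes() / w.getframerate()
    # size in bits divided by duration, expressed in kb/s
    return os.path.getsize(compressed_path) * 8 / duration_s / 1000.0
```

For instance, an 875-byte compressed file covering one second of audio works out to 7.0 kb/s.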
If you patch out the CRC check in the binary with<p>echo -ne "\x90\x90" | dd if=/dev/stdin of=tsac bs=1 seek=23914 conv=notrunc<p>you can corrupt the compressed files with very interesting results: <a href="https://meow.social/@mimir/112238998609778334" rel="nofollow">https://meow.social/@mimir/112238998609778334</a><p>The fast mode (no need to patch the binary for this one; it doesn't seem to do the CRC check) and the normal (non-fast) mode sound different, but both are quite interesting.
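For reference, that dd invocation just overwrites two bytes at offset 23914 with x86 NOP opcodes (0x90). The same patch can be sketched in Python (the helper name is mine, and the offset is specific to the exact binary build quoted above):

```python
def nop_patch(path: str, offset: int, count: int = 2) -> None:
    """Overwrite `count` bytes at `offset` with x86 NOPs (0x90),
    disabling whatever instruction lived there -- here, the CRC check."""
    with open(path, "r+b") as f:  # read/write in place, no truncation
        f.seek(offset)
        f.write(b"\x90" * count)

# nop_patch("tsac", 23914)  # equivalent to the dd command above
```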
This doesn't have much of a use case.<p>- Can't use it in telephony (obvious application for low bitrates); phone handsets and headsets don't have the power to do it in real time.<p>- Very small files of good quality would be useful in tiny embedded systems that have low flash space: but what systems of that type have the processing power for decoding? Very low storage more or less goes hand in hand with weak processing.<p>The quality is astonishing for the bit rate, though.
Clicked the download link wanting to take a look at the source... and was a bit perplexed before quickly canceling it. 237MB, <i>compressed</i>, for an audio codec!? At that point one can't help but think that the samples are already in the decoder itself.<p>I wonder how it compares to <a href="https://en.wikipedia.org/wiki/Codec2" rel="nofollow">https://en.wikipedia.org/wiki/Codec2</a> and related codecs, which go even lower for bitrate.
<i>An Nvidia GPU is necessary for fast operation.</i><p>Compression is getting so heavy that soon it won't be possible to perform it on normal hardware. AV1 already proved that; future audio/video codecs will be even heavier.<p>Decompression is also getting heavier. Poor mobile devices.<p>I'm starting to appreciate well-written algorithms that don't require massive computing power. JPEG XL is a good example: it has the same compression ratio as AVIF, but requires less processing power.
One of the DAC authors here (the codec that this builds off of). Very cool work! Would love to see some more detail on the modifications to DAC. Boosting the capacity with a transformer makes sense to me.<p>Makes me happy to see DAC getting built on! Thanks!
I might be missing something obvious, but it's not clear to me how to get an mp3 out of this on Ubuntu 22.04.<p>Following the docs, `./tsac c myfile.mp3 myfile.tsac` generates a tsac file that's unplayable with mpv. Trying ffmpeg to convert to mp3 didn't work: `ffmpeg -i myfile.tsac compressed.mp3` ("myfile.tsac: Invalid data found when processing input"). Using a wav input file has the same result.<p>I can use `./tsac d myfile.tsac output.wav` (I don't really want to decompress anything, but worth a try) but then after compressing `output.wav` with `ffmpeg -i output.wav output.mp3`, output.mp3 is the same size as if I hadn't used tsac (of course). If I use ffmpeg with a low bitrate like `-b:a 16k`, I get the usual low-quality gargle rather than the tsac output.
FYI (and in case Mr Bellard is reading), for the "Greatest Love of All" demo, the sample labeled "mono 5.02 kb/s" is in fact linked to the 6.79 kb/s stereo sample. The correct file is available at <a href="https://bellard.org/tsac/Greatest_Love_mono.wav" rel="nofollow">https://bellard.org/tsac/Greatest_Love_mono.wav</a>
This is quite similar to the models used by all the AI music generators. Some feed the tokens into a language model to generate music, some replace the tokenization part with an alternative that gives a continuous representation for diffusion models.
New advances in media compression always seem to focus on low bitrates, whether for audio, video, or images.<p>Which is totally fair given the applications, but I always wonder how much improvement they bring in high-bitrate scenarios. For example, are there codecs with much better (perceptible) quality than Apple AAC at 256 kbps (or that achieve similar quality at, say, 160 kbps)? How much better is AV1 at 10 Mbps compared to H.265/H.264? (The improvement of H.265 over H.264 for "transparent" encoding was pretty disappointing, IMHO.)
> The Transformer model is evaluated in a deterministic and reproducible way. Hence the result does not depend on the exact GPU or CPU model nor on the number of configured threads. This key point ensures that a compressed file can be decompressed using a different hardware or software configuration.<p>How is this possible? Does it use floating point and concurrency?<p>Cross-platform floating-point determinism is seriously difficult. The Rapier physics engine achieved it [0] at the expense of disabling SIMD and multithreading, and it works only on platforms that strictly comply with IEEE 754-2008, which I think usually disqualifies GPUs (regarding subnormal numbers, etc.). Another potential source of divergence is fused multiply-add, which can give higher precision than doing the multiplication and addition separately (and I think some platforms don't have FMA in hardware).<p>For example, TSAC currently runs on CPUs and Nvidia GPUs. Could porting it to AMD GPUs affect determinism?<p>[0] <a href="https://rapier.rs/docs/user_guides/rust/determinism/" rel="nofollow">https://rapier.rs/docs/user_guides/rust/determinism/</a>
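To illustrate why the thread count normally breaks reproducibility: floating-point addition is not associative, so any parallel reduction that changes the summation order can change the result. A minimal illustration with IEEE 754 doubles:

```python
# The same three values, summed in a different order:
left = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # 0.6
print(left == right)       # False
```

A deterministic implementation must therefore fix the reduction order (or avoid floating point entirely), regardless of how many threads are configured.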
I attempted some ML-as-'compression' experiments ~2 years ago and ended up hitting a wall. Check out the samples/pitch here: <a href="https://lorinhalpert.com/ipoc/ala/" rel="nofollow">https://lorinhalpert.com/ipoc/ala/</a><p>If you have audio encoding, playback, and/or DSP experience, email me for an invite to our Discord server so we can take another crack at it! :)
This appears to sit in the middle between something that could be used for music (at a higher bit rate, though still much lower than competing codecs) and something very effective for voice communication (shrinking the bandwidth, and thus the bit rate, while also limiting artifacts). I'm not an expert in the field, but I think the supplied examples aren't the best ones to show its potential.
Reading this, I wondered how far along video compression with transformers is; it turns out decoding is still too expensive in practice (under 10 FPS for 1080p video).<p><a href="https://arxiv.org/abs/2206.07307" rel="nofollow">https://arxiv.org/abs/2206.07307</a>
<a href="https://arxiv.org/abs/2210.13827" rel="nofollow">https://arxiv.org/abs/2210.13827</a>
FYI it looks to be MIT/BSD license.<p>Separately:<p>>The Transformer model is evaluated in a deterministic and reproducible way. Hence the result does not depend on the exact GPU or CPU model nor on the number of configured threads.<p>That's neat. So even though it's "AI-based" its output is guaranteed to be the same for a given input?
I wonder where this codec sits on the complexity/bitrate graph from this post: <a href="https://phoboslab.org/log/2023/02/qoa-time-domain-audio-compression" rel="nofollow">https://phoboslab.org/log/2023/02/qoa-time-domain-audio-comp...</a>
Pretty good! EnCodec also comes to mind as a neural codec: <a href="https://ai.honu.io/papers/encodec/samples.html" rel="nofollow">https://ai.honu.io/papers/encodec/samples.html</a>
Reminds me of the old IBM 'RECOVC' codec from around 2000, which compressed Mel-bank speech.<p><a href="https://ieeexplore.ieee.org/document/7075313" rel="nofollow">https://ieeexplore.ieee.org/document/7075313</a><p>All the patents around that are long dead, so it's a good time to do an updated version, I guess.<p>If you want to do something similar but with way lower bitrates (e.g. 300 bps), look at the NRV codec:<p><a href="https://www.researchgate.net/publication/224209493_300_bps_noise_robust_vocoder" rel="nofollow">https://www.researchgate.net/publication/224209493_300_bps_n...</a>
Bellard strikes again...<p>Are we almost converting music to MIDI at this point?<p>As I understand it, the model learns the landscape of sound combinations that are interesting to humans, so no combination of raw bytes in the compressed file will decode to, say, white noise, because that was never trained for.<p>What if it were, though?