Always love a good Bellard ship.<p>This is definitely better than some of the others out there. I threw together some comparisons at 7 kb/s for mp3/opus/aac: <a href="https://non.io/TSAC-Comparisons" rel="nofollow">https://non.io/TSAC-Comparisons</a><p>Happy to add other comparisons if anyone wants them.<p>Overall, it's FAR better at these low bit rates, but that doesn't mean it's necessarily good. One issue I see off the bat is that the output volume is fairly inconsistent with TSAC, which makes stereo in particular quite hard to listen to, with the volume "flickering" in each channel independently.
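One way to sanity-check comparisons like these is to compute the effective bitrate a compressed file actually implies, rather than trusting the encoder's target setting. A minimal sketch, assuming the compressed file and an uncompressed reference WAV are on disk (the function name and paths are hypothetical):

```python
import os
import wave

def effective_bitrate_kbps(compressed_path: str, reference_wav: str) -> float:
    """Bitrate implied by a compressed file's size, measured against the
    duration of the uncompressed reference WAV."""
    with wave.open(reference_wav, "rb") as w:
        duration_s = w.getnframes() / w.getframerate()
    # size in bits divided by duration, expressed in kb/s
    return os.path.getsize(compressed_path) * 8 / duration_s / 1000.0
```

For instance, an 875-byte compressed file covering one second of audio works out to 7.0 kb/s.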
If you patch out the CRC check in the binary with<p>echo -ne "\x90\x90" | dd if=/dev/stdin of=tsac bs=1 seek=23914 conv=notrunc<p>you can corrupt the compressed files with very interesting results: <a href="https://meow.social/@mimir/112238998609778334" rel="nofollow">https://meow.social/@mimir/112238998609778334</a><p>The fast mode (no need to patch the binary for this one; it doesn't seem to do the CRC check) and the normal (non-fast) mode sound different, but both are quite interesting.
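For reference, that dd invocation just overwrites two bytes at offset 23914 with x86 NOP opcodes (0x90). The same patch can be sketched in Python (the helper name is mine, and the offset is specific to the exact binary build quoted above):

```python
def nop_patch(path: str, offset: int, count: int = 2) -> None:
    """Overwrite `count` bytes at `offset` with x86 NOPs (0x90),
    disabling whatever instruction lived there -- here, the CRC check."""
    with open(path, "r+b") as f:  # read/write in place, no truncation
        f.seek(offset)
        f.write(b"\x90" * count)

# nop_patch("tsac", 23914)  # equivalent to the dd command above
```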
This doesn't have much of a use case.<p>- Can't use it in telephony (obvious application for low bitrates); phone handsets and headsets don't have the power to do it in real time.<p>- Very small files of good quality would be useful in tiny embedded systems that have low flash space: but what systems of that type have the processing power for decoding? Very low storage more or less goes hand in hand with weak processing.<p>The quality is astonishing for the bit rate, though.
Clicked the download link wanting to take a look at the source... and was a bit perplexed before quickly canceling it. 237MB, <i>compressed</i>, for an audio codec!? At that point one can't help but think that the samples are already in the decoder itself.<p>I wonder how it compares to <a href="https://en.wikipedia.org/wiki/Codec2" rel="nofollow">https://en.wikipedia.org/wiki/Codec2</a> and related codecs, which go even lower for bitrate.
<i>An Nvidia GPU is necessary for fast operation.</i><p>Compression is getting so heavy that soon it won't be possible to perform it on normal hardware. AV1 already proved that; future audio/video codecs will be even heavier.<p>Decompression is also getting heavier. Poor mobile devices.<p>I'm starting to appreciate well-written algorithms that don't require massive computing power. JPEG XL is a good example: it has the same compression ratio as AVIF, but requires less processing power.
One of the DAC authors here (the codec that this builds off of). Very cool work! Would love to see some more detail on the modifications to DAC. Boosting the capacity with a transformer makes sense to me.<p>Makes me happy to see DAC getting built on! Thanks!
I might be missing something obvious, but it's not clear to me how to get an mp3 out of this on Ubuntu 22.04.<p>Following the docs, `./tsac c myfile.mp3 myfile.tsac` generates a tsac file that's unplayable with mpv. Trying ffmpeg to convert to mp3 didn't work: `ffmpeg -i myfile.tsac compressed.mp3` ("myfile.tsac: Invalid data found when processing input"). Using a wav input file has the same result.<p>I can use `./tsac d myfile.tsac output.wav` (I don't really want to decompress anything, but worth a try) but then after compressing `output.wav` with `ffmpeg -i output.wav output.mp3`, output.mp3 is the same size as if I hadn't used tsac (of course). If I use ffmpeg with a low bitrate like `-b:a 16k`, I get the usual low-quality gargle rather than the tsac output.
FYI (and in case Mr Bellard is reading), for the "Greatest Love of All" demo, the sample labeled "mono 5.02 kb/s" is in fact linked to the 6.79 kb/s stereo sample. The correct file is available at <a href="https://bellard.org/tsac/Greatest_Love_mono.wav" rel="nofollow">https://bellard.org/tsac/Greatest_Love_mono.wav</a>
This is quite similar to the models used by all the AI music generators. Some feed the tokens into a language model to generate music, some replace the tokenization part with an alternative that gives a continuous representation for diffusion models.
New advances in media compression always seem to focus on low bitrates, whether for audio, video, or images.<p>Which is totally fair given the applications, but I always wonder how much improvement they bring in high-bitrate scenarios. For example, are there codecs with much better (perceptible) quality than Apple AAC at 256 kbps (or that achieve similar quality at, say, 160 kbps)? How much better is AV1 at 10 Mbps compared to H.265/H.264? (The improvement of H.265 over H.264 for "transparent" encoding was pretty disappointing, IMHO.)
> The Transformer model is evaluated in a deterministic and reproducible way. Hence the result does not depend on the exact GPU or CPU model nor on the number of configured threads. This key point ensures that a compressed file can be decompressed using a different hardware or software configuration.<p>How is this possible? Does it use floating point and concurrency?<p>Cross-platform floating-point determinism is seriously difficult. The Rapier physics engine achieved it [0] at the expense of disabling SIMD and multithreading, and it works only on platforms that strictly comply with IEEE 754-2008, which I think usually disqualifies GPUs (regarding subnormal numbers, etc.). Another potential source of divergence is fused multiply-add, which can give higher precision than doing the multiplication and addition separately (and I think some platforms don't have FMA in hardware).<p>For example, TSAC currently runs on CPUs and Nvidia GPUs. Could porting it to AMD GPUs affect determinism?<p>[0] <a href="https://rapier.rs/docs/user_guides/rust/determinism/" rel="nofollow">https://rapier.rs/docs/user_guides/rust/determinism/</a>
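To illustrate why the thread count normally breaks reproducibility: floating-point addition is not associative, so any parallel reduction that changes the summation order can change the result. A minimal illustration with IEEE 754 doubles:

```python
# The same three values, summed in a different order:
left = (0.1 + 0.2) + 0.3   # 0.6000000000000001
right = 0.1 + (0.2 + 0.3)  # 0.6
print(left == right)       # False
```

A deterministic implementation must therefore fix the reduction order (or avoid floating point entirely), regardless of how many threads are configured.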
I attempted some ML-as-'compression' experiments ~2 years ago and ended up hitting a wall. Check out the samples/pitch here: <a href="https://lorinhalpert.com/ipoc/ala/" rel="nofollow">https://lorinhalpert.com/ipoc/ala/</a><p>If you have audio encoding, playback, and/or DSP experience, email me for an invite to our Discord server so we can take another crack at it! :)
This appears to sit in the middle between something that could be used for music (at a higher bit rate, though still much lower than competing codecs) and something very effective for voice communication (shrinking the bandwidth, and thus the bit rate, while also limiting artifacts). I'm not an expert in the field, but I think the supplied examples aren't the best ones to show its potential.
Reading this, I wondered how far along video compression with transformers is; it turns out decoding is still too expensive in practice (under 10 FPS for 1080p video).<p><a href="https://arxiv.org/abs/2206.07307" rel="nofollow">https://arxiv.org/abs/2206.07307</a>
<a href="https://arxiv.org/abs/2210.13827" rel="nofollow">https://arxiv.org/abs/2210.13827</a>
FYI it looks to be MIT/BSD license.<p>Separately:<p>>The Transformer model is evaluated in a deterministic and reproducible way. Hence the result does not depend on the exact GPU or CPU model nor on the number of configured threads.<p>That's neat. So even though it's "AI-based" its output is guaranteed to be the same for a given input?
I wonder where this codec sits on the complexity/bitrate graph from this post: <a href="https://phoboslab.org/log/2023/02/qoa-time-domain-audio-compression" rel="nofollow">https://phoboslab.org/log/2023/02/qoa-time-domain-audio-comp...</a>
Pretty good! EnCodec also comes to mind as a neural codec: <a href="https://ai.honu.io/papers/encodec/samples.html" rel="nofollow">https://ai.honu.io/papers/encodec/samples.html</a>
Reminds me of the old IBM 'RECOVC' codec from around 2000, which compressed Mel-bank speech.<p><a href="https://ieeexplore.ieee.org/document/7075313" rel="nofollow">https://ieeexplore.ieee.org/document/7075313</a><p>All the patents around that are long dead, so it's a good time to do an updated version, I guess.<p>If you want to do something similar but with way lower bitrates (e.g. 300 bps), look at the NRV codec:<p><a href="https://www.researchgate.net/publication/224209493_300_bps_noise_robust_vocoder" rel="nofollow">https://www.researchgate.net/publication/224209493_300_bps_n...</a>
Bellard strikes again...<p>Are we almost converting music to MIDI at this point?<p>As I understand it, the model learns the landscape of sound combinations that are interesting to humans, so no combination of raw bytes in the compressed file will decode to, say, white noise, because that was never trained for.<p>What if it were, though?