Discrete Cosine Transform in Video Compression – Explain Like I'm Five

121 points by ponderingfish almost 5 years ago

18 comments

JackFr almost 5 years ago

Maybe I'm dense, but I didn't find this useful at all. The author spends the bulk of the article explaining what transform means (not typically the sticking point) and then uses terms like "pixel-domain", "frequency-domain", "decorrelating", "energy compacting" without explanation or definition. The author could spend a little less time on the former and more on the latter IMHO.

cdavid almost 5 years ago

I am not convinced it is ELI5. I am too lazy to write a blog post with illustrations, but for audio signals, which I am more familiar with, the intuition behind the DCT (and the MDCT, used e.g. for MP3) is straightforward.

Assuming you understand that a Fourier transform is an operation to go from the time domain to the frequency domain, the problem solved by the DCT, DST, etc. is related to the fact that signals in digital signal processing are finite, and without any care, you introduce 'irregularities' if you use a 'normal' Fourier transform.

So the main idea of DCT/DST/etc. is to implicitly 'copy' and/or mirror the signal, to reduce the artefacts/irregularities introduced by the Fourier transform. Reducing irregularities intuitively leads to more regular signals, and the more regular your signal, the quicker the high frequencies decrease, which is the compression effect of the DCT.

More mathematically, but still very informally: DCT/DST is about boundary conditions. Using the DFT (the 'normal' Fourier transform for digital signals) will imply discontinuities at the boundaries. For continuous-time signals, an intuitive way to define regularity is to measure the decay of the successive derivatives of a function f(t), by looking at the convergence of t^n f(t) as t -> inf for each n. That implies that regular functions have bounded Fourier transforms, and the more regular, the faster the Fourier transform decays.

The DST/DCT, by mirroring/copying the signal, reduce irregularities, and hence their coefficients decrease faster.
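
The boundary argument above can be checked numerically. The following is a minimal sketch, not anything from the article: it assumes a 32-sample ramp as the test signal (a hypothetical choice, picked because its periodic extension has a large jump at the boundary while its mirrored extension does not) and compares how much of the total energy the three largest DFT and DCT coefficients hold. The function names are made up for illustration, and the transforms are written out naively in plain Python rather than taken from a library.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dct_ortho(x):
    """Naive orthonormal DCT-II (the variant usually meant by 'the DCT')."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(
            scale * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        )
    return out


def energy_share_of_largest(coeffs, m):
    """Fraction of total energy held by the m largest-magnitude coefficients."""
    energies = sorted((abs(c) ** 2 for c in coeffs), reverse=True)
    return sum(energies[:m]) / sum(energies)


# Assumed test signal: a ramp. Repeating it periodically (the DFT view) creates
# a jump from 31 back to 0; mirroring it (the DCT view) does not.
ramp = [float(t) for t in range(32)]

dft_share = energy_share_of_largest(dft(ramp), 3)
dct_share = energy_share_of_largest(dct_ortho(ramp), 3)

print(f"energy in 3 largest DFT coefficients: {dft_share:.4f}")
print(f"energy in 3 largest DCT coefficients: {dct_share:.4f}")
```

For this particular signal the DCT packs more of the energy into its three largest coefficients than the DFT does, which is the faster decay described above. Other signals will give different numbers.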

Diti almost 5 years ago

This is DEFINITELY NOT explained like I’m five. More like late high school level (mathematical planes, frequencies) and above (matrixes). I appreciate the author’s attempt at vulgarization but the title is wrong.

walagran almost 5 years ago

Pretty cool! I would add Computerphile's YouTube video as a watch after this: https://www.youtube.com/watch?v=Q2aEzeMDHMA. This is a part of their 4-video series on JPEG: https://www.youtube.com/playlist?list=PLQfOC23r609kmgOr_V8sfUnr0r3pOI9gT

EForEndeavour almost 5 years ago

The main reason why this article wasn't intuitive to me is that the toy example is trivial, and the next example (a real image) is very nontrivial.

The first example shows through three lines of runnable Matlab code (sadly, I don't have Matlab and it's not free, but whatever) that if you start with an 8-by-8 matrix of 255 and run the discrete cosine transform, you end up with a sparse matrix with just one nonzero value in the first entry, because "the DCT has compacted the energy of the matrix into the first element referred to as the DC coefficient. The rest of the coefficients are called the AC coefficients."

Cool? This doesn't help any more than just telling me that sentence directly. Also, the article doesn't even expand on this fingerhold of familiarity: DC and AC? Like direct current and alternating current, so that first element is in some sense the zero-frequency ("direct current") term, and if the starting example had any variation at all beyond a perfectly uniform field of 255's, you'd start to see "energy" showing up in the "alternating-current" entries? That would start to give some intuition.

I guess what I'm saying is this article works, but it works by frustrating the reader just enough that they modify and play with the code (porting it from Matlab if they have to), thereby gaining the intuitive understanding that the article promises.

ssawyer06 almost 5 years ago

Being familiar with the DCT but having no idea how to explain it to a five-year-old, I clicked. This does not ELI5 the DCT.

djmips almost 5 years ago

If this stuff intrigues you please try out Steve Brunton's extensive set of videos on Data Science that include superb lectures on Fourier Series, the Fourier Transform and the Fast Fourier Transform with examples in Matlab and Python. Can't recommend this guy enough. https://www.youtube.com/channel/UCm5mt-A4w61lknZ9lCsZtBw

justjonathan almost 5 years ago

This is an explanation (from the amazing 3Blue1Brown) of the DFT, not the DCT, but this was the first thing that ever really made sense to me. It is explained by someone who really knows math incredibly well and is an amazing teacher, with amazing visuals: https://www.youtube.com/watch?v=spUNpyF58BY

cevans01 almost 5 years ago

For those that have already worked with the FFT a bit, there are ways to use the FFT to calculate a DCT: https://dsp.stackexchange.com/questions/2807/fast-cosine-transform-via-fft
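
One such construction, sketched here under stated assumptions rather than taken from the linked answer: append a mirrored copy of the length-N signal, take a length-2N FFT, and undo the half-sample shift with a phase factor. The result is the unnormalized DCT-II. The small recursive radix-2 `fft` below is a stand-in for a library FFT and only handles power-of-two lengths; all function names are hypothetical.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def dct_via_fft(x):
    """Unnormalized DCT-II computed from the FFT of the mirrored signal."""
    n = len(x)
    mirrored = list(x) + list(reversed(x))  # length 2N, symmetric extension
    spectrum = fft(mirrored)
    # The phase factor accounts for the half-sample offset of the symmetry point.
    return [
        0.5 * (cmath.exp(-1j * math.pi * k / (2 * n)) * spectrum[k]).real
        for k in range(n)
    ]


def dct_direct(x):
    """Unnormalized DCT-II straight from the definition, for comparison."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]


signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # arbitrary test data
max_diff = max(abs(a - b) for a, b in zip(dct_via_fft(signal), dct_direct(signal)))
print("largest difference between the two methods:", max_diff)
```

The explicit mirroring step is the same 'copy and mirror' idea discussed elsewhere in this thread. Practical implementations typically avoid doubling the length, so treat this as an illustration of the relationship, not as a fast algorithm.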

miccah almost 5 years ago

Not DCT, but this website [1] has been on my TODO list for a while. It explains DFT using interactive visualizations.

[1] https://jackschaedler.github.io/circles-sines-signals/index.html

golergka almost 5 years ago

> In simple terms, the Discrete Cosine Transform takes a set of N correlated (similar) data-points and returns N de-correlated (dis-similar) data-points (coefficients) in such a way that the energy is compacted in only a few of the coefficients M where M << N.

That's not simple terms.
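
The quoted sentence may be easier to digest as an experiment. The sketch below is an illustration under assumptions, not the article's code: it takes N = 16 smoothly varying (hence correlated) samples from an invented curve, computes an orthonormal DCT, keeps only the M largest coefficients, and reconstructs the signal from those. The helper names are hypothetical.

```python
import math


def dct_ortho(x):
    """Naive orthonormal DCT-II."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(
            scale * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        )
    return out


def idct_ortho(c):
    """Inverse of dct_ortho (an orthonormal DCT-III)."""
    n = len(c)
    out = []
    for t in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
            total += scale * c[k] * math.cos(math.pi * (t + 0.5) * k / n)
        out.append(total)
    return out


def keep_largest(coeffs, m):
    """Zero out everything except the m largest-magnitude coefficients."""
    ranked = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = set(ranked[:m])
    return [c if k in kept else 0.0 for k, c in enumerate(coeffs)]


def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


N = 16
# Assumed test data: neighbouring samples are similar to each other.
samples = [50.0 + 3.0 * t + 0.05 * t * t for t in range(N)]
coeffs = dct_ortho(samples)

errors = {
    m: rms_error(samples, idct_ortho(keep_largest(coeffs, m)))
    for m in (1, 4, N)
}
for m, err in errors.items():
    print(f"kept {m:2d} of {N} coefficients, RMS reconstruction error = {err:.4f}")
```

Keeping all N coefficients reproduces the samples up to rounding, and keeping a handful already gets closer than keeping only the first. How small M can be in practice depends on the data and on how much error is acceptable.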

hinkley almost 5 years ago

3Blue1Brown ELI15 the Fast Fourier Transform, which sets up the problem domain: https://www.youtube.com/watch?v=spUNpyF58BY

anovikov almost 5 years ago

Good description of how you use it for image compression, but nothing about video compression - that is, how similarity between consecutive frames is used.

rrao84 almost 5 years ago

Fantastic explanation of transforms, and the example of 20 questions and DCT totally did it for me. The rest of the explanation is hard to grok if you aren't a programmer or aren't interested in image & video compression. YMMV

master_yoda_1 almost 5 years ago

Looks like everyone at Hacker News has the intelligence of a five-year-old :) Sad to see that for so many human beings, their brains never evolve.

andi999 almost 5 years ago

Actually even if you are not 5 this totally misses the point about the DCT. Especially the AC components are not explained.

zackmorris almost 5 years ago

Here's the page that got me interested in the DCT 15 years ago: https://www.mathworks.com/help/images/discrete-cosine-transform.html

Although I must admit that I haven't fully internalized it (it's easy to forget how it works). It might help to learn some of the more mainstream ones like the Fourier transform first:

https://www.mathworks.com/help/images/fourier-transform.html
https://www.mathworks.com/help/images/image-transforms.html

MATLAB (or GNU Octave which is free) is the only language I've used that maps abstractions to code in close to a 1:1 fashion. I think of code bloat as roughly these orders of magnitude:

Math/Matrix languages (MATLAB) 1:1
Scripting languages (JavaScript, PHP, Python, etc) 10:1
Bare metal languages (Rust, C++, Assembly) 100:1

Unfortunately I have yet to find a functional programming language that is 1:1 in the same way that MATLAB is. One would think that Lisp/Haskell/Scala/Julia/Clojure/Erlang would be concise. But in practice, I've found them to generally be write-only languages that are quite difficult to grok. The only thing that comes close is spreadsheets, but due to their 2D nature, they can't really scale beyond a certain level of complexity. Honestly, I think that the lack of adoption of functional programming languages (due to their own obstinance) is one of the great problems of our time.

Anyway, for the past 5-10 years or so, I solve most problems in my head in a data-driven way that looks like a hybrid between MATLAB and spreadsheets, piping data between classes written in an Actor model style (I learned this from a very good teacher during a contract at HP).

I use rules of thumb like no mutable data (stored state), mostly higher-order functions, and framing concepts in terms of transformations rather than class inheritance. Then I translate that to whatever language I have to use for the client, which is probably PHP, JavaScript or C# (Unity).

I highly recommend this sort of approach for keeping abstractions separate from implementations. Unfortunately since nobody else seems to follow this method, it makes for some friction in the workplace. People talk past me all of the time because they don't realize that I'm coming from this place of experience. They don't see the need for this clean separation when they are "trying to get things done". They may get things done faster, but if you want bug-free implementations, this is the way to go IMHO.

Edit: I'm on the fence about TensorFlow. It's arguably more advanced than MATLAB, but vastly more opaque. So I'm not sold on any advantage it provides, other than perhaps better performance or usefulness in certain domains like machine learning. Oh and any perceived lack of performance with MATLAB is an implementation detail. I can't understand why it's not fully parallelized and running on the GPU by now.

rsync almost 5 years ago

"Explain Like I'm Five"

Oh my god - are you all right? Where are your parents?