Does stuff like this help anyone?

I still haven’t forgiven CGP Grey for changing the title of his 2017 ML video to “How AIs, like ChatGPT, learn”. The video is about genetic algorithms and has nothing to do with ChatGPT (or with anything else in modern AI).
I have met people who think they understand a particular topic I am versed in, but actually don't. Similarly, I am often wary that I'm only getting superficial knowledge about a topic I don't know much about through "layman" resources, and I doubt one can reach an appropriate level of understanding mainly through analogies and metaphors. It's a kind of "epistemic anxiety". Of course, there are "layman" books I've stumbled upon which I think go to appropriate depth and do not dumb the topics down to shallow levels, yet remain accessible, like Gödel's Proof by Ernest Nagel. I'd be glad to read about similar books on all topics, including the one discussed in this thread.

Knowledge is hard to attain...
> This is how neural networks work: they see many examples and get rewarded or punished based on whether their guesses are correct.

This description more closely matches reinforcement learning than gradient-based optimization.

In fact, the entire metaphor of a confused individual being slapped or rewarded without understanding what's going on doesn't really make sense for gradient optimization, because the gradient with respect to the loss function tells the network *exactly* how to change its behavior to improve its performance.

This last point is incredibly important to understand correctly, since it contains one of the biggest assumptions about network behavior: that the optimal solution, or at least a good-enough solution for our purposes, *can be found* by slowly taking small steps in the right direction.

Neural networks are great at *refining* their beliefs but have a difficult time radically changing them. A better analogy might be trying to very slowly convince your uncle that climate change is real, and not a liberal conspiracy.

edit: it also does a poor job of explaining layers; the description reads much more like how ensemble methods work (lots of little classifiers voting) than how deep networks work.
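A minimal sketch of that point, for the curious (toy one-parameter "network", nothing from a real framework): the gradient tells the weight exactly which way to move, but each step is deliberately tiny.

```python
# Minimal gradient descent on a one-parameter "network": y = w * x.
# The gradient of the loss tells w *exactly* which direction to move,
# but each step is small -- refinement, not radical change.

def loss(w, x, y):
    return (w * x - y) ** 2          # squared error for one example

def grad(w, x, y):
    return 2 * (w * x - y) * x       # d(loss)/dw, computed analytically

w = 0.0                              # start with a "wrong belief"
lr = 0.01                            # learning rate: how small each step is
for step in range(1000):
    w -= lr * grad(w, x=3.0, y=6.0)  # follow the slope downhill

print(w)  # converges toward 2.0, since 2.0 * 3.0 == 6.0
```

No slaps involved: at every step the network knows precisely which way "better" is, just not how to leap there all at once.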
If anyone is looking for a quick overview of how LLMs are built, I highly recommend this video by Steve Seitz: https://www.youtube.com/watch?v=lnA9DMvHtfI

It does an excellent job of taking you from 0 to a decent understanding without dumbing down the content or abusing analogies.
I have a few funny analogies that I think kind of work.

1. "gradient descent" is like tuning a guitar by ear: listening to the beat frequencies (the "loss") and decreasing them by tuning a string up or down.

2. the best I can come up with for "backpropagation" is to imagine a clever device that can tirelessly optimize a Rube Goldberg machine for you, as a science rather than an art.
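To stretch the second analogy into code (a toy sketch, not a real framework): backpropagation is just the chain rule applied to a chain of little machines, so each stage learns how much its knob contributed to the final error.

```python
# Backprop through a two-stage "Rube Goldberg machine": z = g(f(x)).
# The forward pass runs the machine; the backward pass uses the chain
# rule to tell each stage how the final output responds to its knob.

def f(x, a):       # stage 1: scale the input by knob `a`
    return a * x

def g(h, b):       # stage 2: shift the result by knob `b`
    return h + b

x, target = 2.0, 10.0
a, b = 1.0, 0.0
lr = 0.05

for _ in range(200):
    h = f(x, a)                  # forward through stage 1
    z = g(h, b)                  # forward through stage 2
    dloss_dz = 2 * (z - target)  # d/dz of (z - target)^2
    # chain rule, flowing backward through the machine:
    dloss_db = dloss_dz * 1.0    # dz/db = 1
    dloss_da = dloss_dz * x      # dz/dh = 1, then dh/da = x
    a -= lr * dloss_da
    b -= lr * dloss_db

print(a * x + b)  # ~10.0: the machine has been tuned to hit the target
```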
I love this, but I'm always confused in these kinds of analogies about what the reward/punishment system really equates to...

Also reminds me of Ted Chiang warning us that we will torture innumerable AI entities long before we start having real conversations about treating them with compassion.
Totally aware that this isn't a fully formal definition of deep learning, but one interesting takeaway for me is realizing that corporations, with their formal and informal reporting structures, are organized in a way similar to neural networks too.

It seems like these sorts of structures regularly arise to help regulate the flow of information through a system.
The problem with deep learning is the opposite. You can understand most of it with just high school math. Advanced math is mostly useless because of the dimensionality of neural nets.
I asked ChatGPT:

Sure, I can explain deep learning using the analogy of mixing paint colors.

When you mix different colors of paint, you can create new colors that are a combination of the original colors. Similarly, in deep learning, you can combine different types of data, such as images, text, and audio, to create a more comprehensive understanding of the information.

In a neural network, which is the basic building block of deep learning, each layer is like a different color of paint. The input layer represents the initial data, which is like a blank canvas. As the data passes through each layer, the network extracts increasingly complex features, similar to how mixing different colors creates more complex hues.

Each layer in a neural network can be thought of as a different color that is mixed with the previous layer's color. As the colors mix, the resulting color becomes more complex and nuanced, just as the data becomes more sophisticated and detailed as it passes through the layers of the neural network.

At the end of the process, the output layer represents the final result, which is like a finished painting. The deep learning algorithm can then use this final output to make predictions or classifications based on the input data.

So, in essence, deep learning is like mixing paint colors to create new and more complex hues, but instead of colors, the mixtures are made up of data and the neural network layers are the different colors being mixed.
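For what it's worth, the "mixing" picture can be made literal with a few lines of numpy (a toy sketch of my own, not part of ChatGPT's answer): each layer linearly mixes the previous layer's features, and a nonlinearity lets "hues" emerge that no single input contained on its own.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)               # input "pigments"

h = x
for _ in range(3):                   # three mixing layers
    W = rng.normal(size=(4, 4))      # mixing proportions (the weights)
    h = np.maximum(0, W @ h)         # ReLU: mix, then post-process

print(h)                             # a "color" built from mixtures of mixtures
```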
As someone who knows barely enough to be dangerous, I like this. I'm sure it leaves enough out to make most experts angry, but it makes a lot of sense to me.
> they see 3 spreadsheets of numbers representing the RGB values of the picture.

This needs expanding: it's the sort of thing that's easy for a programmer to say, but few people who aren't programmers or mathematically trained are going to see that an RGB value has 3 parts, and so a collection of RGB values could be sliced into 3 sheets.
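Spelled out with numpy (random stand-in data rather than a real photo), a picture really is three grids of numbers, one per color channel:

```python
import numpy as np

# A 4x4 "photo": height x width x 3 (red, green, blue), values 0-255.
image = np.random.randint(0, 256, size=(4, 4, 3))

red   = image[:, :, 0]   # spreadsheet 1: how red each pixel is
green = image[:, :, 1]   # spreadsheet 2: how green each pixel is
blue  = image[:, :, 2]   # spreadsheet 3: how blue each pixel is

print(red)               # one 4x4 grid of numbers -- one "spreadsheet"
```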
This is the funniest refutation of the Chinese Room argument that I’ve seen. Note that at the end, it’s still the case that none of these people can recognize a cat.
The most concise and intuitive line of explanation I've been given goes along the lines of this:

1 - We want to model data, representative of some system, through functions.

2 - Virtually any function can be approximated by an n-th order polynomial.

3 - We wish to learn the parameters, the coefficients, of such polynomials.

4 - Neural networks let us search candidate values of such parameters, finding optimal candidates such that the error between expected and actual values over our dataset is minimized (see the sketch after this list).

Whereas prior methods (e.g. PCA) could only model linear relationships, neural networks allowed us to begin modeling non-linear ones.
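Here is a sketch of steps 1-4 (toy data, with gradient descent standing in for the parameter search rather than literal brute force): we learn the coefficients of a cubic by repeatedly stepping them toward lower error.

```python
import numpy as np

# Fit the coefficients of y = c0 + c1*x + c2*x^2 + c3*x^3
# by minimizing squared error with gradient descent.

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 100)
y = np.sin(3 * x) + 0.1 * rng.normal(size=x.shape)  # nonlinear system to model

X = np.stack([x**0, x, x**2, x**3], axis=1)  # polynomial feature matrix
c = np.zeros(4)                              # coefficients to learn
lr = 0.1

for _ in range(5000):
    err = X @ c - y                  # predicted minus actual
    grad = 2 * X.T @ err / len(x)    # gradient of mean squared error
    c -= lr * grad                   # step toward lower error

print(c)  # learned coefficients of the approximating cubic
```

A real network swaps the fixed polynomial features for learned nonlinear ones, but the fit-parameters-to-minimize-error loop is the same.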