What's Going on in Machine Learning? Some Minimal Models

239 points by taywrobel 9 months ago

15 comments

deng 9 months ago
Say what you will about Wolfram: he's a brilliant writer and teacher. The way he's able to simplify complex topics without dumbing them down is remarkable. His visualizations are not only extremely helpful but usually also beautiful, and if you happen to have Mathematica on hand, you can easily reproduce what he's doing. Anytime someone asks me for a quick introduction to LLMs, I always point them to this article of his, which I still think is one of the best and most understandable introductions to the topic:

https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-doing-and-why-does-it-work/
vessenes 9 months ago
Classic Wolfram — brilliant, reimplements / comes at a current topic using only cellular automata, and draws some fairly deep philosophical conclusions that are pretty intriguing.

The part I find most interesting is his proposal that neural networks largely work by "hitching a ride" on fundamental computational complexity, in practice sort of searching around the space of functions representable by an architecture for something that works. And, to the extent this is true, that puts explainability at fundamental odds with the highest value / most dense / best deep learning outputs — if they are easily "explainable" by inspection, then they are likely not using all of the complexity available to them.

I think this is a pretty profound idea, and it sounds right to me — it seems like a rich theoretical area for next-gen information theory: essentially, are there (soft/hard) bounds on certain kinds of explainability/inspectability?

FWIW, there's a reasonably long history of mathematicians constructing their own ontologies and concepts and then people taking like 50 or 100 years to unpack and understand them and figure out what they add. I think of Wolfram's cellular automata like this: possibly really profound, time will tell, and unusual in that he has the wealth and platform and interest in boosting the idea while he's alive.
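As a rough illustration of that "searching the space of functions" picture, here is a small sketch of single-bit mutation search over a discrete rule table, in the spirit of the minimal models the article discusses. This is my own Python sketch, not code from the article (which uses Wolfram Language); the target function, loss, and step count are arbitrary choices. Random changes are kept whenever they don't make the fit worse, and nothing ever constructs the desired behavior explicitly.

    # Sketch: learn a target boolean function by randomly mutating a discrete
    # rule table and keeping any change that does not increase the loss.
    import random

    random.seed(0)

    def target(x, y, z):                 # behavior we want to reproduce (majority vote)
        return int(x + y + z >= 2)

    inputs = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
    rule = [random.randint(0, 1) for _ in range(8)]   # candidate rule table

    def loss(rule):
        return sum(rule[(x << 2) | (y << 1) | z] != target(x, y, z)
                   for x, y, z in inputs)

    for step in range(200):
        i = random.randrange(8)          # flip one random table entry
        candidate = rule[:]
        candidate[i] ^= 1
        if loss(candidate) <= loss(rule):  # accept if no worse
            rule = candidate

    print("final loss:", loss(rule), "rule table:", rule)

The search typically lands on a zero-loss table without any explicit construction, which is the "harnessing behavior that's already out there" flavor the comment is pointing at.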
nuz 9 months ago
I can never read comments on any Wolfram blog on HN because they're always so mean-spirited. I'm seeing a nerdy guy explaining things from a cool new perspective that I'm excited to read through. The comments almost always have some lens against him being 'self-centered' or obsessing about cellular automata (who cares, we all have our obsessions).
ralusek 9 months ago
There should be a Godwin's Law for Stephen Wolfram. Wolfram's Law: as the length of what he's saying increases, the probability it will be about cellular automata approaches 1.

That being said, I'm enjoying this. I often experiment with neural networks in a similar fashion and like to see people's work like this.
krackers 9 months ago
> Instead what seems to be happening is that machine learning is in a sense just "hitching a ride" on the general richness of the computational universe. It's not "specifically building up behavior one needs"; rather what it's doing is to harness behavior that's "already out there" in the computational universe.

Is this similar to the lottery ticket hypothesis?

Also the visualizations are beautiful and a nice way to demonstrate the "universal approximation theorem".
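On the universal approximation point, a minimal numerical sketch (my own, in Python with NumPy, not taken from the article; the knot positions and target function are arbitrary): a single layer of shifted ReLU units, with only the output weights fitted, already reproduces a smooth target with a small piecewise-linear error, and the error shrinks as units are added.

    # Sketch: one hidden layer of shifted ReLU "ramps" fit to sin(x) on [0, pi]
    # by solving a least-squares problem for the output weights only.
    import numpy as np

    xs = np.linspace(0, np.pi, 200)
    target = np.sin(xs)

    knots = np.linspace(0, np.pi, 12)          # hidden-unit shift positions
    relu = lambda v: np.maximum(v, 0.0)

    # Design matrix: one ReLU ramp per knot, plus a constant bias column.
    H = np.column_stack([relu(xs[:, None] - knots), np.ones_like(xs)])
    weights, *_ = np.linalg.lstsq(H, target, rcond=None)

    approx = H @ weights
    print("max abs error:", np.abs(approx - target).max())  # shrinks with more knots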
DataDive 9 months ago
I find it depressing that every time Stephen Wolfram wants to explain something, he slowly gravitates towards these simplistic cellular automata and tries to explain everything through them.

It feels like a religious talk.

The presentation consists of chunks of hard-to-digest, profound-sounding text followed by a supposedly informative picture with lots of blobs, then the whole pattern is repeated over and over.

But it never gets to the point. There is never an outcome, never a summary. It is always some sort of patterns and blobs that are supposedly explaining everything ... except nothing useful is ever communicated. You are supposed to "see" how the blobs are "everything" ... a new kind of Science.

He cannot predict anything; he cannot forecast anything; all he does is use Mathematica to generate multiplots of symmetric little blobs and then suggest that those blobs somehow explain something that currently exists.

I find these Wolfram blogs a massive waste of time.

They are boring in the extreme.
wrsh07 9 months ago
Because of the computational simplicity, I think there's a possibility that we will discover very cheap machine learning techniques that are discrete like this.

I think this is novel. (I've seen BNNs, https://arxiv.org/pdf/1601.06071, but that approach actually makes things continuous for training; if inference is sufficiently fast and you have an effective mechanism for permutation, training could be faster using a discrete scheme like this.)

I am curious what other folks (especially researchers) think. The takes on Wolfram are not always uniformly positive, but this is interesting (I think!)
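A minimal sketch of the discrete-inference idea being described (my own illustration, not the linked BNN paper's implementation, which binarizes trained real-valued weights and relies on a continuous relaxation during training): every weight and activation is constrained to ±1, so a forward pass needs only sign flips and additions.

    # Sketch: a two-layer forward pass where weights and activations stay in {-1, +1}.
    import numpy as np

    rng = np.random.default_rng(0)

    def binarize(x):
        return np.where(x >= 0, 1, -1)

    # Random ±1 weights stand in for the signs of trained real-valued weights.
    W1 = binarize(rng.standard_normal((16, 8)))
    W2 = binarize(rng.standard_normal((8, 4)))

    def forward(x):
        h = binarize(x @ W1)        # every intermediate value stays in {-1, +1}
        return binarize(h @ W2)

    x = binarize(rng.standard_normal(16))
    print(forward(x))               # a 4-dimensional vector of ±1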
usgroup 9 months ago
Tsetlin machines have been around for some time:

https://en.wikipedia.org/wiki/Tsetlin_machine

They are discrete, individually interpretable, and can be configured into complicated architectures.
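For readers unfamiliar with the building block: a Tsetlin machine composes teams of two-action Tsetlin automata into clauses over boolean literals. The toy sketch below (my own Python illustration, not a full Tsetlin machine; the reward probabilities are made up) shows a single such automaton learning the better of two actions from reward/penalty feedback.

    # Sketch: a two-action Tsetlin automaton with 2N states.
    # States 1..N select action 0, states N+1..2N select action 1.
    # Reward pushes the state deeper into the current action's half;
    # penalty pushes it toward the other action.
    import random

    random.seed(1)
    N = 6
    state = random.randint(1, 2 * N)

    def action(state):
        return 0 if state <= N else 1

    reward_prob = [0.2, 0.8]          # environment: action 1 is rewarded more often

    for _ in range(1000):
        a = action(state)
        rewarded = random.random() < reward_prob[a]
        if rewarded:                  # move away from the decision boundary
            state = max(state - 1, 1) if a == 0 else min(state + 1, 2 * N)
        else:                         # move toward the other action
            state = state + 1 if a == 0 else state - 1

    print("learned action:", action(state))   # converges to action 1 with high probability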
achrono 9 months ago
> All one will be able to say is that somewhere out there in the computational universe there's some (typically computationally irreducible) process that "happens" to be aligned with what we want.

> There's no overarching theory to it in itself; it's just a reflection of the resources that were out there. Or, in the case of machine learning, one can expect that what one sees will be to a large extent a reflection of the raw characteristics of computational irreducibility.

This strikes me as a very reductive and defeatist take that flies in the face of the grand agenda Wolfram sets forth. It would have been much more productive to chisel away at it and figure out *something* rather than expecting the Theory to be unveiled in full at once.

For instance, what I learn from the kind of playing around that Wolfram does in the article is this: neural nets are but one way to achieve learning and intellectual performance, and even within that there are a myriad different ways to do it. Most importantly, there is a breadth vs. depth trade-off: neural nets, being very broad/versatile, are not quite the best at going deep/specialised; you need a different solution for that (e.g. even a good old instruction set architecture might be the right thing in many cases). This is essentially why ChatGPT ended up needing Python tooling to reliably calculate 2+2.
dbrueck 9 months ago
I believe that this is one of the key takeaways for reasoning about LLMs and other seemingly magical recent developments in AI:

"tasks—like writing essays—that we humans could do, but we didn't think computers could do, are actually in some sense computationally easier than we thought."

It hurts one's pride to realize that the specialized thing they do isn't quite as special as was previously thought.
delifue 9 months ago
> But now we get to use a key feature of infinitesimal changes: that they can always be thought of as just "adding linearly" (essentially because ε² can always be ignored relative to ε). Or, in other words, we can summarize any infinitesimal change just by giving its "direction" in weight space.

> a standard result from calculus gives us a vastly more efficient procedure that in effect "maximally reuses" parts of the computation that have already been done.

This partially explains why gradient descent became mainstream.
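A small worked example of that "maximally reuses" point (my own sketch in Python, not the article's Wolfram Language code; the three-weight chain is arbitrary): a single backward sweep over the shared intermediates yields the derivative with respect to every weight at once, whereas the naive "infinitesimal change" approach re-evaluates the function once per weight.

    # Sketch: reverse-mode derivatives of a tiny chain y = w3 * (w2 * (w1 * x)),
    # checked against one-sided finite differences.
    def f(w1, w2, w3, x):
        a = w1 * x
        b = w2 * a
        return w3 * b

    w1, w2, w3, x = 0.5, -1.3, 2.0, 1.7

    # Backward sweep: start from dy/dy = 1 and reuse the forward intermediates a, b.
    a = w1 * x
    b = w2 * a
    grad_w3 = b                 # dy/dw3
    grad_w2 = w3 * a            # dy/dw2 = dy/db * db/dw2
    grad_w1 = w3 * w2 * x       # dy/dw1 = dy/db * db/da * da/dw1

    # Finite differences need a separate re-evaluation per weight.
    eps = 1e-6
    base = f(w1, w2, w3, x)
    print(grad_w1, (f(w1 + eps, w2, w3, x) - base) / eps)
    print(grad_w2, (f(w1, w2 + eps, w3, x) - base) / eps)
    print(grad_w3, (f(w1, w2, w3 + eps, x) - base) / eps)

With n weights the finite-difference route costs n extra forward passes, while the backward sweep stays at roughly the cost of one, which is the efficiency the quoted passage is getting at.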
aeonik 9 months ago
This article does a good job laying the foundation of why I think homoiconic languages are so important, and why doing AI in languages that aren't is doomed to stagnation in the long term.

The acrobatics that Wolfram can do with the code and his analysis are awesome, and doing the same without the homoiconicity and metaprogramming makes my poor brain shudder.

Do note, Wolfram Language is homoiconic, and I think I remember reading that it supports Fexprs. It has some really neat properties, and it's a real shame that it's not Open Source and more widely used.
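As a concrete picture of the code-as-data idea behind this comment: Python is not homoiconic, but its ast module can stand in for illustration (this is my own sketch, not anything from the article or Wolfram Language). The program is parsed into a data structure, rewritten programmatically, and then executed in its rewritten form.

    # Sketch: treat a program as data by parsing it, rewriting + into *, and running it.
    import ast

    source = "x + y * 2"
    tree = ast.parse(source, mode="eval")

    class SwapAddToMult(ast.NodeTransformer):
        def visit_BinOp(self, node):
            self.generic_visit(node)
            if isinstance(node.op, ast.Add):   # rewrite every + into *
                node.op = ast.Mult()
            return node

    new_tree = ast.fix_missing_locations(SwapAddToMult().visit(tree))
    code = compile(new_tree, "<rewritten>", "eval")
    print(eval(code, {"x": 3, "y": 4}))        # x * (y * 2) = 24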
jderick 9 months ago
It is interesting to see the type of analysis he does, and the visualizations are impressive, but the conclusions don't really seem too surprising. To me, it seems the most efficient learning algorithm will not be simpler but rather much more complex, likely some kind of hybrid involving a multitude of approaches. An analogy here would be modern microprocessors: although they have evolved from some relatively simple machines, they involve many layers of optimizations for executing various types of programs.
jksk61 9 months ago
Is a TL;DR available, or at least a summary of the ideas covered? Because after 3 paragraphs it seems like the good old "it is actually something resembling a cellular automaton" post by Wolfram.
jmount 9 months ago
Wow, Wolfram "invented" cellular automata, neural nets, symbolic algebra, physics, and so much more.