科技回声 (TechEcho) — a tech news platform built with Next.js, providing global tech news and discussion, powered by the HackerNews API.

Neural networks in the 1990s

131 points · by jrott · almost 2 years ago

28 comments

bvan · almost 2 years ago
Yikes, I'm old. There was a lot of NN work and a lot of books available on NNs back in the mid and late 90s. 'Soft computing' was the all-encompassing term for NNs, genetic algorithms, AI, expert systems, fuzzy logic, ALife, and all sorts of nascent computational areas back then. I still have a bunch of issues of the monthly AI Expert magazine one could buy at a decent magazine stand. Small datasets were definitely a limiting factor, as well as limited computer power. I remember certain applied fields did embrace NNs early on, like some civil engineers and hydrologists, who were finding some use for them. At the U of Toronto, I considered doing a PhD with a biologist who was using them to investigate vision (and got help from Hinton). Physiology was one area where you could generate "long" time series in a relatively short period of time. Those were still the days when Intel 286/386/486 and lowly Pentium machines were still common currency. Computer scientists at the time didn't yet have clear breakthrough commercial applications which would have attracted crazy funding. A lot of theory, little real action.
rm999 · almost 2 years ago
While my experience is not from the 90s, I think I can speak to some of why this is. For some context, I first got into neural networks in the early 2000s during my undergrad research, and my first job (mid 2000s) was at an early pioneer that developed their V1 neural network models in the 90s (there is a good chance models I evolved from those V1 models influenced decisions that impacted you, however small).

* First off, there was no major issue with computation. Adding more units or more layers isn't that much more expensive. Vanishing gradients and poor regularization were a challenge and meant that increasing network size rarely improved performance empirically. This was a well-known challenge up until the mid/late 2000s.

* There was a major 'AI winter' going on in the 90s after neural networks failed to live up to their hype in the 80s. Computer vision and NLP researchers — the fields that have most famously been benefiting from huge neural networks recently — largely abandoned neural networks in the 90s. My undergrad PI at a computer vision lab told me in no uncertain terms he had no interest in neural networks, but was happy to support my interest in them. My grad school advisors had similar takes.

* A lot of the problems that did benefit from neural networks in the 90s/early 2000s just needed a non-linear model, but did not need huge neural networks to do well. You can very roughly consider the first layer of a 2-layer neural network to be a series of classifiers, each tackling a different aspect of the problem (e.g. the first neuron of a spam model may activate if you have never received an email from the sender, the second if the sender is tagged as spam a lot, etc.). These kinds of problems didn't need deep, large networks, and 10-50 neuron 2-layer networks were often more than enough to fully capture the complexity of the problem. Nowadays many practitioners would throw a GBM at problems like that and can get away with O(100) shallow trees, which isn't very different from what the small neural networks were doing back then.

Combined, what this means from a rough perspective is that the researchers who really could have used larger neural networks abandoned them, and almost everyone else was fine with the small networks that were readily available. The recent surge in AI is being fueled by smarter approaches and more computation, but arguably much more importantly by a ton more data that the internet made available. That last point is the real story IMO.
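The "first layer as a series of classifiers" picture above can be sketched concretely. This is a minimal illustration, not anyone's actual spam model: the two input features, the hand-set weights, and the function names are all hypothetical, chosen so each hidden unit acts like one small detector.

```python
import numpy as np

# Toy 2-layer network of the size the comment describes. Weights are
# hand-set for illustration, not trained.
# Hypothetical feature vector: [never_seen_sender, often_tagged_spam]
W1 = np.array([[4.0, 0.0],    # hidden unit 1: fires on unknown senders
               [0.0, 4.0]])   # hidden unit 2: fires on frequently-flagged senders
b1 = np.array([-2.0, -2.0])
W2 = np.array([1.5, 1.5])     # output combines the two "detectors"
b2 = -1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_spam(x):
    h = sigmoid(W1 @ x + b1)      # hidden layer: one per-aspect detector each
    return sigmoid(W2 @ h + b2)   # output: probability-like spam score

known_sender = np.array([0.0, 0.0])
unknown_flagged = np.array([1.0, 1.0])
print(predict_spam(known_sender) < 0.5, predict_spam(unknown_flagged) > 0.5)
```

A 10-50 neuron version is the same code with wider `W1`/`W2`; nothing about the structure changes.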
MilStdJunkie · almost 2 years ago
Data, data, data, data. The 1990s didn't have Wikipedia, YouTube, megapixel cameras every which where, every single adult human hooked up to a sensor package 24 hours a day, and who knows what else. I know as a 1990s guy I would never have imagined the amount of data we would eventually all throw up into the ether even ten years later, to say nothing of today. Without that corpus...
robg · almost 2 years ago
Highly recommend the exercises in Rumelhart and McClelland, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, from 1986-1987 (two volumes): https://direct.mit.edu/books/book/4424/Parallel-Distributed-ProcessingExplorations-in-the
radq · almost 2 years ago
We were missing two architecture patterns that were needed to get deeper nets to converge: residual nets [1], which solved gradient propagation, and batch normalization [2], which solved initialization.

[1] Residual nets (2015): https://arxiv.org/abs/1512.03385
[2] Batch normalization (2015): https://arxiv.org/abs/1502.03167
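The two ideas cited above are mechanically simple; here is a rough numpy sketch (not code from either paper) of what each one does in a forward pass:

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # Normalize each feature to zero mean / unit variance across the batch,
    # which keeps activations well-scaled regardless of initialization.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def residual_block(x, W):
    # y = x + F(x): the block computes identity plus a learned correction,
    # so gradients always have a direct path back through the "+ x" term.
    f = np.maximum(0.0, x @ W)    # F(x): a single linear + ReLU branch here
    return x + f

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
W = np.zeros((4, 4))     # with a zero branch, the block is exactly the identity
print(np.allclose(residual_block(x, W), x))
print(np.allclose(batch_norm(x).mean(axis=0), 0.0))
```

The identity behavior under a zero branch is the point: a deep stack of such blocks starts out close to a shallow network and only gradually deviates, which is what made very deep nets trainable.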
Solvency · almost 2 years ago
Do you think Carmack, deep down, wonders why he let himself miss the boat on the LLM revolution? He spent golden years toiling away at Facebook, only to finally announce he was quitting to focus on AGI... only for the world to be taken by storm by transformers, GPT, Midjourney, etc.

If anyone could have been at the forefront of this wave, it could've been him.

And now the landscape has utterly changed, and no one is even convinced they need "AGI" — just a continually refined LLM hooked up to tools and other endpoints.
waivej · almost 2 years ago
I got exposed to programming neural networks in the early 90s. They solved certain problems incredibly fast, like the traveling salesman problem. I was tinkering with 3D graphics, fractals, and map pathfinding, though it didn't occur to me how much more power was there.

"Data" was so much smaller then. I had a minuscule hard drive if any, no internet, 8-bit graphics but nothing photorealistic, glimpses of Windows and OS/2, and barely a mouse. In retrospect, it was like embedded programming.
WiSaGaN · almost 2 years ago
I believe the issue was not a lack of computational power, but rather that people at the time didn't think large models with many parameters would effect meaningful change. This was even true three years ago, albeit at a different scale. As Ilya Sutskever expressed, people were not convinced there was still room to increase the scale. For the status quo to shift, two things could happen: a substantial reduction in computing costs, making large-scale experiments less a matter of conviction and more a matter of course; or the emergence of individuals with the resources and conviction to undertake larger experiments.
brucethemoose2 · almost 2 years ago
The 1990s are beyond my time horizon.

The oldest NN I was exposed to was an image upscaler (mostly used for deinterlacing) called nnedi, which goes back to ~2007: http://web.archive.org/web/20130127123511/http://forum.doom9.org/showthread.php?t=12995

nnedi3 is actually still quite respectable today.
version_five · almost 2 years ago
I think it's more that modern automatic differentiation abstractions weren't well known to researchers. From what I remember, even in the early 2000s when I went to school, backpropagation was basically hand-coded.
brrrrrm · almost 2 years ago
I doubt it was obvious scaling up would magically work. I suspect the experiments were limited for analytic simplicity rather than computational reasons.
yobbo · almost 2 years ago
To experiment with SGD and back-propagation on 4096x4096 32-bit matrices, you would have needed a machine with hundreds of megabytes of RAM in the 90s. In terms of software, you would have needed to be comfortable with C/C++ or maybe Fortran to be able to experiment quickly enough to land on effective hyperparameters.

Probably too many low-probability events chained together.

But I think they discovered most of the interesting things that small networks can do. For example, TD-Gammon from 1992: https://en.wikipedia.org/wiki/TD-Gammon
hax0ron3 · almost 2 years ago
The 1990s gamer in me gets a kick out of seeing John Carmack and Tim Sweeney talk to each other.
ttul · almost 2 years ago
In 1999, our "computer vision" guy — a master's student — struggled mightily to recognize very simple things in a video stream from a UAV. Today, we would take this for granted. But back then, the computation was for all intents and purposes non-existent. At best he was hoping to apply an edge-detection kernel maybe once every two seconds, see if he could identify some lines and arcs, and then hand-code some logic to recognize things.
rmnclmnt · almost 2 years ago
Yeah, good times! The other day I was browsing for the 999th time Steve Smith's book "The Scientist and Engineer's Guide to Digital Signal Processing" [1] and stumbled upon the chapter on NNs [2]. I remember reading this when I was a student; I could make sense of it and why it worked, but reading it 15 years later I find it is explained so clearly compared to other resources! (Maybe experience is playing in my favor too.)

You get a BASIC code snippet for training and inference and, most of all, there is an explicit use case for digital filter approximation! At the time NNs were treated as one tool among others, not an "answer-to-everything" type of thing.

I know deep learning opened new possibilities, but a lot of the time CNNs/RNNs/Transformers are definitely not needed: working on the data instead and using "linear" models can go really far (my 2 cents).

[1]: https://www.dspguide.com
[2]: http://www.dspguide.com/ch26.htm
29athrowaway · almost 2 years ago
In the early 90s, not only was there less computing power, but there was also little internet connectivity, low bandwidth, and no digital cameras — so there were not that many images online, and the images you had were low-res and low color depth. Internet giants didn't yet exist and didn't yet collect massive amounts of data.
plun9 · almost 2 years ago
A 1995 car automatic transmission with a neural network: https://www.sciencedirect.com/science/article/abs/pii/038943049500040E
Ono-Sendai · almost 2 years ago
I personally made a Quake 2 bot using neural networks in 1999. I think it had several hundred neurons and several thousand 'synapses' (parameters). At the time that felt like a lot of parameters. Computation wasn't much of a limit though; I could run several NNs faster than realtime.
2sk21 · almost 2 years ago
I have one of the early PhDs in neural networks (graduated in 1992). However, my work was analytical — I was able to prove a couple of theorems about backpropagation. I just needed a simple implementation to prove that my ideas worked, so I wrote my code from scratch in C.
bilsbie · almost 2 years ago
I remember people telling me you would just get overfitting if you made the network too big.

I wonder how LLMs avoid that?
amichal · almost 2 years ago
I followed a Scientific American article in 1992 as a high schooler and got digit recognition and basic arithmetic working on a 386. What the popsci press said at the time was that we were limited by memory bandwidth (cache size), training data, and to some extent pointer-chasing (and other inefficiencies) in graph algorithms.
gattilorenz · almost 2 years ago
On the topic of AI history, I would like to set up a demo of old AI and/or general CS research on late-90s/early-00s Sun Ultra machines.

Does anyone have suggestions (and links to code!) for what would be a cool demo? I'm thinking of a Haar classifier to show some object recognition/face detection, but would appreciate more options!
mistrial9 · almost 2 years ago
Definitely saw NN code in the 1990s; I recall a hardback book with a mostly red cover, not sure of the title. Prominent and rigorous code implementations were associated with MIT at that time (the Random Forest guy was at Berkeley, in the stats department).

Edit: yes, almost certainly Neural Networks for Pattern Recognition (1995), thx!
mjan22640 · almost 2 years ago
In 2012, results were published from research on visual processing in the brain that (among other things, like the retina compressing the input) figured out that the visual cortex uses convolution. That got mimicked and was a breakthrough in image-recognition NNs, which sparked life into the whole field.
anthk · almost 2 years ago
https://tldp.org/HOWTO/AI-Alife-HOWTO-1.html

This was a thing in the early 2000s/late 90s.
LarsDu88 · almost 2 years ago
Lol, Carmack's like — I could've gotten a 4096 NN running on my early-90s NeXTcube dev rig, you neural networking researcher peasants!
rwmj · almost 2 years ago
I knew someone in the early 90s who was making a neural network on a chip for his PhD. The chip fitted one neuron. Yes, he might have used float16 to cram more in, but those techniques were not known at the time.

There really wasn't the compute power around at the time, and as others have pointed out, there wasn't the training data, or the cameras.
FrustratedMonky · almost 2 years ago
Reading through the Twitter thread and these comments reminds me of all the back and forth when HN discusses psychology.

One side, holding a pipe: 'Well actually, back in 1954, I put together an analog variant of a neuron perceptron built out of old speaker cables and car parts, strung it across the living room, and it could say 10 words and fetch my slippers.' 'Really?' 'Yes, indubitably.'

The other side: it's all 'REEEEEEEEEE'.