> A machine learning algorithm which can be trained using SQL opens a world of possibilities. The model and the data live in the same space. This is as simple as it gets in terms of architecture. Basically, you only need a database which runs SQL.

That's the first paragraph of the conclusion, and it very much fits with the mindset that's been growing on me in the data world over the past few years. Databases are much more powerful than we think, they're not going away, only getting better, and having the data and the logic in the same space removes tons of headaches. ML models, data transformations, and generating JSON for an API can all be done within the database rather than in an external scripting language.

Are others seeing this? Or is the current tradeoff just that more people know Python than SQL or database-specific languages, so moving logic into Postgres or Snowflake is looked down on?
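To make the "generating JSON for an API" point concrete, here's the kind of thing I mean: a minimal PostgreSQL-flavored sketch (the orders/customers schema is invented for illustration):

    -- Build an API-ready JSON payload entirely inside the database.
    SELECT json_agg(
               json_build_object(
                   'order_id', o.id,
                   'customer', c.name,
                   'total',    o.total
               )
           ) AS payload
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.created_at > now() - interval '1 day';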
Title here is wrong. The title and headings in the article itself are right: ONLINE gradient descent.

It's specifically *not* stochastic. From the article:

*Online gradient descent*

*Finally, we have enough experience to implement online gradient descent. To keep things simple, we will use a very vanilla version:*

*- Constant learning rate, as opposed to a schedule.*
*- Single epoch, we only do one pass on the data.*
*- Not stochastic: the rows are not shuffled.* ⇠ ⇠ ⇠ ⇠
*- Squared loss, which is the standard loss for regression.*
*- No gradient clipping.*
*- No weight regularisation.*
*- No intercept term.*
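Spelled out, that "very vanilla" recipe is just one weight update per row, in stored order. A minimal sketch of that spec as a recursive CTE (not the article's exact query; it assumes a table obs(step, x, y) where step is a gapless 1-based row number, and a hand-picked learning rate):

    -- Online gradient descent for y ≈ w*x: squared loss, constant learning
    -- rate, single pass, unshuffled rows, no intercept, no clipping.
    WITH RECURSIVE state(step, w) AS (
        SELECT 0, 0.0                    -- initial weight, before any row
        UNION ALL
        SELECT
            obs.step,
            -- w <- w + 2 * lr * (y - w*x) * x, with lr fixed at 0.01
            state.w + 2 * 0.01 * (obs.y - state.w * obs.x) * obs.x
        FROM state
        JOIN obs ON obs.step = state.step + 1
    )
    SELECT w FROM state ORDER BY step DESC LIMIT 1;  -- final weight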
Just don't.

SQL:

- does not allow for easy and clean importing of modules/libraries

- is not easy to write tests for

- has limited support for a debugger

- lacks a consistent style for such large queries (plus most textbooks cover fairly simple stuff), which means it's harder for a developer to start reading someone else's code than in other languages

- clearly indicates in its name that it is a Query language.

Save yourself the trouble, and all your collaborators the pain, of working with this code in the future, of trying to add new features, of trying to reuse it in another project.

If you want to operate near the data, use PL/Python for PostgreSQL.

EDIT: Fixed formatting.
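For reference, a PL/Python function is declared from SQL but its body is ordinary Python, so the logic can be unit-tested and debugged like any other Python code. A rough sketch (assumes the plpython3u extension is available and you have the superuser rights it requires):

    -- Enable the (untrusted) Python language once per database.
    CREATE EXTENSION IF NOT EXISTS plpython3u;

    -- The function body between the $$ markers is plain Python.
    CREATE FUNCTION gradient_step(w double precision,
                                  x double precision,
                                  y double precision,
                                  lr double precision)
    RETURNS double precision AS $$
        # Squared-loss update for y ~ w*x; the logic lives in Python, near the data.
        return w + 2 * lr * (y - w * x) * x
    $$ LANGUAGE plpython3u;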
This is great! Moving away from the proprietary nature of GPUs and complex math gatekeeping should help democratize AI.

Has anyone converted stuff like gradient descent to set theory?

https://support.unicomsi.com/manuals/soliddb/7/SQL_Guide/2_GettingStartedSQL.03.04.html

https://www.sqlshack.com/mathematics-sql-server-fast-introduction-set-theory/

https://www.sqlshack.com/learn-sql-set-theory/

Right now AI algorithms kind of look imperative and stateful to me, like state machines. But there should be a functional or relational representation, where pattern matching would be used to derive the current state instead.

It's trivial to go from a functional to an imperative representation, but often nearly impossible to go the other way. That's why monadic (sorry if I'm mincing terms) logic is so troublesome. Mutable variables, futures/promises, async, etc. can't be statically analyzed, which is why most code today is difficult to parallelize and why things like C++ optimizers don't even increase speed to the next order of magnitude. But functional languages have nearly infinite optimization potential through divide-and-conquer approaches like sharding and scatter-gather arrays that can run internally without side effects. In other words, we can throw hardware at SQL for linear speedup since it's embarrassingly parallel, but we might have limited success optimizing something like Python.
This is really interesting, but there's a basic part I don't understand: what would it actually look like to run this on a live dataset?

If I understand correctly: you'd run the recursive query, it produces results for every step, effectively showing you the progression of output over time, and then once it hits "present day", it completes and stops?

How would you generate results going forward? I.e., a minute elapses after the results return; do you have to re-run the whole query for all time?
From the start I assumed this was a nice, playful "Hexing the technical interview" kind of joke. But given the tone of the article, and some of the comments here… Uh, this cannot be serious, right?
Incredible post. I laughed when I saw the title, snickered at the first paragraph, and then proceeded to be blown away by the rest of it. Thought I was in for a joke and instead I'm thinking about the nature of ML Ops and what it's become.
In the comments here so far, we see a pattern we've seen before: when someone suggests doing something in SQL, there's a lot of concern about SQL being a very limited programming language in which it's hard to do proper engineering.

Here's what I would really love to know: why is it that SQL is, to first order, the only language used to interact with databases, and why does SQL have about the same features as it did in the 70s? It seems analogous to the general-purpose programming world having stopped with C.
This is genius, I love it.

On-stream inference with something like "continuous" data in the same structure is, I think, the final material form for "AI", so this is a great step towards that.

Thanks for the writeup.
You might want to consider checking out ClickHouse, which supports many ML functions natively:

- stochasticLinearRegression (https://clickhouse.com/docs/en/sql-reference/aggregate-functions/reference/stochasticlinearregression/)

- stochasticLogisticRegression (https://clickhouse.com/docs/en/sql-reference/aggregate-functions/reference/stochasticlogisticregression/#agg_functions-stochasticlogisticregression)
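From the linked docs, the usage pattern is to fit the model as an aggregate state and then apply it with evalMLMethod. Roughly like this (the train_data/test_data tables and feat1/feat2 columns are placeholders; the four parameters are learning rate, L2 regularization, mini-batch size, and update method):

    -- Fit: store the aggregate state of the fitted model.
    CREATE TABLE your_model ENGINE = Memory AS
    SELECT stochasticLinearRegressionState(0.1, 0.0, 5, 'SGD')(target, feat1, feat2) AS state
    FROM train_data;

    -- Predict: apply the stored state to new rows.
    WITH (SELECT state FROM your_model) AS model
    SELECT evalMLMethod(model, feat1, feat2) AS prediction
    FROM test_data;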
This is great. The only thing I dislike is using these variables to try to predict Adj Close when they are not at all correlated.

There are countless meaningful correlations in financial data that would have been just as easy to play around with. One truly valuable example would be to look at trading multiples of comparable companies. Sticking to P/E would be easiest, as P is easily observable and forward-looking EPS estimates are generally always available. This would limit the exercise to more mature companies than the ones commonly discussed on HN, but it would make it actually meaningful.
Just to disambiguate: here, machine learning = linear regression. So no autograd is needed, as the gradients are explicit and simple.
Still interesting though.
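Concretely, for the article's setting (squared loss, one weight, no intercept) the gradient is one line of algebra, so each online update is just:

    % Squared loss for a one-weight, no-intercept linear model
    L(w) = (y - w x)^2
    % Explicit gradient, no autograd needed
    \frac{\partial L}{\partial w} = -2\,(y - w x)\,x
    % Online update with constant learning rate \eta
    w \leftarrow w - \eta \frac{\partial L}{\partial w} = w + 2\eta\,(y - w x)\,x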
From a comment by the author:

> Also, I can't justify why, but my gut feeling is that the database should be an order of magnitude faster than Python, provided the implementation were to be improved.

Would be curious how that could end up being the case. Perhaps if NumPy wasn't used at all? That would mean no vectorization or optimized storage.

Would be interesting to see how it scaled with length and dimensionality.
We've been doing this type of gradient descent in production for a couple of years now, and we've been very happy with it. A few insights from my experience:

One of the main benefits of doing differentiable programming over a relational database is that you can use joins to traverse the normalized data schema, and this acts as a very effective prior for your model. For example, you can learn parameters associated with shirt colors by having a parameter in the color table and joining it with the shirt sales table (through the shirts table). And knowledge of the table structure also lets you improve convergence, for example by adding a factor to compensate for the fact that not all colors have the same number of sales (and therefore have been updated fewer times during each epoch). Here's a paper from last year: https://arxiv.org/abs/2209.03771v1

Automatic differentiation is an absolute must-have. Once you start having a few complex joins and aggregations, it's too hard to derive the gradient by hand. Doing automatic differentiation on relational queries requires some adjustments (in particular, because you cannot use a tape), so you end up having to define a subset of relational algebra that is closed under automatic differentiation. The general ideas were presented in https://ceur-ws.org/Vol-2971/paper07.pdf

On the other hand, while it's possible to do SQL-to-SQL automatic differentiation, the resulting queries have poor performance, so it's better to go one step lower (at the level of the executor for an already-planned query) and perform the automatic differentiation there. And we had an excellent intern work with us on dedicated parallelization for gradient descent queries: https://blog.lokad.com/pdf/reproducible-parallel-sgd-ziyad-benomar-2022-09.pdf

Finally, in practice, it usually doesn't matter whether the gradient descent is stochastic, so long as you run several passes over the data.

I'm very interested in differentiable programming on relational languages, as mainstream research concentrates on unstructured, low-information-density data (images/sound/text) when there are many domains where the data is denser and more structured, and (because of that structure) is stored in a database.
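To make the shirt-color example concrete, one gradient step over such a schema could look roughly like this (a simplified sketch, not our actual system; I'm inventing tables color(color_id, param), shirt(shirt_id, color_id), and sale(shirt_id, qty), and using the per-color parameter directly as the predicted sale quantity):

    -- One squared-loss gradient step on per-color parameters; the joins
    -- traverse the normalized schema so every sale updates its color.
    UPDATE color
    SET param = param + 0.01 * grad.g   -- constant learning rate 0.01
    FROM (
        SELECT
            shirt.color_id,
            -- Averaging (rather than summing) the per-row gradients
            -- compensates for colors having different numbers of sales.
            AVG(2 * (sale.qty - color.param)) AS g
        FROM sale
        JOIN shirt ON shirt.shirt_id = sale.shirt_id
        JOIN color ON color.color_id = shirt.color_id
        GROUP BY shirt.color_id
    ) AS grad
    WHERE color.color_id = grad.color_id;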
There was also this neural network module in Redis, if you want to do training and inference in a Redis DB. (Quite old, btw.)
https://github.com/antirez/neural-redis
I tried to replicate this in SQLite. The first few steps worked OK, e.g.

https://lite.datasette.io/?json=https://gist.github.com/simonw/f750e08b33851a5ac8786d0b717a0152#/data?sql=WITH+RECURSIVE%0A++++stream+AS+%28%0A++++++++SELECT+ROW_NUMBER%28%29+OVER+%28%29+AS+step%2C+%22Adj+Close%22+AS+x%0A++++++++FROM+raw%0A++++++++ORDER+BY+step%0A++++%29%2C%0A++++state%28step%2C+x%2C+avg%29+AS+%28%0A++++++++--+Initialize%0A++++++++SELECT+step%2C+x%2C+x+AS+avg%0A++++++++FROM+stream%0A++++++++WHERE+step+%3D+1%0A++++++++UNION+ALL%0A++++++++--+Update%0A++++++++SELECT%0A++++++++++++stream.step%2C%0A++++++++++++stream.x%2C%0A++++++++++++state.avg+%2B+%28stream.x+-+state.avg%29+%2F+stream.step+AS+avg%0A++++++++FROM+stream%0A++++++++INNER+JOIN+state+ON+state.step+%2B+1+%3D+stream.step%0A++++%29%0A%0ASELECT+*%0AFROM+state%0AORDER+BY+step+DESC%0ALIMIT+5

(I replaced "figures" with "raw" due to the way Datasette Lite assigns a default table name to the imported JSON.)

But the more complex recursive queries gave me this error, and I'm not sure how to work around it:

    recursive reference in a subquery: state
E.g. https://lite.datasette.io/?json=https://gist.github.com/simonw/f750e08b33851a5ac8786d0b717a0152#/data?sql=WITH+RECURSIVE%0A++++stream+AS+%28%0A++++++++SELECT%0A++++++++++++ROW_NUMBER%28%29+OVER+%28%29+AS+step%2C%0A++++++++++++%22Adj+Close%22+AS+x%2C%0A++++++++++++%22Close%22+AS+y%0A++++++++FROM+raw%0A++++%29%2C%0A++++state%28step%2C+x%2C+x_avg%2C+y%2C+y_avg%2C+cov%29+AS+%28%0A++++++++--+Initialize%0A++++++++SELECT%0A++++++++++++step%2C%0A++++++++++++x%2C%0A++++++++++++x+AS+x_avg%2C%0A++++++++++++y%2C%0A++++++++++++y+AS+y_avg%2C%0A++++++++++++0.0+AS+cov%0A++++++++FROM+stream%0A++++++++WHERE+step+%3D+1%0A++++++++UNION+ALL%0A++++++++--+Update%0A++++++++SELECT%0A++++++++++++step%2C%0A++++++++++++x%2C%0A++++++++++++x_new_avg+AS+x_avg%2C%0A++++++++++++y%2C%0A++++++++++++y_new_avg+AS+y_avg%2C%0A++++++++++++cov+%2B+%28%28x+-+x_prev_avg%29+*+%28y+-+y_new_avg%29+-+cov%29+%2F+step+AS+cov%0A++++++++FROM+%28%0A++++++++++++SELECT%0A++++++++++++++++stream.step%2C%0A++++++++++++++++stream.x%2C%0A++++++++++++++++stream.y%2C%0A++++++++++++++++state.x_avg+AS+x_prev_avg%2C%0A++++++++++++++++state.x_avg+%2B+%28stream.x+-+state.x_avg%29+%2F+stream.step+AS+x_new_avg%2C%0A++++++++++++++++state.y_avg+AS+y_prev_avg%2C%0A++++++++++++++++state.y_avg+%2B+%28stream.y+-+state.y_avg%29+%2F+stream.step+AS+y_new_avg%2C%0A++++++++++++++++state.cov%0A++++++++++++FROM+stream%0A++++++++++++INNER+JOIN+state+ON+state.step+%2B+1+%3D+stream.step%0A++++++++%29%0A++++%29%0A%0ASELECT+step%2C+cov%0AFROM+state%0AORDER+BY+step+DESC%0ALIMIT+5&DOUBLE=
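Not tested, but SQLite appears to reject any reference to the recursive table from inside a subquery; it has to appear exactly once, directly in the recursive SELECT's FROM clause. One possible workaround sketch is to flatten the inner subquery by inlining the x_new_avg/y_new_avg expressions, so the recursive step becomes something like:

    -- Recursive step with the subquery flattened: "state" now appears
    -- only once, directly in the FROM clause.
    SELECT
        stream.step,
        stream.x,
        state.x_avg + (stream.x - state.x_avg) / stream.step AS x_avg,
        stream.y,
        state.y_avg + (stream.y - state.y_avg) / stream.step AS y_avg,
        state.cov + (
            (stream.x - state.x_avg)  -- x - x_prev_avg
            * (stream.y - (state.y_avg + (stream.y - state.y_avg) / stream.step))  -- y - y_new_avg
            - state.cov
        ) / stream.step AS cov
    FROM stream
    INNER JOIN state ON state.step + 1 = stream.step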