Who is using Python UDFs in SQL?<p>Lately I've seen many data warehouse and data lake providers rolling out features that let you run Python UDFs as part of a SQL command, executing them close to the data.
I'm a fan, and I documented my journey in a series of posts, including:<p><a href="https://blog.jonudell.net/2021/07/24/pl-pgsql-versus-pl-python-heres-why-im-using-both-to-write-postgres-functions/" rel="nofollow">https://blog.jonudell.net/2021/07/24/pl-pgsql-versus-pl-pyth...</a><p><a href="https://blog.jonudell.net/2021/08/13/pl-python-metaprogramming/" rel="nofollow">https://blog.jonudell.net/2021/08/13/pl-python-metaprogrammi...</a><p><a href="https://blog.jonudell.net/2021/08/21/postgres-functional-style/" rel="nofollow">https://blog.jonudell.net/2021/08/21/postgres-functional-sty...</a>
It may be "close[r] to the data", but beware: Python UDFs usually still incur context switches between the Python interpreter and the query executor. If these pile up (as happens when you fire off queries in, say, a loop), you will still see lousy performance. Research is currently under way to optimize UDFs and other forms of computation over database-resident data. For Python, for example, you can check out the Python-to-SQL compiler [0] and the associated demo paper [1] we produced for this year's SIGMOD. (Disclaimer: as you may have noticed thanks to that "we", I'm an author on that paper.) Though it may support only a limited subset of Python, one can already do a lot with just that.<p>[0]: <a href="https://apfel-db.informatik.uni-tuebingen.de" rel="nofollow">https://apfel-db.informatik.uni-tuebingen.de</a>
[1]: <a href="https://db.inf.uni-tuebingen.de/publications/2022/Hirn-grust-fischer" rel="nofollow">https://db.inf.uni-tuebingen.de/publications/2022/Hirn-grust...</a>
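<p>To make the context-switch point concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a real warehouse or for PL/Python. The table `orders`, the UDF `add_tax`, and the 1.25 factor are all made up for illustration, and SQLite's in-process callback is only an analogy for what a server-side UDF runtime does. It just counts how often execution crosses between the SQL engine and the Python interpreter in three variants of the same computation.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO orders (amount) VALUES (?)",
    [(float(i),) for i in range(1, 1001)],
)

# Counters for crossings between the SQL engine and Python.
calls = {"udf": 0, "loop_queries": 0}


def add_tax(amount):
    """Hypothetical Python UDF; the engine calls back into Python once per row."""
    calls["udf"] += 1
    return amount * 1.25


# Register the Python function so it can be called from SQL.
conn.create_function("add_tax", 1, add_tax)

# Variant 1: one query, but the Python UDF is invoked for every row.
total_udf = conn.execute("SELECT SUM(add_tax(amount)) FROM orders").fetchone()[0]

# Variant 2: queries fired off in a loop, one round trip per row.
total_loop = 0.0
for order_id in range(1, 1001):
    row = conn.execute(
        "SELECT amount FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    calls["loop_queries"] += 1
    total_loop += row[0] * 1.25

# Variant 3: the same computation expressed purely in SQL, so the engine
# never has to leave its own executor while scanning the table.
total_sql = conn.execute("SELECT SUM(amount * 1.25) FROM orders").fetchone()[0]

print(calls, total_udf, total_loop, total_sql)
```

All three variants compute the same total, but the first two cross the SQL/Python boundary once per row, while the pure SQL version does not. That third shape is roughly what a Python-to-SQL compiler aims to produce automatically.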