Show HN: PostgresML, now with analytics and project management

444 points by levkk about 3 years ago

We've been hard at work for a few weeks and thought it was time for another update. In case you missed our first post, PostgresML is an end-to-end machine learning solution, running alongside your favorite database. This time we have more of a suite offering: project management, visibility into the datasets, and visibility into the deployment pipeline's decision making. Let us know what you think! The demo link is on the page, and also here: https://demo.postgresml.org

29 comments

zmmmmm about 3 years ago

Seems like a great idea. When you look at many ML frameworks, half the code and learning overhead is data-schlepping code and table-like structures that "reinvent" the schema that already exists inside a database. Not to mention, there can be security concerns from dumping large amounts of data out of the primary store (how are you going to GDPR-delete that stuff later on?). So why not use it natively where the data already is? For anything substantive, it seems like a bad idea to run this on your primary store, since the last thing you want to do is eat up precious CPU and RAM needed by your OLTP database. But in a data warehouse or similar replicated setup, it seems like a really neat idea.

jdoliner about 3 years ago

This is really cool; running ML workloads on top of SQL is a very practical way of doing ML for a lot of businesses. Many companies don't have fancy ML workloads like you see at OpenAI; they just have a SQL database with some data that could greatly help their business with some simple ML models trained on it. This looks like a nice way to do it. A slightly different approach that I've been working on involves hooking data warehouses up to Pachyderm [0] so you can do offline training on them. It's not as good for online stuff as this, but for longer-running, batch-style jobs it works really well.

[0] http://github.com/pachyderm/pachyderm

phenkdo about 3 years ago

Can this be used to deploy an "active learning" model that learns from fresh data and auto-updates itself?

LunaSea about 3 years ago

Do you plan on adding support for managed PostgreSQL services like RDS in the future?

jorgemf about 3 years ago

How do you handle splitting the dataset into train/validation/test sets? How do you measure the degradation of the model? Is there any way to select the metric you target (accuracy, F1 score or any other)?

bazhova about 3 years ago

This is great! FYI for those who haven't seen, BigQuery can also run statistical learning methods directly on your data as part of the query. Really cool to see ML going this direction.

waatels about 3 years ago

Hello, really nice! Can you explain the differences from Apache MADlib (https://madlib.apache.org/)? Wouldn't an OLAP DB be better suited than PG for this kind of workload? Does being a PostgreSQL module make it compatible with Citus, Greenplum or Timescale?

bguberfain about 3 years ago

Can we offload model training to a different server? Can it be parallelized? Anyway, nice API and a promising project.

Abishek_Muthian about 3 years ago

Congratulations on the launch! This is the most exciting ML-related project I've seen in a while, mainly because the barrier to entry seems low: anyone with a PG database could apply a model to their data using PostgresML, if I understood the premise correctly.

Most of the comments here seem to be about separating the compute from the database machine, which it seems isn't possible right now with PostgresML. But the GitHub page reads at the start:

> The system runs Postgres with the pgml-extension installed on port 5433 by default, *just in case you happen to be running Postgres already*:

  $ psql -U postgres -h 127.0.0.1 -p 5433 -d pgml_development

I think the second part needs to be clarified. Does it mean installing the PGML extension on a machine running an existing PG database and connecting to it, or does it just mean starting a Postgres session in the PGML Docker package?

ekzhu about 3 years ago

Great idea! I see this is implemented using the Python language interface supported by PostgreSQL and by importing sklearn models. I always wonder how scalable this is, considering the serialization/deserialization overhead between Postgres' core and Python. Do you see any significant performance difference between this and training the sklearn models directly on something like DataFrames?

simonw about 3 years ago

This looks amazing! The animated GIF on your homepage moves a little bit too fast for me to follow.

manca about 3 years ago

Interesting concept, but I think BigQuery ML [1] has been providing similar features for years now. Curious to learn what the differences are, other than offering this as a Postgres plugin.

[1] https://cloud.google.com/bigquery-ml/docs/introduction

sagaro about 3 years ago

I don't understand the example on the homepage. How does the extension know what "buy it again" means?

chartpath about 3 years ago

Reminds me of https://riverml.xyz/latest/ (which is awesome), but the idea is even better because it skips all the copying and preprocessing yak shaving. Can't wait to kick the tires!

thejansen about 3 years ago

Cool approach. This fits nicely into the trend of SQL-as-much-as-possible, because that makes it just a tiny bit more accessible. Definitely going to play with this in the next few days. (edit:) Being able to get training data from a SQL view is by far the nicest part. Keep it up!

lysecret about 3 years ago

I feel like a lot of issues with ML systems come from the fact that some person got a CSV dump of the data and then iterated for a month to build a fantastic model, which nobody knows how to integrate with the DB.

So this is why I really like this idea, and about 3 years ago I seriously thought about starting this thing as well. I went ahead and built a specific data company (so not a tooling one), and now I don't like this idea anymore.

To me this is a lot like proposing: "let's get rid of REST APIs and GraphQL and connect the frontend directly to the DB" (ignoring security issues for a bit).

In the frontend, the view in which you like to display your data is different from how it should be saved. It's exactly the same in ML: the view your data can be trained/predicted on is very different from how it should be stored.

They are connected, but IMO there always has to be a transformation layer (and Python is just a much better way to do that transformation, but that's another story).

kipukun about 3 years ago

Neat project. Any roadmap for cross-validation support (GridSearchCV and friends)?

tomerbd about 3 years ago

SQL to rule the world! Now just SQL to create GUIs and websites and I'm

gabereiser about 3 years ago

This is awesome. I’m guessing the models are executed on the database server and not a separate cluster? What about GPU training? How is that handled? I’d love to see more docs.

teknopurge about 3 years ago

This is great; will be experimenting with it one weekend soon...

xpe about 3 years ago

I wonder if/how a PostgreSQL plug-in can provide an optimal mix of computing and storage resources for varying machine learning workloads.

dayeye2006 about 3 years ago

How does this compare to MindsDB (https://mindsdb.com/)?

wodenokoto about 3 years ago

Is this a competitor to BigQuery AutoML or to something like Kubeflow?

ekzy about 3 years ago

This looks awesome! I’m not an expert, but wouldn’t typical database hardware be suboptimal for running ML? Is this meant to run on a replica (which is quite straightforward to set up) that has ML-optimised hardware?

debarshri about 3 years ago

Very cool! Will probably use it soon.

obert about 3 years ago

Is it possible, or how hard is it, to plug in custom proprietary models?

toddm about 3 years ago

Do I need to know about ML/statistics to interpret the results?

matthewtovbin about 3 years ago

WHY!?

jmuguy about 3 years ago

What affiliation does this have with PostgreSQL?
