科技回声

4 comments

ehayes, over 2 years ago

What is a good algorithm-to-purpose map for ML beginners? Looking for something like "Algo X is good for making predictions when your data looks like Y," etc.


alberth, over 2 years ago

4 load balancers with local caches + 5 replica databases.

I don't want to discredit the achievement, but the 1MM/sec seems less exciting when you learn they horizontally scaled the architecture.

Especially when SQLite, in a client/server config (BedrockDB), achieved 4MM/sec from a single server.

https://blog.expensify.com/2018/01/08/scaling-sqlite-to-4m-qps-on-a-single-server/


tomrod, over 2 years ago

I haven't heard of PostgresML before today. How does it compare to Feast?


tucnak, over 2 years ago

We have used PostgresML with great success, albeit in a very limited setting.

The whole philosophy of pushing Postgres to the absolute limit of what is possible is very attractive; think of what Timescale[1] does for time series, and what PostgresML does for machine learning by eliminating the need for any kind of ETL job, which cuts down the latency between when data is produced and when it is consumed, et cetera. The very fact that we can now train and deploy regressions/classifiers using nothing but a couple of views and timely SQL calls is already a fairly attractive proposition; however, I'm really looking forward to further adoption, library support, case studies and stuff like that.

For example, I've been considering how Timescale could be used together with PostgresML to provide a single way to treat data from the point of consumption. In fact, I've reached out to Timescale Cloud on this very matter, and was sad to learn that they can't support PostgresML as part of their Cloud offering due to some security considerations related to it being a Python extension. Self-hosted it is. At any rate, some time back they introduced what they call continuous aggregates[2]: a materialised view on top of a hypertable that doesn't require explicit refreshing and is in fact realtime-accurate due to some waterline logic; it performs a normal view-like query for the most recent bits while retaining the materialised component in the compressed partial form. (Normal time-based compression and retention policies apply like they would to any other hypertable, which is how the continuous aggregate views are implemented under the hood.) The idea here is that you can potentially reduce hundreds of millions of data points to a set of particular continuous aggregates within respective time frames; downsampling of past data is something you get for free.

This is where I think the value lies for PostgresML: using continuous aggregates for continuous training and as a store of historical predictions that you would normally want to keep separate from the training set itself. This could prove a reliable way of feeding new data for re-training along with a downsampled selection of past data, and of comparing various predictions on similar data points over time to keep track of how it goes. The data is compressed away while readily available, and as long as it doesn't have to change (historical predictions apply), the data points in the materialised component are never going to be materialised more than once.

There are limitations[3] to this approach, of course; however, they can be circumvented.

If you're interested in this, you should check out some of the "hyperfunctions" they have to offer, such as histogram[4], which is what I've had the pleasure of using previously to distil some of the time series data into a fixed-size feature vector. Naturally, this would be a good place to start if you're ever going to seriously consider using this yourself for a similar purpose. To me this is a natural step forward in the data lifecycle/supply chain approach. First, you get rid of ETLs as such by adopting PostgresML, and next you specify the "contract" of how your data is going to be produced, reduced, distilled, modelled, sampled, and ultimately evaluated, over prolonged periods of time and multiple iterations of the implementation.

[1] https://docs.timescale.com/timescaledb/latest/overview/core-concepts/

[2] https://docs.timescale.com/timescaledb/latest/how-to-guides/continuous-aggregates/about-continuous-aggregates/

[3] https://docs.timescale.com/timescaledb/latest/overview/limitations/

[4] https://docs.timescale.com/api/latest/hyperfunctions/histogram/
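
For readers who haven't met the "waterline" idea before, here is a rough Python sketch of what the comment above describes: buckets older than a watermark are computed once and stored, while anything newer is aggregated on the fly at query time. This is a toy model under stated assumptions, not how TimescaleDB implements continuous aggregates; the class name `ToyContinuousAggregate`, the 60-second `BUCKET` width, and every method name are hypothetical.

```python
from collections import defaultdict

BUCKET = 60  # bucket width in seconds (hypothetical choice)


class ToyContinuousAggregate:
    """Toy per-bucket average: a materialised part plus a live tail."""

    def __init__(self):
        self.raw = []           # (timestamp, value) rows, standing in for the hypertable
        self.materialised = {}  # bucket start -> (sum, count), computed once
        self.watermark = 0      # rows with ts below this are served from storage

    def insert(self, ts, value):
        self.raw.append((ts, value))

    def _aggregate(self, rows):
        # Group rows into fixed-width buckets, keeping partial (sum, count) state.
        acc = defaultdict(lambda: [0.0, 0])
        for ts, value in rows:
            start = ts - ts % BUCKET
            acc[start][0] += value
            acc[start][1] += 1
        return {start: (s, n) for start, (s, n) in acc.items()}

    def refresh(self, up_to):
        """Materialise complete buckets in [watermark, up_to), then advance the watermark."""
        up_to -= up_to % BUCKET
        rows = [(ts, v) for ts, v in self.raw if self.watermark <= ts < up_to]
        self.materialised.update(self._aggregate(rows))
        self.watermark = max(self.watermark, up_to)

    def query(self):
        """Stored buckets below the watermark plus freshly aggregated buckets above it."""
        # Simplification: rows inserted late with ts below the watermark are ignored here.
        live = self._aggregate([(ts, v) for ts, v in self.raw if ts >= self.watermark])
        merged = {**self.materialised, **live}
        return {start: s / n for start, (s, n) in sorted(merged.items())}


cagg = ToyContinuousAggregate()
for ts, v in [(0, 1.0), (30, 3.0), (60, 10.0), (130, 4.0)]:
    cagg.insert(ts, v)
cagg.refresh(up_to=120)   # buckets 0 and 60 are now stored
cagg.insert(150, 6.0)     # newer data needs no refresh to be visible
print(cagg.query())       # {0: 2.0, 60: 10.0, 120: 5.0}
```

The stored buckets are never recomputed, which mirrors the "never going to be materialised more than once" point, while the most recent bucket always reflects the latest rows.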


Scaling PostgresML to 1M Requests per Second