Show HN: Functions matter – an alternative to SQL and map-reduce data processing

6 points by asavinov, about 4 years ago

2 comments

asavinov, about 4 years ago
The main motivation is that conventional approaches to data processing manipulate mathematical sets for every kind of use case: we produce a new set if we want to calculate a new attribute, we produce a new set if we want to match data from different tables, and we get a new set if we aggregate data. Yet in many cases we do not actually need to produce new sets (tables, collections, etc.) - it is enough to add a new column to an existing set. Here are more details about the motivation:

https://prosto.readthedocs.io/en/latest/text/why.html

A column is an implementation of a function (similarly to how a table is an implementation of a set). Theoretically, this approach leads to a data model based on two core elements: mathematical functions (new) and mathematical sets (old).

This approach was implemented in Prosto, a data processing toolkit that radically changes how data is processed by relying heavily on functions and operations with functions - an alternative to map-reduce and join-groupby.
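
To make the "add a column instead of producing a new set" idea concrete, here is a minimal sketch in plain Python. It is not Prosto's actual API: the helper names `calculate`, `link` and `aggregate`, and the sample tables, are hypothetical and chosen only for illustration. Each helper adds a column (a function over rows) to an existing table, where a set-oriented approach would produce a new table through a projection, a join, or a group-by.

```python
# Plain-Python illustration only; none of these names come from Prosto.
items = [
    {"product": "apple", "price": 2.0, "quantity": 3},
    {"product": "pear", "price": 1.5, "quantity": 4},
    {"product": "apple", "price": 2.0, "quantity": 1},
]
products = [{"name": "apple"}, {"name": "pear"}]


def calculate(table, name, fn):
    """Add a column whose value is computed from each row."""
    for row in table:
        row[name] = fn(row)


def link(table, name, target, key, target_key):
    """Add a column whose values are rows of another table (used in place of a join)."""
    index = {t[target_key]: t for t in target}
    for row in table:
        row[name] = index.get(row[key])


def aggregate(target, name, source, link_name, fn, init=0.0):
    """Add a column to `target` by folding `source` rows along a link column
    (used in place of a group-by)."""
    for t in target:
        t[name] = init
    for row in source:
        t = row[link_name]
        if t is not None:
            t[name] = fn(t[name], row)


# A new attribute: no new table, just a new column on `items`.
calculate(items, "amount", lambda r: r["price"] * r["quantity"])

# Matching data from two tables: a link column on `items`, not a joined table.
link(items, "product_ref", products, "product", "name")

# Aggregation: a new column on `products`, not a grouped result set.
aggregate(products, "total", items, "product_ref", lambda acc, r: acc + r["amount"])

for p in products:
    print(p["name"], p["total"])
```

Both tables keep their original rows throughout; only columns are added.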
asavinov, about 4 years ago
Here is another project based on the same idea of processing data using functions:

https://github.com/asavinov/lambdo - Feature engineering and machine learning: together at last!

Here, however, the focus is on feature engineering and on rethinking how it can be combined with traditional ML. Essentially, the point is that there are no big differences between the two, and it is more natural and simpler to think of them as special cases of the same concept: features can be learned, and ML models are frequently used for producing intermediate results.
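
One way to picture the point above, sketched in plain Python rather than lambdo's own configuration format: a hand-written feature and a learned one can both be treated as functions that define a column. The names `add_column` and `fit_standardizer` and the sample data are hypothetical.

```python
from statistics import mean, pstdev

# Illustrative data; not taken from lambdo.
rows = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]


def add_column(table, name, fn):
    """Define a new column by applying a function to each row."""
    for row in table:
        row[name] = fn(row)


def fit_standardizer(table, source):
    """'Train' a feature: learn mean and standard deviation from the data,
    then return an ordinary row function that uses them."""
    values = [r[source] for r in table]
    mu, sigma = mean(values), pstdev(values)
    return lambda r: (r[source] - mu) / sigma if sigma else 0.0


# A hand-written feature: the function is fixed in advance.
add_column(rows, "x_squared", lambda r: r["x"] ** 2)

# A learned feature: the function's parameters come from the data,
# but it is applied exactly like the hand-written one.
add_column(rows, "x_scaled", fit_standardizer(rows, "x"))

for r in rows:
    print(r)
```

A trained model could be plugged in the same way, as a function whose output column feeds later steps.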