Interesting, thanks for sharing.

How do you handle historical backfill for new features? As in, some feature that can be updated in streaming fashion but whose initial value depends on data from the last X years, e.g., total # of courses completed since sign-up.

Also, who is responsible for keeping the Flink jobs running: the data scientists, or do you have a separate streaming platform team?
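To make the question concrete, here's roughly the pattern I'm asking about, as a toy Python sketch (all names are made up, not your actual APIs): seed the feature's state from a batch scan of history, then keep it current with streaming updates.

    # Toy sketch of backfill-then-stream; hypothetical names throughout.
    from collections import defaultdict

    def backfill_courses_completed(batch_rows):
        """Seed the feature from historical data (e.g., a warehouse export)."""
        counts = defaultdict(int)
        for row in batch_rows:
            counts[row["user_id"]] += 1
        return counts

    def apply_event(counts, event):
        """Incremental streaming update once the backfill has seeded state."""
        if event["type"] == "course_completed":
            counts[event["user_id"]] += 1

    # Usage: seed from batch, then consume the live stream.
    counts = backfill_courses_completed([
        {"user_id": "u1"}, {"user_id": "u1"}, {"user_id": "u2"},
    ])
    apply_event(counts, {"type": "course_completed", "user_id": "u2"})
    assert counts["u1"] == 2 and counts["u2"] == 2

The tricky part seems to be making the cutover between the two exactly-once, which is why I'm curious how you handle it.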
This reads like it was written a few years ago, to my mind (source: I've been working in ML for most of a decade now).

Disintermediation of data pipeline creation is nothing new at this point, and the technologies aren't that novel either. I'd be surprised this is on the front page, but it takes time for the lessons in this article to be learnt by a large enough number of people that they become humdrum.

Above all, it reminds me of a consultant friend telling me he had two clients who built feature stores: one with an open-ended goal of enabling people, and one because they had specific things they wanted to achieve. The outcomes they got were as dissimilar as their motives!
I'm struggling to understand what a feature store is.

Is it another name for an OLAP or BI cube, i.e., a huge precomputed GROUP BY query with rollups?

The only new thing I see is that it combines both historical and recent data. Kinda like an OLAP cube with a lambda architecture.
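To illustrate what I mean by that analogy, a toy Python sketch (hypothetical data, not any real feature-store API): a batch-precomputed rollup merged with recent streaming deltas at read time, lambda-architecture style.

    # Toy lambda-style read path; made-up data structures.
    batch_rollup = {"u1": 120, "u2": 45}   # precomputed nightly, the "cube"
    speed_layer  = {"u1": 3}               # events since the last batch run

    def read_feature(user_id):
        """Serve historical + recent in one lookup."""
        return batch_rollup.get(user_id, 0) + speed_layer.get(user_id, 0)

    print(read_feature("u1"))  # 123: combines both layers

If a feature store is more than that read path plus some metadata, I'd genuinely like to know what the extra part is.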