Koheesio: Nike's Python-based framework to build advanced data-pipelines

217 points by betacar, 12 months ago

13 comments

newfocogi, 12 months ago
I'd like to understand what data engineering inside Nike is actually like. I'm curious because I have relevant experience on my LinkedIn profile, and third-party recruiters reach out to me almost weekly trying to fill really low-paying contract data engineering and ML jobs with Nike. These roles seem to target people with professional experience in the US but pay roughly a third of what I would consider the going rate. There's another top-level comment here saying this tool might make sense "in a shop with a lot of inexperienced devs", which would confirm my anecdata. Maybe the roles are actually scams, who knows :shrug:
steveBK123, 12 months ago
If I had to guess, a tool like this might be useful in a shop with a lot of inexperienced devs. It's a thin wrapper to make sure everyone walks the same well-worn path the same way. You have 2-3 devs work on the tooling, and a much larger team doing rote ETL.
I worked at a shop that did this, and the trade-off is time to market (TTM): your 2-person tools team constantly needs to unblock the ETL team with new features as they encounter new requirements in the wild.
If your ETL team is 20+ people and the tools team doesn't have a head start, the tools team will quickly fall behind an insurmountable backlog while your ETL team spins its wheels. But you might save some money if you choose the right KPI.
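To make the "thin wrapper" idea concrete, here is a minimal sketch in plain Python. It is not Koheesio's API: `Step`, `DropNulls`, `Rename`, and `run_pipeline` are hypothetical names invented for illustration, and rows are assumed to be simple dictionaries. The point is only that a small tools team owns the base class while the ETL team writes small subclasses.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Row = Dict[str, Any]


class Step(ABC):
    """Hypothetical base class owned by the tools team."""

    @abstractmethod
    def execute(self, rows: List[Row]) -> List[Row]:
        """Transform the rows; implemented by each concrete step."""

    def run(self, rows: List[Row]) -> List[Row]:
        # The shared path: every step is invoked and checked the same way.
        out = self.execute(rows)
        if not isinstance(out, list):
            raise TypeError(f"{type(self).__name__} must return a list of rows")
        return out


class DropNulls(Step):
    """Drop rows whose value in `column` is missing or None."""

    def __init__(self, column: str) -> None:
        self.column = column

    def execute(self, rows: List[Row]) -> List[Row]:
        return [r for r in rows if r.get(self.column) is not None]


class Rename(Step):
    """Rename the key `old` to `new` in every row."""

    def __init__(self, old: str, new: str) -> None:
        self.old = old
        self.new = new

    def execute(self, rows: List[Row]) -> List[Row]:
        return [
            {(self.new if key == self.old else key): value for key, value in r.items()}
            for r in rows
        ]


def run_pipeline(steps: List[Step], rows: List[Row]) -> List[Row]:
    """Apply each step in order, feeding its output to the next step."""
    for step in steps:
        rows = step.run(rows)
    return rows
```

The trade-off described above shows up as soon as the ETL team needs something `Step` does not support (say, streaming input): they are blocked until the tools team extends the base class.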
waffletower, 12 months ago
Many data engineering problems are impeded by strong typing, particularly type transduction applications (translating between a database type system and a transport such as Avro, for example). While in many cases that is somebody else's problem (it is solved in a library), when it isn't, the strengths and facility of a dynamic language can save you considerable code complexity and maintenance. Type control is often central to reporting as well, and it is, again, more awkward in a strong typing context. I would tend to argue that insistence upon type frameworks such as pydantic in a data engineering framework is naive and imposed by academic rather than industry experience. There is a reason that Python is chosen for data processing applications, and it certainly isn't typing.
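As an illustration of the kind of type transduction mentioned above, here is a minimal sketch that maps SQL column types to an Avro record schema using a plain dictionary. The mapping table, `avro_field`, and `avro_schema` are hypothetical and cover only a handful of types; the Avro primitive type names and the `["null", T]` union for nullable fields follow the Avro specification.

```python
from typing import Any, Dict, List, Tuple

# Assumed, deliberately incomplete mapping from SQL type names to Avro primitives.
SQL_TO_AVRO: Dict[str, str] = {
    "BOOLEAN": "boolean",
    "INTEGER": "int",
    "BIGINT": "long",
    "REAL": "float",
    "DOUBLE PRECISION": "double",
    "VARCHAR": "string",
    "TEXT": "string",
    "BYTEA": "bytes",
}


def avro_field(name: str, sql_type: str, nullable: bool) -> Dict[str, Any]:
    """Translate one column description into an Avro field definition."""
    # Strip a length suffix such as "(255)" and normalise the case.
    base = sql_type.split("(")[0].strip().upper()
    avro_type = SQL_TO_AVRO[base]  # raises KeyError for unmapped types
    if nullable:
        # Nullable columns become a union with "null" first and a null default.
        return {"name": name, "type": ["null", avro_type], "default": None}
    return {"name": name, "type": avro_type}


def avro_schema(
    record_name: str, columns: List[Tuple[str, str, bool]]
) -> Dict[str, Any]:
    """Build an Avro record schema from (name, sql_type, nullable) tuples."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [avro_field(*column) for column in columns],
    }
```

Here the schema is ordinary dictionaries built at runtime, which is the dynamic-language convenience the comment argues for; a statically typed or pydantic-modelled version would need explicit classes for each schema shape.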
serial_dev, 12 months ago
I used to work a little with ETLs, Spark, Storm, etc., and I honestly don't understand the value proposition of this library. I'm no data engineering expert by any means (it was about 2 years of working on data engineering roughly 30% of the time, 5+ years ago), but I expected that I'd at least get what this is useful for.
benterix, 12 months ago
A probably better explanation of what it is and why you might want to use it (or not) can be found here:
https://engineering.nike.com/koheesio/latest/tutorials/onboarding.html#using-a-context-class
alessmar, 12 months ago
A few weeks ago, I chose to write my data pipelines using Apache Beam. It seems that Koheesio shares some features with this project, but I believe Apache Beam is superior due to its ability to run on various runners, support multiple programming languages, and integrate with numerous data sources and destinations.
tpoacher, 12 months ago
Oh, so like Luigi? Great!
yevpats, 12 months ago
Check out CloudQuery, an Arrow-powered ELT framework (author here :) )
esafak, 12 months ago
> Koheesio is not in competition with other libraries.
Yes, it is, because nobody wants to run multiple orchestrators, and the "What sets Koheesio apart from other libraries?" section does little to help users decide why they should pick yours.
Workflow orchestration is a mature category, as evidenced by the length of this list: https://github.com/meirwah/awesome-workflow-engines
I would expect someone who's seriously writing a new orchestrator in 2024 to cite the alternatives, their shortcomings, and how you intend to address them. Bonus points if you make a neat little table.
The fact that you're leading with Python does not inspire confidence. Pretty much all workflow orchestrators use Python for their glue, and that's hardly the interesting part.
What were they using at Nike before this?
djaouen, 12 months ago
So this is like Broadway (Elixir), but for Python?
adrianbr, 12 months ago
That's really cool. Have you already seen the dlt library? It's built for very easy-to-use EL in Python. It's similarly modular and built by senior data engineers for the data team, and the sources are generators, which you could probably use too.
How is Koheesio different from dlt? Where could they complement each other?
hipadev23, 12 months ago
Had Nike as a client for a period of time, interacted with quite a few people across their data org. There is absolutely no software you want authored by them.
jiggunjer, 12 months ago
Another snakemake?