Show HN: Orchest – Data Science Pipelines

64 points by ricklamers, almost 5 years ago
Hello Hacker News! We are Rick & Yannick from Orchest (https://www.orchest.io - https://github.com/orchest/orchest). We're building a visual pipeline tool for data scientists. The tool can be considered to be high-code because you write your own Python/R notebooks and scripts, but we manage the underlying infrastructure to make it 'just work™'. You can think of it as a simplified version of Kubeflow.

We created Orchest to free data scientists from the tedious engineering related tasks of their job. Similar to how companies like Netflix, Uber and Booking.com support their data scientists with internal tooling and frameworks to increase productivity. When we worked as data scientists ourselves we noticed how heavily we had to depend on our software engineering skills to perform all kinds of tasks. From configuring cloud instances for distributed training, to optimizing the networking and storage for processing large amounts of data. We believe data scientists should be able to focus on the data and the domain specific challenges.

Today we are just at the very beginning of making better tooling available for data science and are launching our GitHub project that will give enhanced pipelining abilities to data scientists using the PyData/R stack, with deep integration of Jupyter Notebooks.

Currently Orchest supports:

1) visually and interactively editing a pipeline that is represented using a simple JSON schema;
2) running remote container based kernels through the Jupyter Enterprise Gateway integration;
3) scheduling experiments by launching parameterized pipelines on top of our Celery task scheduler;
4) configuring local and remote data sources to separate code versioning from the data passing through your pipelines.

We are here to learn and get feedback from the community. As youngsters we don't have all the answers and are always looking to improve.
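
For readers wondering what points 1 and 3 could look like in practice, here is a minimal sketch. It is not Orchest's actual schema or scheduler: the JSON field names (`steps`, `file`, `depends_on`), the step names, and the helper functions `execution_order` and `expand_parameters` are all made up for illustration. It only shows the general idea of describing a pipeline as JSON, deriving a valid run order from the step dependencies, and expanding a parameter grid into individual experiment runs.

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from itertools import product

# Hypothetical pipeline definition. The field names are illustrative only
# and are NOT taken from Orchest's real JSON schema.
PIPELINE_JSON = """
{
  "name": "demo",
  "steps": {
    "load":   {"file": "load.ipynb",   "depends_on": []},
    "clean":  {"file": "clean.py",     "depends_on": ["load"]},
    "train":  {"file": "train.ipynb",  "depends_on": ["clean"]},
    "report": {"file": "report.ipynb", "depends_on": ["clean", "train"]}
  }
}
"""


def execution_order(pipeline: dict) -> list[str]:
    """Return step names in an order that respects every dependency."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {
        name: set(step["depends_on"])
        for name, step in pipeline["steps"].items()
    }
    return list(TopologicalSorter(graph).static_order())


def expand_parameters(grid: dict) -> list[dict]:
    """Expand a parameter grid into one dict per experiment run."""
    keys = sorted(grid)
    return [
        dict(zip(keys, values))
        for values in product(*(grid[key] for key in keys))
    ]


pipeline = json.loads(PIPELINE_JSON)
order = execution_order(pipeline)

# Hypothetical experiment: every combination becomes one pipeline run.
runs = expand_parameters({"lr": [0.01, 0.1], "epochs": [5, 10]})

if __name__ == "__main__":
    print("run order:", order)
    for params in runs:
        print("would schedule a run with", params)
```

In a real system each entry in `runs` would presumably be handed to a task queue (the post mentions Celery) instead of being printed; that part is left out here because the thread does not describe how Orchest does it.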

8 comments

ellisv, almost 5 years ago
> We're building a visual pipeline tool for data scientists.

As a Sr. DS/ML Engineer, this doesn't speak to me.

ishcheklein, almost 5 years ago
Reminds me a bit of https://plynx.com/, and it's also open source. Is there a major differentiator I'm missing? Also, what is your idea regarding the use case? Why would I need to run it locally, for example? Is it mostly about productionizing ML?

vasinov, almost 5 years ago
This looks cool! A couple of questions:

1. Currently, if I install something in the notebook, does it get re-installed every time the pipeline is run? Is there any way to "snapshot" the state of the container?
2. Where is the data stored between the steps?
3. How well-integrated is it with AWS cloud primitives such as EC2 instances, EFS, and S3?

pplonski86, almost 5 years ago
Congratulations! I remember your earlier project: grid studio. Do you support scheduling periodic tasks? Do you support execution triggered by a webhook? Or some way to expose a notebook as a REST API?

Obinkhorst, almost 5 years ago
Thanks for sharing, this is super helpful. I'm endlessly jealous of the teams at Uber and Booking and their fancy tools.

rgmvisser, almost 5 years ago
Really cool! I can't wait to start playing with it.

Can two people collaborate on the same project at the same time?

abalaji, almost 5 years ago
How do you think about this compared to something like Dataiku?

xiaodai, almost 5 years ago
I have wanted something like this.

Julia support?