This project reminds me a lot of Dask (https://dask.org/), a library that allows delayed calculation of complex dataframes in Python.
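For reference, the lazy style in Dask looks roughly like this (a minimal sketch):

    import dask.dataframe as dd

    # Nothing is computed here -- these calls only build a task graph.
    df = dd.read_csv("events-*.csv")
    daily_mean = df.groupby("day")["value"].mean()

    # .compute() walks the graph and materializes the actual result.
    print(daily_mean.compute())

Everything stays symbolic until you explicitly ask for a result.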
The purpose of this is discussed in their blog post, which is non-prominently linked in the README: https://multithreaded.stitchfix.com/blog/2021/10/14/functions-dags-hamilton/
I haven't evaluated how Hamilton is implemented specifically, but:

* It's solving a really different problem than Spark/Dask/etc. It could definitely use those tools, but it's just not the same thing.

* If you're looking at this and thinking it's useless, even if you're familiar with Pandas/dataframes, it's probably just that you haven't had to work on the types of problems this particular tool is intended to help with.
So it's like Spark for pandas? Seems like it might be better to just use Spark and, if there are features missing, build a framework on top of it to add them - that way you get a giant distributed processing engine for free. I'd be interested to know whether that was a consideration.
Could someone please help me understand what a "dataframe" is? I see this term thrown around occasionally, but failed to find a definition/explanation for someone who doesn't actually already know what it is :(
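Edit: for anyone else wondering, a dataframe is basically an in-memory two-dimensional table with labeled columns (and usually a row index), like a spreadsheet or a SQL result set. The canonical Python example is pandas:

    import pandas as pd

    # Each dict key becomes a named column; rows get an integer index.
    df = pd.DataFrame({"name": ["alice", "bob"], "score": [91, 78]})
    print(df[df["score"] > 80])  # column-wise filtering, like a WHERE clause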
Reminds me of Dagger from the recent post "An oral history of Bank Python": https://calpaterson.com/bank-python.html
This reminds me a bit of a Clojure library called Plumbing (formerly Graph): https://github.com/plumatic/plumbing. It also lets you create a DAG for structured computation; at the time, it was used for a web service.
My pynto (https://github.com/punkbrwstr/pynto) is a similar framework for creating dataframes, but using a concatenative paradigm that treats the frame as a stack of columns. Functions ("words") operate on the stack to set up the graph for each column, and execution happens afterwards in parallel. Instead of function modifiers like @does, it uses combinators to apply quoted operations to multiple columns. The postfix syntax (think PostScript or Factor) is unambiguous, if a bit old-school.
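To give a flavor of the stack-of-columns idea without the real syntax, here's a stripped-down sketch in plain Python (illustrative helpers only, not pynto's actual API):

    import pandas as pd

    def col(name):
        """A 'word' that pushes a column getter onto the stack."""
        return lambda stack: stack + [lambda df: df[name]]

    def add(stack):
        """A 'word' that pops two columns and pushes their lazy sum."""
        *rest, a, b = stack
        return rest + [lambda df: a(df) + b(df)]

    def run(words, df):
        """Compose words left to right, then evaluate each column against df."""
        stack = []
        for word in words:
            stack = word(stack)
        return pd.concat([f(df) for f in stack], axis=1)

    df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
    print(run([col("x"), col("y"), add], df))  # postfix: "x y add"

Nothing touches the data until run(); the words only build up column expressions.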
I'm curious what the average sentiment on this is from the Stitch Fix data team. I can see the small marginal utility, but this would be a massive pain to implement. Imagine going back through thousands of lines of transforms and retrofitting the framework. Say you make a couple of small mistakes somewhere because the framework is new to you. Things seem fine at first, but weeks go by and something seems off. How do you find those mistakes?

Newly hired data scientists would have a "wtf is this thing?" response. You'd really need to "sell" people on this, and it doesn't seem worth it.
This is neat for toy problems, but I don't see it working well for "real" pipelines. The magical DAG creation is going to be super hard to wrap your head around and even worse to debug.

This reminds me of an internal Google tool for doing async programming in Java (ProducerGraph or something). The idea was that you'd just write annotated functions and the framework would handle all the async stuff. It wasted many thousands of engineering hours while giving an even worse experience than manipulating futures directly.
I think the README needs something to explain what you get when you write these functions. Apparently Hamilton then creates a DAG... OK, and what does that do for me?
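From the README, my rough understanding is: each function's name declares an output column, its parameter names declare its dependencies, and the driver wires those into a DAG and computes only the outputs you request. Paraphrasing the README's example (details may be off):

    # my_functions.py -- function names are outputs, parameter names are inputs.
    import pandas as pd

    def avg_3wk_spend(spend: pd.Series) -> pd.Series:
        """Rolling three-week average of spend."""
        return spend.rolling(3).mean()

    def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
        """Marketing efficiency: spend divided by signups."""
        return spend / signups

    # run.py -- the driver builds the DAG and executes the requested outputs.
    import pandas as pd
    from hamilton import driver
    import my_functions

    initial_data = {
        "spend": pd.Series([10, 10, 20, 40, 40, 50]),
        "signups": pd.Series([1, 10, 50, 100, 200, 400]),
    }
    dr = driver.Driver(initial_data, my_functions)
    df = dr.execute(["avg_3wk_spend", "spend_per_signup"])
    print(df)

So what the DAG apparently buys you is dependency wiring, execution ordering, and only computing the subgraph you ask for.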