TechEcho

Show HN: Hamilton, a Microframework for Creating Dataframes

64 points by krawczstef over 3 years ago

13 comments

justusw over 3 years ago
This project reminds me a lot of Dask https://dask.org/. A library that allows delayed calculation of complex dataframes in Python.
wodow over 3 years ago
The purpose of this is discussed in their blog post, which is non-prominently linked in the README: https://multithreaded.stitchfix.com/blog/2021/10/14/functions-dags-hamilton/
sterlinm over 3 years ago
I haven't evaluated how Hamilton is implemented specifically but:

* It's solving a really different problem than Spark/Dask/etc. It could definitely use those tools but it's just not the same thing.

* If you're looking at this and thinking it's useless even if you're familiar with Pandas/dataframes it's probably just that you haven't had to work on the types of problems that this particular tool is intended to help with.
zmmmmm over 3 years ago
So it's like spark for pandas? Seems like it might be better just to use spark and if there are features missing, build a framework to add in missing features on top of that - in which case you get a giant distributed processing engine for free with it. Would be interested to know if that was a consideration or not.
akavel over 3 years ago
Could someone please help me understand what a "dataframe" is? I see this term thrown around occasionally, but failed to find a definition/explanation for someone who doesn't actually already know what it is :(
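
Since the replies to this question did not load, here is a rough answer in code: a dataframe is, loosely, a table made of named columns that all have the same number of rows. The sketch below is a toy model in plain Python, not how pandas or any real library is implemented; the column names and the `row` and `where` helpers are made up for illustration.

```python
# Toy model only: a "dataframe" as named columns of equal length.
# Real libraries such as pandas add typed columns, a row index,
# and fast vectorized operations on top of this idea.
table = {
    "name": ["ann", "bob", "cy"],
    "spend": [120.0, 80.0, 200.0],
    "visits": [4, 2, 5],
}

# Every column has the same number of rows.
assert len({len(col) for col in table.values()}) == 1

# A derived column is computed from whole columns at once.
table["spend_per_visit"] = [
    s / v for s, v in zip(table["spend"], table["visits"])
]


def row(frame, i):
    """Read one row: position i across all columns (hypothetical helper)."""
    return {name: col[i] for name, col in frame.items()}


def where(frame, keep):
    """Keep only the rows for which keep(row) is true (hypothetical helper)."""
    n_rows = len(next(iter(frame.values())))
    idx = [i for i in range(n_rows) if keep(row(frame, i))]
    return {name: [col[i] for i in idx] for name, col in frame.items()}


big_spenders = where(table, lambda r: r["spend"] > 100)
```

The point is that work is expressed per column ("divide spend by visits") rather than per cell, which is what tools in this thread build on.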
noway421 over 3 years ago
Reminds me of Dagger from the recent post "An oral history of Bank Python": https://calpaterson.com/bank-python.html
physicsyogi over 3 years ago
This reminds me a bit of a Clojure library called Plumbing (formerly Graph): https://github.com/plumatic/plumbing. It also let you create a DAG for structured computation. It was used for a web service, at that time.
punkbrwstr over 3 years ago
My pynto https://github.com/punkbrwstr/pynto is a similar framework for creating dataframes, but using a concatenative paradigm that treats the frame as a stack of columns. Functions ("words") operate on the stack to set up the graph for each column, and execution happens afterwards in parallel. Instead of function modifiers like @does it uses combinators to apply quoted operations to multiple columns. The postfix syntax (think postscript or factor) is unambiguous, if a bit old-school.
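
For readers unfamiliar with the concatenative style described above, a rough sketch of the general idea might look like the following. This is not pynto's API: the words `const`, `add`, `each` and the `run` function are hypothetical names, and the deferred work here is evaluated sequentially rather than in parallel.

```python
# Hypothetical sketch of a concatenative "stack of columns" style.
# The stack holds (name, thunk) pairs; a thunk computes a column on demand.


def const(name, values):
    """Word: push a new column onto the stack."""
    def word(stack):
        return stack + [(name, lambda: list(values))]
    return word


def add(name):
    """Word: pop the top two columns, push their elementwise sum (deferred)."""
    def word(stack):
        *rest, (_, a), (_, b) = stack
        return rest + [(name, lambda: [x + y for x, y in zip(a(), b())])]
    return word


def each(quoted):
    """Combinator: apply a quoted unary operation to every column on the stack."""
    def word(stack):
        return [(n, (lambda t=t: [quoted(x) for x in t()])) for n, t in stack]
    return word


def run(*words):
    """Apply words left to right (postfix order), then evaluate all columns."""
    stack = []
    for w in words:
        stack = w(stack)  # only sets up deferred computations
    return {name: thunk() for name, thunk in stack}  # evaluation happens here


frame = run(
    const("a", [1, 2]),
    const("b", [10, 20]),
    add("a_plus_b"),
    const("c", [3, 4]),
    each(lambda x: x * 2),
)
```

Because each word only rearranges deferred computations, nothing is calculated until `run` evaluates the final stack, which is where a real implementation could schedule the columns in parallel.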
mritchie712 over 3 years ago
I'm curious what the average sentiment from the Stitch data team is on this. I see the small marginal utility of this, but this would be a massive pain to implement. Imagine going back thru thousands of lines of transforms and adding the framework. Say you make a couple small mistakes somewhere because the framework is new to you. Things seem fine at first, but weeks go by and something seems off. How do you find those mistakes?

Newly hired data scientists would have a "wtf is this thing?" response. You'd really need to "sell" people on this and it doesn't seem worth it.
eugenhotaj over 3 years ago
This is neat for toy problems but I don't see it working well for "real" pipelines. The magical DAG creation is going to be super hard to wrap your head around and even worse to debug.

This reminds me of an internal Google tool for doing async programming in Java (ProducerGraph or something). The idea was that you'd just write annotated functions and the framework would handle all the async stuff. Wasted many thousands of engineering hours while giving an even worse experience than manipulating futures directly.
dash2 over 3 years ago
I think the README needs something to explain what you get when you write these functions. Apparently Hamilton then creates a DAG... OK, and what does that do for me?
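
As a rough illustration of what a DAG built from plain functions can buy you, here is a minimal sketch. It assumes the general pattern where a function's name is the column it produces and its parameter names are the columns it depends on; `build_dag`, `compute`, and the example columns are hypothetical and are not Hamilton's actual API.

```python
import inspect


# Hypothetical column definitions: the function name is the output,
# the parameter names are the inputs it depends on.
def spend_per_visit(spend, visits):
    return [s / v for s, v in zip(spend, visits)]


def mean_spend_per_visit(spend_per_visit):
    return sum(spend_per_visit) / len(spend_per_visit)


def above_mean(spend_per_visit, mean_spend_per_visit):
    return [x > mean_spend_per_visit for x in spend_per_visit]


def visits_doubled(visits):
    return [2 * v for v in visits]


FUNCTIONS = [spend_per_visit, mean_spend_per_visit, above_mean, visits_doubled]


def build_dag(functions):
    """Map each output name to (function, names of its dependencies)."""
    return {
        f.__name__: (f, list(inspect.signature(f).parameters))
        for f in functions
    }


def compute(target, dag, inputs, cache=None, trace=None):
    """Resolve `target`, visiting only the nodes it depends on."""
    cache = {} if cache is None else cache
    if target in inputs:
        return inputs[target]
    if target not in cache:
        func, deps = dag[target]
        args = [compute(d, dag, inputs, cache, trace) for d in deps]
        cache[target] = func(*args)  # each node runs at most once
        if trace is not None:
            trace.append(target)
    return cache[target]


dag = build_dag(FUNCTIONS)
inputs = {"spend": [120.0, 80.0, 200.0], "visits": [4, 2, 5]}
trace = []
result = compute("above_mean", dag, inputs, trace=trace)
```

In this sketch the caller asks for one output and the graph decides what to run and in what order: `visits_doubled` is never executed, and `spend_per_visit` runs once even though two nodes need it. The sketch has no cycle detection and raises `KeyError` for an unknown name.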
jstx1 over 3 years ago
I really don't get the point of this just from the readme. It seems like an overly fanciful way to do something simple.
zwaps over 3 years ago
... and why? The Readme does not say?!