It's great to remind everyone how Node follows various rules of the Unix Philosophy, and how it is designed to make process spawning and streaming as natural as it is on the OS.

I would prefer it, though, if the implication weren't that a failure in Node's design is responsible for the failure of this in-memory technique for joining massive data sets. From the article:

"However, as more and more districts began relying on Clever, it quickly became apparent that in-memory joins were a huge bottleneck."

Indeed...

"Plus, Node.js processes tend to conk out when they reach their 1.7 GB memory limit, a threshold we were starting to get uncomfortably close to."

Maybe simply "processes" rather than "Node processes"? Running out of memory when you load everything at once is not a Node-only problem.

"Once some of the country’s largest districts started using Clever, we realized that loading all of a district’s data into memory at once simply wouldn’t scale."

I think this was predictable. Earlier in the article I noticed this line:

"We implemented the join logic we needed using a simple in-memory hash join, avoiding premature optimization."

The "premature optimization" line is becoming something of a trope. It is not bad engineering to think at least as far ahead as your business model. It sounds like reaching 1/6 of your market led to a system failure. This could (should?) have been anticipated.
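
For what it's worth, the in-memory hash join pattern the article describes generally looks something like this (just a sketch; the field names and data shapes are hypothetical, not Clever's actual schema or code). The build-side Map holds one whole input at once, which is exactly why memory grows with district size:

    // Rough sketch of an in-memory hash join (hypothetical field names).
    function hashJoin(students, enrollments) {
      // Build phase: index one side entirely in memory.
      // Heap usage is O(number of students) no matter how the output
      // is consumed, so it grows with district size.
      const byId = new Map();
      for (const s of students) {
        byId.set(s.id, s);
      }

      // Probe phase: look up the matching student for each enrollment.
      const joined = [];
      for (const e of enrollments) {
        const student = byId.get(e.studentId);
        if (student) joined.push({ ...e, student });
      }
      return joined;
    }

The usual ways out are to push the join into the database or to stream sorted inputs through a merge join, either of which keeps memory roughly constant instead of proportional to the district.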
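
And on the 1.7 GB figure: as far as I know that corresponds to V8's default heap ceiling at the time (it can be raised with --max-old-space-size, though that only postpones the problem). You can watch how close a process is getting with standard Node APIs:

    // Check how close a Node process is to its V8 heap limit.
    const v8 = require('v8');

    const { heapUsed } = process.memoryUsage();
    const { heap_size_limit } = v8.getHeapStatistics();

    console.log('heap used:  ' + Math.round(heapUsed / 1048576) + ' MB');
    console.log('heap limit: ' + Math.round(heap_size_limit / 1048576) + ' MB');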