Babbage: A Clojure library for accumulation and graph computation

89 points by ithayer over 12 years ago

4 comments

kenko over 12 years ago
Here's the announcement email (essentially a very boiled-down version of the README):

babbage is a library for easily gathering data and computing summary measures in a declarative way.

The summary measure functionality allows you to compute multiple measures over arbitrary partitions of your input data simultaneously and in a single pass. You just say what you want to compute:

    > (def my-fields {:y    (stats :y count)
                      :x    (stats :x count)
                      :both (stats #(+ (or (:x %) 0) (or (:y %) 0)) count sum mean)})

and the sets that are of interest:

    > (def my-sets (-> (sets {:has-y #(contains? % :y)})
                       (complement :has-y))) ;; could also take intersections, unions

And then run it with some data:

    > (calculate my-sets my-fields [{:x 1 :y 2} {:x 10} {:x 4 :y 3} {:x 5}])
    {:not-has-y {:y {:count 0}, :x {:count 2}, :both {:mean 7.5, :sum 15, :count 2}},
     :has-y     {:y {:count 2}, :x {:count 2}, :both {:mean 5.0, :sum 10, :count 2}},
     :all       {:y {:count 2}, :x {:count 4}, :both {:mean 6.25, :sum 25, :count 4}}}

The functions :x, :y, and #(+ (or (:x %) 0) (or (:y %) 0)) defined in the fields map are called once per input element no matter how many sets the element contributes to. The function #(contains? % :y) is also called once per input element, no matter how many unions, intersections, complements, etc. the set :has-y contributes to.

A variety of measure functions, and structured means of combining them, are supplied; it's also easy to define additional measures.

babbage also supplies a method for running computations structured as dependency graphs; this can make gathering the initial data for summarizing simpler to express.
To give an example that's probably familiar from another context:

    > (defgraphfn sum [xs] (apply + xs))
    > (defgraphfn sum-squared [xs] (sum (map #(* % %) xs)))
    > (defgraphfn count-input :count [xs] (count xs))
    > (defgraphfn mean [count sum] (double (/ sum count)))
    > (defgraphfn mean2 [count sum-squared] (double (/ sum-squared count)))
    > (defgraphfn variance [mean mean2] (- mean2 (* mean mean)))
    > (run-graph {:xs [1 2 3 4]} sum variance sum-squared count-input mean mean2)
    {:sum 10 :count 4 :sum-squared 30 :mean 2.5 :variance 1.25 :mean2 7.5 :xs [1 2 3 4]}

Options are provided for parallel, sequential, and lazy computation of the elements of the result map, and for resolving the dependency graph in advance of running the computation for a given input, either at runtime or at compile time.
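
For anyone wondering how a dependency graph like the one in the announcement could be resolved, here is a minimal sketch in plain Clojure. It is not babbage's implementation and does not use its API: the `nodes` map format and `run-graph-sketch` are hypothetical names made up for illustration, and the resolution strategy (compute whatever is ready, repeat) is just one simple sequential option.

```clojure
;; Hypothetical sketch, NOT babbage's implementation or API.
;; Each node lists the keys it depends on and a function of those values.
(def nodes
  {:sum         {:deps [:xs]                 :f (fn [xs] (apply + xs))}
   :sum-squared {:deps [:xs]                 :f (fn [xs] (apply + (map #(* % %) xs)))}
   :count       {:deps [:xs]                 :f (fn [xs] (count xs))}
   :mean        {:deps [:count :sum]         :f (fn [n s] (double (/ s n)))}
   :mean2       {:deps [:count :sum-squared] :f (fn [n s2] (double (/ s2 n)))}
   :variance    {:deps [:mean :mean2]        :f (fn [m m2] (- m2 (* m m)))}})

(defn run-graph-sketch
  "Repeatedly computes every pending node whose dependencies are already
  present in the result map, until nothing is pending. Throws if no node
  can make progress (a missing input or a cycle)."
  [inputs nodes]
  (loop [result  inputs
         pending nodes]
    (if (empty? pending)
      result
      (let [ready (filter (fn [[_ {:keys [deps]}]]
                            (every? #(contains? result %) deps))
                          pending)]
        (when (empty? ready)
          (throw (ex-info "Unresolvable dependencies"
                          {:pending (keys pending)})))
        (recur (reduce (fn [acc [k {:keys [deps f]}]]
                         ;; look up each dependency's value and apply the node fn
                         (assoc acc k (apply f (map acc deps))))
                       result
                       ready)
               (apply dissoc pending (map first ready)))))))

(run-graph-sketch {:xs [1 2 3 4]} nodes)
```

With the input `{:xs [1 2 3 4]}` this resolves in three rounds: first the nodes that only need `:xs`, then the two means, then the variance.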
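
The single-pass claim in the announcement (extractors and set predicates each run once per input element, however many sets the element ends up in) can be illustrated with a small sketch in plain Clojure. Again, this is not babbage's code: `summarize-sketch`, `add-value`, `base-preds`, and `derived-sets` are hypothetical names, and only `count`, `sum`, and `mean` over one extracted value are handled.

```clojure
;; Hypothetical sketch, NOT babbage's implementation or API.
(def base-preds
  ;; Predicates evaluated once per row.
  {:has-y #(contains? % :y)})

(def derived-sets
  ;; Each set is a function of the per-row membership map, e.g. {:has-y true},
  ;; so combining sets never re-runs a base predicate.
  {:all       (constantly true)
   :has-y     :has-y
   :not-has-y (complement :has-y)})

(defn add-value
  "Folds one value into a running {:count :sum} accumulator."
  [acc v]
  (-> (or acc {:count 0 :sum 0})
      (update :count inc)
      (update :sum + v)))

(defn summarize-sketch
  "One pass over rows. `extract` and every base predicate are called exactly
  once per row; the extracted value is then added to each set the row is in.
  Sets that never receive a row are absent from the result."
  [base-preds derived-sets extract rows]
  (let [sums (reduce
               (fn [acc row]
                 (let [v  (extract row)
                       in (into {} (for [[k pred] base-preds]
                                     [k (boolean (pred row))]))]
                   (reduce (fn [a [set-name member?]]
                             (if (member? in)
                               (update a set-name add-value v)
                               a))
                           acc
                           derived-sets)))
               {}
               rows)]
    ;; The mean is derived at the end from the accumulated count and sum.
    (into {} (for [[k m] sums]
               [k (assoc m :mean (double (/ (:sum m) (:count m))))]))))

(def rows [{:x 1 :y 2} {:x 10} {:x 4 :y 3} {:x 5}])

(summarize-sketch base-preds derived-sets
                  #(+ (or (:x %) 0) (or (:y %) 0))
                  rows)
```

On the same four rows as the announcement, this yields the same `:both` figures per set (for example a sum of 25 and a mean of 6.25 for `:all`).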
eschulte over 12 years ago
I actually wrote something similar in bash which I use frequently when I need to munge a table of numbers on the command line [1]. The whole time I was thinking I should really be doing this in Common Lisp.

[1] http://eschulte.github.com/data-wrapper/
yayitswei over 12 years ago
This will be great for building our stats dashboard. Thanks!
furqanrydhan over 12 years ago
This is great!