TechEcho

4 comments

kenko over 12 years ago

Here's the announcement email (essentially a very boiled-down version of the README):

babbage is a library for easily gathering data and computing summary measures in a declarative way.

The summary measure functionality allows you to compute multiple measures over arbitrary partitions of your input data simultaneously and in a single pass. You just say what you want to compute:

<pre><code>
> (def my-fields {:y    (stats :y count)
                  :x    (stats :x count)
                  :both (stats #(+ (or (:x %) 0) (or (:y %) 0)) count sum mean)})
</code></pre>

and the sets that are of interest:

<pre><code>
> (def my-sets (-> (sets {:has-y #(contains? % :y)})
                   (complement :has-y))) ;; could also take intersections, unions
</code></pre>

And then run it with some data:

<pre><code>
> (calculate my-sets my-fields [{:x 1 :y 2} {:x 10} {:x 4 :y 3} {:x 5}])
{:not-has-y {:y    {:count 0},
             :x    {:count 2},
             :both {:mean 7.5, :sum 15, :count 2}},
 :has-y     {:y    {:count 2},
             :x    {:count 2},
             :both {:mean 5.0, :sum 10, :count 2}},
 :all       {:y    {:count 2},
             :x    {:count 4},
             :both {:mean 6.25, :sum 25, :count 4}}}
</code></pre>

The functions :x, :y, and #(+ (or (:x %) 0) (or (:y %) 0)) defined in the fields map are called once per input element, no matter how many sets the element contributes to. The function #(contains? % :y) is also called once per input element, no matter how many unions, intersections, complements, etc. the set :has-y contributes to.

A variety of measure functions, and structured means of combining them, are supplied; it's also easy to define additional measures.

babbage also supplies a method for running computations structured as dependency graphs; this can make gathering the initial data for summarizing simpler to express. To give an example that's probably familiar from another context:

<pre><code>
> (defgraphfn sum [xs] (apply + xs))
> (defgraphfn sum-squared [xs] (sum (map #(* % %) xs)))
> (defgraphfn count-input :count [xs] (count xs))
> (defgraphfn mean [count sum] (double (/ sum count)))
> (defgraphfn mean2 [count sum-squared] (double (/ sum-squared count)))
> (defgraphfn variance [mean mean2] (- mean2 (* mean mean)))
> (run-graph {:xs [1 2 3 4]} sum variance sum-squared count-input mean mean2)
{:sum 10 :count 4 :sum-squared 30 :mean 2.5 :variance 1.25 :mean2 7.5 :xs [1 2 3 4]}
</code></pre>

Options are provided for parallel, sequential, and lazy computation of the elements of the result map, and for resolving the dependency graph in advance of running the computation for a given input, either at runtime or at compile time.
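For anyone wondering what "running computations structured as dependency graphs" boils down to, here is a minimal plain-Clojure sketch of the idea. It is not babbage's implementation and does not use its API: `run-deps`, `example-nodes`, and the `{:deps ... :f ...}` node shape are all made up for illustration, and dependencies are listed explicitly rather than inferred from argument names as `defgraphfn` appears to do. It only covers the simple sequential case.

```clojure
;; Hypothetical sketch: NOT babbage's defgraphfn/run-graph.
;; A node is {:deps [keys it needs] :f fn-of-those-values}.
(defn run-deps
  "Starting from the inputs map, repeatedly run every pending node whose
  dependencies are already present in the result map, until none remain.
  Throws if some node can never be satisfied (missing input or a cycle)."
  [inputs nodes]
  (loop [result inputs
         pending nodes]
    (if (empty? pending)
      result
      (let [ready (filter (fn [[_ {:keys [deps]}]]
                            (every? #(contains? result %) deps))
                          pending)]
        (when (empty? ready)
          (throw (ex-info "Unsatisfiable or cyclic dependencies"
                          {:pending (keys pending)})))
        (recur (reduce (fn [r [k {:keys [deps f]}]]
                         ;; Look up each dependency's value and apply f.
                         (assoc r k (apply f (map r deps))))
                       result
                       ready)
               (apply dissoc pending (map first ready)))))))

;; The same statistics as in the announcement, written as explicit nodes.
(def example-nodes
  {:sum         {:deps [:xs] :f #(apply + %)}
   :sum-squared {:deps [:xs] :f (fn [xs] (apply + (map #(* % %) xs)))}
   :count       {:deps [:xs] :f count}
   :mean        {:deps [:count :sum] :f (fn [n s] (double (/ s n)))}
   :mean2       {:deps [:count :sum-squared] :f (fn [n s2] (double (/ s2 n)))}
   :variance    {:deps [:mean :mean2] :f (fn [m m2] (- m2 (* m m)))}})

(run-deps {:xs [1 2 3 4]} example-nodes)
```

Each pass runs whatever has become computable, so `:variance` waits until `:mean` and `:mean2` exist. Resolving the order "in advance", as the announcement describes, would amount to computing that schedule once and reusing it across inputs, which this sketch does not attempt.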

eschulte over 12 years ago

I actually wrote something similar in bash, which I use frequently when I need to munge a table of numbers on the command line [1]. The whole time I was thinking I should really be doing this in Common Lisp.

[1] http://eschulte.github.com/data-wrapper/

yayitswei over 12 years ago

This will be great for building our stats dashboard. Thanks!

furqanrydhan over 12 years ago

This is great!

Babbage: A Clojure library for accumulation and graph computation
