The author went from this:<p><pre><code> int Dice::total() const {
     int total = 0;
     for (const_iterator current = dice.begin();
          current != dice.end();
          ++current)
         total += (*current)->faceValue();
     return total;
 }
</code></pre>
to this:<p><pre><code> int Dice::total() const {
     return std::accumulate(
         dice.begin(),
         dice.end(),
         0,
         bind(std::plus<int>(), _1, bind(&Die::faceValue, _2))
     );
 }
</code></pre>
and shows his intermediate steps. I took his last variant and converted it to the following, using Intel Threading Building Blocks (TBB) to perform a parallel reduce:<p><pre><code> #include <tbb/blocked_range.h>
 #include <tbb/parallel_reduce.h>
 #include <vector>

 class Sum {
     const std::vector<Die*>& v;   // Die*, to match the &Die::faceValue call above
 public:
     int sum;
     void operator()( const tbb::blocked_range<size_t>& r ) {
         const std::vector<Die*>& a = v;
         for( size_t i = r.begin(); i != r.end(); ++i )
             sum += a[i]->faceValue();
     }
     // Splitting constructor: each new subrange starts its partial sum at 0.
     Sum( Sum& x, tbb::split ) : v(x.v), sum(0) {}
     // Combine the partial sum of a finished subrange into this one.
     void join( const Sum& y ) { sum += y.sum; }
     Sum( const std::vector<Die*>& a ) : v(a), sum(0) {}
 };

 int Dice::parallel_total() const {
     Sum s(dice);
     tbb::parallel_reduce(tbb::blocked_range<size_t>(0, dice.size()),
                          s,
                          tbb::auto_partitioner());
     return s.sum;
 }
</code></pre>
I had to guess what his Dice implementation contained, and if I'd written it from scratch I might have done it differently (e.g., I'd handle that vector in Sum differently). Also, it doesn't really seem worth it performance-wise ;-) but it's still interesting to see in action.