It's random forests ... each tree is trained on a <i>subset</i> of the data. You can split the massive dataset into chunks and train the trees independently, which sidesteps the "big data" hangup.<p>If you look at the scikit-learn implementation, each tree emits a normalised probability vector for each prediction, and those vectors are simply averaged to get the aggregate prediction, so it's not very difficult to do yourself.<p>Regardless, that is still a batch learning technique. For big data you want an incremental learner.
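<p>A minimal sketch of that chunk-and-average approach (train_on_chunks and aggregate_predict_proba are my own names, not a library API, and it assumes every chunk contains examples of every class so the probability columns line up):<p>

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def train_on_chunks(chunks):
        """Train one independent small forest per (X, y) chunk."""
        forests = []
        for X, y in chunks:
            clf = RandomForestClassifier(n_estimators=50)
            clf.fit(X, y)
            forests.append(clf)
        return forests

    def aggregate_predict_proba(forests, X):
        """Average the per-forest probability vectors, the same way
        scikit-learn combines the trees inside a single forest."""
        probas = [clf.predict_proba(X) for clf in forests]
        return np.mean(probas, axis=0)

<p>If you really do want incremental learning instead, scikit-learn's estimators that support partial_fit (SGDClassifier, for example) are the usual starting point.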