I don't know if I missed it, but I would have liked to see a performance/accuracy comparison between wide+deep learning and a simple ensemble of separate wide and deep models. The advantage of having two separate models is that you could use just one or the other if something went wrong, or if you needed to make a faster prediction (i.e., when the escalator breaks, you get stairs).
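
To make the fallback idea concrete, here is a rough sketch of what I mean by a simple ensemble. Everything in it is hypothetical: `ensemble_predict`, the `fast` flag, and the toy models are names I made up, equal-weight averaging is just one possible way to combine the two scores, and I'm assuming the wide model is the cheaper one to evaluate.

```python
from typing import Callable, Optional, Sequence

# Hypothetical model type: takes a feature vector, returns a score in [0, 1].
Model = Callable[[Sequence[float]], float]


def ensemble_predict(
    x: Sequence[float],
    wide: Optional[Model] = None,
    deep: Optional[Model] = None,
    fast: bool = False,
) -> float:
    """Average the wide and deep scores, degrading to a single model if needed."""
    if wide is None and deep is None:
        raise RuntimeError("no model available")
    # Use the wide model alone if the deep one is down, or if a fast answer
    # was requested (assumption: the wide model is the cheaper of the two).
    if deep is None or (fast and wide is not None):
        return wide(x)
    # Wide model is down: the deep model alone still gives an answer.
    if wide is None:
        return deep(x)
    # Both available: plain equal-weight average of the two scores.
    return 0.5 * (wide(x) + deep(x))


# Toy stand-ins for trained models (constant scores, for illustration only).
def toy_wide(x: Sequence[float]) -> float:
    return 0.75


def toy_deep(x: Sequence[float]) -> float:
    return 0.25


features = [1.0, 0.0, 3.5]
both = ensemble_predict(features, toy_wide, toy_deep)
quick = ensemble_predict(features, toy_wide, toy_deep, fast=True)
deep_only = ensemble_predict(features, wide=None, deep=toy_deep)
```

A jointly trained wide+deep model doesn't degrade this way, since the two halves share one output layer, which is why I'd want to see how much accuracy the joint training actually buys.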