What about the relative size of the available datasets? It seems like that would make offline learning much more valuable than learning directly from experience.

The largest publicly available research datasets for machine translation are 2-3 million sentences [1]. Google's internal datasets are "two to three decimal orders of magnitudes bigger than the WMT corpora for a given language pair" [2].

That's far more data than a cell phone's translation app would receive over its entire lifetime. Similarly, the amount of driving data collected by Tesla from all its cars will be much larger than the data received by any single car.
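To put rough numbers on that gap: the corpus sizes come from [1] and [2], but the per-device figure below is purely my own guess about usage, so treat this as a back-of-envelope sketch rather than a measurement.

    # Back-of-envelope scale comparison. The WMT size and the "2-3 decimal
    # orders of magnitude" multiplier come from [1] and [2]; the per-device
    # usage figure is an assumption for illustration only.
    wmt_sentences = 2.5e6                        # ~2-3 million sentences [1]
    google_sentences = wmt_sentences * 10**2.5   # "two to three decimal orders of magnitudes bigger" [2]

    # Assume a heavy user translates ~20 sentences a day for 5 years.
    per_device_sentences = 20 * 365 * 5

    print(f"Google-scale corpus:   ~{google_sentences:,.0f} sentences")
    print(f"One device, lifetime:  ~{per_device_sentences:,} sentences")
    print(f"Ratio:                 ~{google_sentences / per_device_sentences:,.0f}x")

Even with generous per-user guesses, the centralized corpus comes out several orders of magnitude larger than what any single device will ever see.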
language pair" [2].<p>That's far more data than a cell phone's translation app would receive over its entire lifetime. Similarly, the amount of driving data collected by Tesla from all its cars will be much larger than the data received by any single car.<p>This suggests that most learning will happen as a batch process, ahead of time. There may be some minor adjustments for personalization, but it doesn't seem like it's enough for Agent AI to outcompete Tool AI.<p>At least so far, it seems far more important to be in a position to collect large amounts of data from millions of users, rather than learning directly from experience, which happens slowly and expensively.<p>This is not about having a human check every individual result. It's about putting a software development team in the loop. Each new release can go through a QA process where it's compared to the previous release.<p>[1] <a href="https://github.com/bicici/ParFDAWMT14" rel="nofollow">https://github.com/bicici/ParFDAWMT14</a>
[2] <a href="https://research.googleblog.com/2016/09/a-neural-network-for-machine.html" rel="nofollow">https://research.googleblog.com/2016/09/a-neural-network-for...</a>