At my previous job we had a human review stage in a data pipeline. 5-10 people at an outsourcing company in Bangladesh would review items via a simple web interface we provided for them. There were ~10 factors to review, all fixed options (no free text), varying from 5 to 500 options per factor. Each review was based on a few text fields and around 5 images.

On the surface of it, I'd expect ChatGPT to do very well at this. It's simple text and images, not many options, and theoretically very limited context.

However, the more I think about it the less sure I am. Firstly, these weren't crowd-sourced reviews: they came from *trained* reviewers, paid hourly rather than per review, so incentives were definitely in favour of the long-term business relationship. Then there was the training material: we maintained a vast disambiguation doc used to resolve anything that was vague or could be interpreted multiple ways, and it was constantly being revised. All the necessary context should have been in that doc, but it wasn't, and reviewers definitely found patterns that worked and patterns that didn't. Lastly, the reviewers were in a Slack channel where they could ask questions of their manager on our side; while this covered maybe ~1% of tasks, it was an important part of the process.

So maybe you could point ChatGPT at it and let it run, but the oversight process we had would still be necessary. The disambiguation doc would currently be too long for ChatGPT's context window, though that will likely change in the near future. Would the workflow be to keep tweaking the prompt, adding special case after special case? How do you scale "do this, but not that, but add this, but..." in prompting, and would ChatGPT become as confused as a human after enough of that? I expect so, given that it's only a language model and that's not effective communication.
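
For what it's worth, here's roughly how I'd imagine wiring a task like this up today. This is only a minimal sketch, assuming the OpenAI Python client, a vision-capable model, and made-up factor names and option lists (none of this is from the actual pipeline):

    from openai import OpenAI

    client = OpenAI()

    # Hypothetical factors: each is a fixed list of allowed options (5-500 in practice).
    FACTORS = {
        "condition": ["new", "like new", "used", "damaged", "unknown"],
        "category": ["furniture", "electronics", "clothing", "other"],
    }

    def review_item(text_fields: dict, image_urls: list[str], disambiguation_doc: str) -> dict:
        """Ask the model to pick one option per factor, then validate the answer."""
        system = (
            "You review items for a data pipeline. For each factor, answer with exactly "
            "one of the allowed options. Follow the disambiguation guide below.\n\n"
            + disambiguation_doc
        )
        # A few text fields plus ~5 images, matching what the human reviewers saw.
        user_content = [{"type": "text",
                         "text": "\n".join(f"{k}: {v}" for k, v in text_fields.items())}]
        user_content += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
        user_content.append({
            "type": "text",
            "text": "Factors and allowed options:\n"
                    + "\n".join(f"- {f}: {opts}" for f, opts in FACTORS.items())
                    + "\nReply with one line per factor, formatted as 'factor: option'.",
        })

        resp = client.chat.completions.create(
            model="gpt-4o",  # assumption: any vision-capable chat model
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user_content}],
        )
        raw = resp.choices[0].message.content

        # Validate against the fixed option lists; anything off-list gets escalated,
        # which is roughly the Slack-question path in the human process.
        result = {}
        for line in raw.splitlines():
            if ":" not in line:
                continue
            factor, option = (part.strip() for part in line.split(":", 1))
            if factor in FACTORS and option in FACTORS[factor]:
                result[factor] = option
            elif factor in FACTORS:
                result[factor] = "NEEDS_HUMAN_REVIEW"
        return result

Even in a sketch like this, the hard parts end up in the disambiguation doc and the escalation path, not the model call, and those are exactly the parts that don't obviously go away.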