Note that this is pretty old (2020).<p>They released code, models and raw data here: <a href="https://github.com/openai/summarize-from-feedback">https://github.com/openai/summarize-from-feedback</a>
I feel like automating the human feedback, rather than the summaries themselves, should have been the core focus of research like this. As it stands, even with the summary-evaluation guidelines they gave the reviewers, the human judgments aren't reproducible.
I'd be curious to see how this compares to models trained on more professional datasets than Reddit TL;DR.<p>For example, train a model by reading every single article (including paywalled ones via cached copies) linked from <a href="https://www.techmeme.com/river" rel="nofollow">https://www.techmeme.com/river</a> <a href="https://www.mediagazer.com/river" rel="nofollow">https://www.mediagazer.com/river</a> <a href="https://www.memeorandum.com/river" rel="nofollow">https://www.memeorandum.com/river</a> <a href="https://www.wesmirch.com/river" rel="nofollow">https://www.wesmirch.com/river</a> <a href="https://ballbug.com/river" rel="nofollow">https://ballbug.com/river</a> and pairing it with the summary headline. A rough sketch of the data collection is below.
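<p>Something like this could collect (article, headline) pairs from one of those river pages; this is only a sketch under assumptions, since I haven't looked at Techmeme's actual markup, and extract_article_text is a placeholder that would really need readability extraction and paywall/cache handling:<p>    # Sketch: build a summarization dataset from a "river" page.
    # Selectors and helpers here are guesses, not Techmeme's real structure.
    import requests
    from bs4 import BeautifulSoup

    RIVER_URL = "https://www.techmeme.com/river"  # same idea for the other river sites

    def extract_article_text(url):
        # Placeholder extractor: just concatenates <p> text from the page.
        html = requests.get(url, timeout=10).text
        page = BeautifulSoup(html, "html.parser")
        return " ".join(p.get_text(" ", strip=True) for p in page.find_all("p"))

    def collect_pairs(river_url=RIVER_URL):
        soup = BeautifulSoup(requests.get(river_url, timeout=10).text, "html.parser")
        pairs = []
        for link in soup.find_all("a", href=True):
            headline = link.get_text(strip=True)
            if not headline or not link["href"].startswith("http"):
                continue  # skip nav/relative links
            try:
                article = extract_article_text(link["href"])
            except requests.RequestException:
                continue
            # crude filter: the article should be much longer than its "summary"
            if len(article) > 5 * len(headline):
                pairs.append({"document": article, "summary": headline})
        return pairs

    if __name__ == "__main__":
        for ex in collect_pairs()[:3]:
            print(ex["summary"], "<-", ex["document"][:80], "...")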