Tangential, but why didn't the OpenAssistant team (led by the author of the video) release the full OpenAssistant dataset? As far as I know, the project was shut down, and only an initial, heavily filtered version of the data was released. The full dataset could be very valuable to the community that created it.