My own guess as to why GPT overuses "delve" is that it's an artefact of RLHF. When you're training a model to respond as a chatbot rather than just emit the most probable next token, you have people marking responses as good or bad. There are other criteria in play as well, like what sounds good.

What's probably happened is that the "delve" responses sound better to the people doing the RLHF ratings, so they're disproportionately reinforced in the output.

It's not just "delve": there would be a whole list of overused words that you could find by comparing a large corpus of GPT output (or any LLM's) against a large corpus of human-written text. You could use that list as a heuristic for an AI detector, the only problem being that you'd need a different corpus for each LLM.
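Something along those lines is easy to sketch in Python. This is a rough cut, not a real detector: the tokenizer is crude, the filenames are made up, and the thresholds are arbitrary.

    from collections import Counter
    import math
    import re

    def word_freqs(text):
        """Tokenize crudely and return relative word frequencies plus token count."""
        words = re.findall(r"[a-z']+", text.lower())
        counts = Counter(words)
        total = max(sum(counts.values()), 1)
        return {w: c / total for w, c in counts.items()}, total

    def overused_words(llm_text, human_text, min_count=5, top_n=50):
        """Rank words by log-ratio of LLM frequency to human frequency."""
        llm_freqs, llm_total = word_freqs(llm_text)
        human_freqs, human_total = word_freqs(human_text)
        scores = {}
        for w, f in llm_freqs.items():
            if f * llm_total < min_count:
                continue  # rare words give noisy ratios, skip them
            # smooth so a word absent from the human corpus doesn't divide by zero
            human_f = human_freqs.get(w, 1 / human_total)
            scores[w] = math.log(f / human_f)
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]

    def ai_score(sample, marker_words):
        """Heuristic: fraction of the sample's tokens that are known LLM marker words."""
        words = re.findall(r"[a-z']+", sample.lower())
        markers = {w for w, _ in marker_words}
        return sum(w in markers for w in words) / max(len(words), 1)

    # Hypothetical usage, assuming you've collected the two corpora yourself:
    # markers = overused_words(open("gpt_corpus.txt").read(),
    #                          open("human_corpus.txt").read())
    # print(markers[:10])              # expect words like "delve" near the top
    # print(ai_score(suspect_text, markers))

And as the last paragraph says, the marker list this produces is model-specific: you'd have to rebuild `markers` from a fresh corpus for each LLM you wanted to detect.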