This is quite significant if you're trying to build LLM-powered apps that are meant to be used in a multilingual context.

However, this is not really OpenAI's fault: the English writing system is much simpler than those of Chinese or Hindi, and even languages like French, Spanish, and German, which are written in the same Latin script as English, use diacritics far more than English does. So it is natural for the tokenizer to be much more efficient for English than for languages with more complex writing systems.
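
You can see the gap directly with OpenAI's tiktoken library. A minimal sketch below, assuming the cl100k_base encoding (used by gpt-3.5-turbo/gpt-4); the sample sentences are my own illustration:

```python
import tiktoken

# cl100k_base is the BPE encoding used by gpt-3.5-turbo and gpt-4.
enc = tiktoken.get_encoding("cl100k_base")

# Roughly equivalent sentences; translations are illustrative only.
samples = {
    "English": "The weather is very nice today.",
    "French":  "Il fait très beau aujourd'hui.",
    "Hindi":   "आज मौसम बहुत अच्छा है।",
}

for lang, text in samples.items():
    tokens = enc.encode(text)
    # Tokens per character is a crude proxy for tokenizer efficiency:
    # higher ratios mean more tokens burned per unit of text.
    print(f"{lang}: {len(tokens)} tokens / {len(text)} chars")
```

Running something like this tends to show the non-Latin-script text consuming several times as many tokens per character, which translates directly into higher cost and a smaller effective context window for those users.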