I’ve been working on some projects with Greek-language data and have run into some interesting challenges with RAG and LLMs. With English data, it's relatively straightforward to get a decent prototype running quickly. In other languages, it has proven trickier.<p>I’m curious to hear from others working with non-English languages: what challenges have you faced? Some areas of interest:
- Models that are more willing to switch languages
- Availability and quality of language-specific retrieval corpora
- Differences in tokenization and embedding quality
- Handling multilingual queries and responses
- Any workarounds or best practices you’ve discovered<p>Would love to hear both success stories and pain points.