As a short rule of thumb, always always _always_ write out the full sentence you want to translate and use the tooling to interpolate everything you want to put in it. That way the translator always sees the full context, and you make it harder (although not impossible) to shoot yourself in the foot.<p>Another recommendation I would add is to use two meta locales in development, in addition to whatever locales you actually need to support: id and pseudo. The id locale should be an identity function, so you (and the translator, and everyone else) can open a page and see which keys are used on it. The pseudo locale should be either random or pseudorandom text; it is good both for catching strings you accidentally left hardcoded and for checking how your layout copes with strings of different lengths.<p>These ideas alone will get you most of the way most of the time, and they have the added benefit that they're straightforward to teach to juniors.
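For what it's worth, here is a minimal sketch of the two meta locales described above. The catalog shape, the keys, and the function names (`idLocale`, `pseudoLocalize`, `pseudoLocale`) are all made up for illustration, and the pseudo locale here is deterministic (accented vowels plus padding) rather than random, which is just one way to do it.

```typescript
// Hypothetical message catalog; keys and strings are invented for this sketch.
type Catalog = Record<string, string>;

const en: Catalog = {
  "cart.checkout.button": "Proceed to checkout",
  "cart.items.count": "You have {count} items in your cart",
};

// "id" locale: every key maps to itself, so a page shows which keys it uses.
function idLocale(source: Catalog): Catalog {
  const out: Catalog = {};
  for (const key of Object.keys(source)) out[key] = key;
  return out;
}

// Accented look-alikes used by the pseudo locale.
const ACCENTS: Record<string, string> = {
  a: "á", e: "é", i: "í", o: "ö", u: "ü",
  A: "Á", E: "É", I: "Í", O: "Ö", U: "Ü",
};

// "pseudo" locale: accent the vowels, pad by about 50%, wrap in brackets.
// Placeholders such as {count} are left untouched so interpolation still works.
function pseudoLocalize(text: string): string {
  // Splitting on a capturing group keeps the placeholders at the odd indexes.
  const parts = text.split(/(\{[^}]*\})/);
  const body = parts
    .map((part, i) =>
      i % 2 === 1 ? part : part.replace(/[aeiouAEIOU]/g, (c) => ACCENTS[c])
    )
    .join("");
  const padding = "~".repeat(Math.ceil(text.length / 2));
  return `[${body}${padding}]`;
}

function pseudoLocale(source: Catalog): Catalog {
  const out: Catalog = {};
  for (const key of Object.keys(source)) out[key] = pseudoLocalize(source[key]);
  return out;
}
```

Any string that shows up on screen without brackets and accents was never routed through the catalog, and the padding gives a rough feel for how the layout handles longer text.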
A particularly tricky case of this is with usernames and user-defined content.<p>E.g., a notification like "Alice is online" in some languages requires knowing Alice's gender, which may not even be stored anywhere in the system. There's probably some language out there that requires some other piece of personal info for a correct translation.<p>To make things trickier, try having a multitude of items that you refer to: "You're holding a dagger". Now you need to have a serious discussion with your translators, because this is going to get all kinds of tricky, as the maker of Obra Dinn discovered: <a href="https://www.youtube.com/watch?v=OMi6xgdSbMA">https://www.youtube.com/watch?v=OMi6xgdSbMA</a><p>And to make things extra-tricky, allow users to create content: "Alice gave you a banana", where "banana" is a custom object Alice made herself.<p>Most translation efforts seem to give up at this point and resort to something stilted like "Alice: online".
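A rough sketch of the "Alice is online" case, assuming French as the target language, where the participle agrees with gender. The names `Gender`, `isOnlineFr`, and `formatIsOnline` are hypothetical; the idea is borrowed from ICU-style "select" arguments, with the stilted neutral phrasing as the fallback when the gender is not known.

```typescript
type Gender = "female" | "male" | "other";

// One message, three variants. "other" is the gender-neutral fallback.
const isOnlineFr: Record<Gender, string> = {
  female: "{name} est connectée",
  male: "{name} est connecté",
  other: "{name} : en ligne",
};

function formatIsOnline(name: string, gender?: Gender): string {
  const template = isOnlineFr[gender ?? "other"];
  // A replacer function avoids "$&"-style patterns in user-chosen names
  // being interpreted by String.prototype.replace.
  return template.replace("{name}", () => name);
}
```

This only works if the system actually stores the information, which is exactly the problem described above; the fallback branch is what you end up shipping when it doesn't.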
Smells:<p>* If you're concatenating sentence bits, you're doing it wrong<p>* If you're formatting numbers, dates, times, or durations by hand, you're doing it wrong<p>* If you're formatting strings with placeholders and you don't know the gender and number of your placeholders, your translators are going to have a bad time<p>There are two more important rules that this article doesn't mention:<p>* Write long descriptions of what the thing is that you're asking someone to translate (button label, menu item, dialog header...), and include a screenshot. Translating very short strings without context is very difficult.<p>* Ask your translators to do a global once-over QA pass once in a while to detect inconsistencies and weirdness. Once I dealt with a product that had three tabs, and two of the tabs were translated identically. Each tab header translation made sense on its own, but as distinct tab headers side by side, it made no sense to use the same word.
Another aspect to be aware of is that English is often much shorter than the equivalent translated text, especially on buttons with text labels. I remember many years ago we used a rough rule of thumb of always doubling the space used for English to ensure there was enough space for the translated text.
It's frustrating that the post does not provide any solution for some of the problems, like declensions and gender. I internationalised a couple of applications, and it's incredible how i18n frameworks are still so limited in linguistic aspects that are so important for so many languages.<p>Finnish, for example, works with a ton of suffixes, and you end up having to rewrite the copy (into unnatural structures) to fit interpolation and declensions. Portuguese genders almost every subject in a phrase construction.<p>The web is killing many languages (or creating artificial versions of them) because of the lack of tooling...
With formatjs [0], you don't have to split the sentence for interpolation. The same example as in the article can be implemented as:<p><pre><code>import { defineMessage } from 'react-intl'

// The translator sees the whole sentence; <a> only marks the link text.
const message = defineMessage({
  defaultMessage: 'Learn more about <a>supported images</a>.',
  description: 'Footer text containing a hyperlink',
})
</code></pre>
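and the anchor element can be interpolated as:<p><pre><code>// Inside a React component, where formatMessage comes from react-intl's useIntl()
formatMessage(message, {
  // chunks is the translated text between <a> and </a>
  a: (chunks: ReactNode) => <a href="#link">{chunks}</a>,
})
</code></pre>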
[0]: <a href="https://formatjs.io" rel="nofollow noreferrer">https://formatjs.io</a>
Good article. Knowing some Slavic, Latin or Asian language helps immensely when dealing with i18n.<p>I wrote an article on a similar subject (with some additional technical details about Android and iOS) a few years ago, with a few similar conclusions:<p><a href="https://jakub.gieryluk.net/blog/reusing-software-translations-ios-android-web/" rel="nofollow noreferrer">https://jakub.gieryluk.net/blog/reusing-software-translation...</a>
Don't forget right-to-left languages, which also affect how UI elements are arranged (position within the page) and rendered (input widgets like sliders get reversed).
I'm a game dev on a highly procedural game, where lots of text has unpredictable text inserted within it. E.g., a notification for how “(person 1) has left (room1) to perform (action) in (room2) with (item1)”.<p>So your “never do interpolation” trick is a bit of an over-simplification already. Not to mention all the ways to modify a verb or noun with surrounding words, e.g., preceding it with "the": "I walked to Larry" vs "I walked to the couch".<p>The language systems for our game got pretty complex, pretty fast, and I find these simplified, hand-wavy articles pretty frustrating tbh.
Translation/internationalisation is one of the hardest problems, and it is not going to be solved by technology alone.<p>RTL, plurals that depend on the number, non-Latin character behavior, font issues, UI broken by longer translations, context-dependent translation, etc. Every time I start a project or think about it I'm sweating.
Can I just rant for a second about how much I hate the whole `<starting-letter-of-a-word><count-of-inner-letters><ending-letter-of-a-word>` trend that folks seem to love? This sort of intentional obfuscation makes it hard for juniors or students (the exact people who would be interested in an article like this) to engage with the material. The most egregious example is doing it for the word 'accessibility'!
I think the Polish example is even a bit more complex than that: it's not really that Polish has a separate form for "a few". That's the regular plural form. It's that Polish uses the <i>genitive plural</i> with certain numerals, instead of the nominative. That is, instead of saying "5 dogs" you say "5 of dogs".<p>This doesn't, to my knowledge, apply if there is no numeral provided, even if we're talking about 1000 dogs, so it wouldn't be right to call it a plural.<p>Of course, the point of the article still stands.<p>Disclaimer: I don't speak Polish. I did learn some Czech though at some point (most of which I've forgotten).
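Whatever the right grammatical label is, CLDR models this with plural categories, and the standard `Intl.PluralRules` API exposes them. A small sketch, assuming a runtime with Polish locale data (recent Node versions ship it by default); `dogForms` and `countDogs` are names invented for the example, and the Polish forms are the ones for "pies" (dog).

```typescript
// Forms of "pies" (dog), keyed by CLDR plural category for Polish.
const dogForms: Record<string, string> = {
  one: "pies",  // exactly 1
  few: "psy",   // 2-4, 22-24, ... but not 12-14
  many: "psów", // 0, 5-21, 25-31, ... (the genitive plural)
  other: "psa", // fractions such as 1.5
};

const polishRules = new Intl.PluralRules("pl");
const polishNumbers = new Intl.NumberFormat("pl");

function countDogs(n: number): string {
  // select() returns the category name for this number in Polish.
  const category = polishRules.select(n);
  return `${polishNumbers.format(n)} ${dogForms[category]}`;
}
```

Note that 12 lands in "many" while 22 lands in "few", which is why hand-rolled `n === 1 ? singular : plural` logic does not survive contact with Polish.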
Not covered here: fonts and glyph appearances, which will nearly always end up displaying wrong in certain Asian languages -- <a href="https://heistak.github.io/your-code-displays-japanese-wrong/" rel="nofollow noreferrer">https://heistak.github.io/your-code-displays-japanese-wrong/</a>
> The order of the words is hardcoded, with “added” preceding the date. This would be incorrect in many languages, from Dutch (“1 januari toegevoegd”)<p>This is simply not true. Since English and Dutch are both Germanic languages, they largely work the same way. Saying "Toegevoegd: 1 januari" would be just fine. By using "1 januari toegevoegd" you're syntactically changing the sentence.
This is a good article, though as someone who prefers references/tables to prose for technical topics, the real find for me was the link out to the Unicode CLDR project (which sadly contains a LOT of broken links right now due to a data migration effort but I'll bookmark it & hopefully it'll be navigable in future).<p>As someone with a Polish partner, who also fluently speaks my own weird minority local language (Irish), I'm more than well aware of pluralisation pitfalls; Irish may have one of the most complex rulesets, so much so that I'm almost certain it isn't represented in CLDR (possibly can't be). But I see the plural pitfall brought up in so many of these guides, and I've always been curious about other unexpected/unintuitive pitfalls across languages out there. Would love if there was a simple reference of the most interesting (starting with plurals I guess).
Unrelated, but I had a weird experience navigating to this article on my iPhone: the music in my headset switched to call mode. I can reproduce that about 50% of the time.<p>Is there something using the microphone somewhere? Feels really weird…<p>14 Pro Max with latest beta software
I would give my little toe to see what this person's opinion is on CCS/CCMS (component content management systems)<p><a href="https://en.wikipedia.org/wiki/Component_content_management_system" rel="nofollow noreferrer">https://en.wikipedia.org/wiki/Component_content_management_s...</a><p>There's quite a bit more to be written here about natural language, formal language, and how constituents of each class interact with each other. Stuff that the initial architects of "component content" were not necessarily thinking about, because they were coming at the problem from an extremely limited corpus.
Also, number and percent formats are important too. I've seen many 'professional' websites and software products that use only a standard 100.0% format, where the decimal separator and the percent sign are not localised.
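A small sketch of letting the platform do this instead, using the standard `Intl.NumberFormat` API with `style: "percent"`. The helper names `formatPercent` and `decimalSeparator` are invented here, and the exact output for a given locale depends on the locale data your runtime ships.

```typescript
// Format a ratio (0.125 means 12.5%) according to the locale's conventions.
function formatPercent(ratio: number, locale: string): string {
  return new Intl.NumberFormat(locale, {
    style: "percent",
    minimumFractionDigits: 1,
    maximumFractionDigits: 1,
  }).format(ratio);
}

// Ask the formatter which decimal separator the locale uses.
function decimalSeparator(locale: string): string {
  const parts = new Intl.NumberFormat(locale).formatToParts(1.5);
  const decimal = parts.find((part) => part.type === "decimal");
  return decimal ? decimal.value : "";
}

console.log(formatPercent(0.125, "en-US")); // "12.5%"
console.log(formatPercent(0.125, "de-DE")); // comma as decimal separator
```

Besides the separator, some locales also put a (non-breaking) space before the percent sign or move the sign, so comparing against hand-built strings is fragile; let the formatter own the whole thing.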
tangent:<p>did you know that "institutionalization" also resolves to the English numerical contraction: "i18n"?<p>here's a tool to test for conflicts in other words (a11y, k8s, etc.):<p><a href="https://encapsulate.me/writing/e25n.html" rel="nofollow noreferrer">https://encapsulate.me/writing/e25n.html</a>
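The contraction itself is easy to sketch: first letter, count of inner letters, last letter. This is not the linked tool, just an illustration; `numeronym` and `findCollisions` are made-up names.

```typescript
// First letter + number of letters in between + last letter.
function numeronym(word: string): string {
  if (word.length < 4) return word; // too short to be worth contracting
  return `${word[0]}${word.length - 2}${word[word.length - 1]}`;
}

// Group words by numeronym and keep only the groups with more than one word.
function findCollisions(words: string[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const word of words) {
    const key = numeronym(word.toLowerCase());
    const group = groups.get(key) ?? [];
    group.push(word);
    groups.set(key, group);
  }
  const collisions = new Map<string, string[]>();
  groups.forEach((group, key) => {
    if (group.length > 1) collisions.set(key, group);
  });
  return collisions;
}

console.log(numeronym("internationalization")); // "i18n"
console.log(numeronym("institutionalization")); // "i18n"
console.log(numeronym("accessibility"));        // "a11y"
```

Feeding a dictionary word list into `findCollisions` would show how ambiguous a given contraction is.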
i am on a 2 year long rabbit hole to solve many i18n problems that devs face <a href="https://github.com/inlang/inlang">https://github.com/inlang/inlang</a><p>we are in our third (major) refactor because the problem is so complex and new requirements emerge regularly :/
The post is interesting as it exposes the problem statement.<p>Unfortunately, I expected from a Shopify Engineering blog that it would provide <i>solutions</i> to this problem like a JS library for i18n.<p>Disclaimer: As I'm not a frontend developer I'm not familiar with the ecosystem solutions.
Is there some kind of Auto-i18n where the function sends a request to a server if there is no localization available? The server could in turn request a translation from a service and add it to the localization files
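I don't know of a specific product to point at, but the client side of that idea is small enough to sketch. Everything here is hypothetical (`FallbackTranslator`, `t`, `drainMissing`): the lookup falls back to the source text and remembers which keys were missing, and something else (a background job, a request to your server) would periodically drain that list and ask a translation service to fill the gaps.

```typescript
type Catalog = Record<string, string>;

class FallbackTranslator {
  private catalog: Catalog;
  private missing: Set<string>;

  constructor(catalog: Catalog) {
    this.catalog = catalog;
    this.missing = new Set<string>();
  }

  // Return the translation if present; otherwise record the key and
  // fall back to the source-language text so the UI still renders.
  t(key: string, sourceText: string): string {
    const hit: string | undefined = this.catalog[key];
    if (hit !== undefined) return hit;
    this.missing.add(key);
    return sourceText;
  }

  // Hand the collected keys to whatever requests translations, then reset.
  drainMissing(): string[] {
    const keys = Array.from(this.missing);
    this.missing.clear();
    return keys;
  }
}

const fr = new FallbackTranslator({ greeting: "Bonjour" });
console.log(fr.t("greeting", "Hello"));   // "Bonjour"
console.log(fr.t("farewell", "Goodbye")); // "Goodbye" (missing, so recorded)
console.log(fr.drainMissing());           // ["farewell"]
```

The hard part is not this plumbing but the context problem raised elsewhere in the thread: a key and a source string alone are often not enough for a correct translation, automated or not.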
I've found that I need to support localization from the very start.<p>I never display a quoted string. I always use Apple's tokenization (or create my own, if doing server code).<p>Apple has terrific support for localization, which puts the onus on us, to honor it. I have some basic extensions that I use to support localization in my coding[0-2], but there's also just stuff I need to keep in mind, all the time.<p>There has been discussion of how to deal with things like word order in different languages. For example, in Germanic languages, the modifier usually precedes the subject, while in Romance languages, it tends to be the opposite.<p>Thankfully, Apple supports the "$" format for sprintf strings[3], so we can do stuff like this:<p><pre><code> import Foundation
let localizationAssets = [
    (format: "The %1$@ %2$@", modifier: "white", subject: "horse"),
    (format: "Le %2$@ %1$@", modifier: "blanc", subject: "cheval")
]

func localizedHorse(_ inLocalization: Int) -> String {
    String(
        format: localizationAssets[inLocalization].format,
        localizationAssets[inLocalization].modifier,
        localizationAssets[inLocalization].subject
    )
}

// English (Prints "The white horse")
print(localizedHorse(0))
// French (Prints "Le cheval blanc")
print(localizedHorse(1))
</code></pre>
[0] <a href="https://github.com/RiftValleySoftware/RVS_Generic_Swift_Toolbox/blob/master/Sources/RVS_Generic_Swift_Toolbox/RVS_Generic_Swift_Toolbox_Extensions/RVS_Foundation_Extensions.swift#L184">https://github.com/RiftValleySoftware/RVS_Generic_Swift_Tool...</a><p>[1] <a href="https://github.com/RiftValleySoftware/RVS_Generic_Swift_Toolbox/blob/master/Sources/RVS_Generic_Swift_Toolbox/RVS_Generic_Swift_Toolbox_Extensions/RVS_Foundation_Extensions.swift#L192">https://github.com/RiftValleySoftware/RVS_Generic_Swift_Tool...</a><p>[2] <a href="https://github.com/RiftValleySoftware/RVS_Generic_Swift_Toolbox/blob/master/Sources/RVS_Generic_Swift_Toolbox/RVS_Generic_Swift_Toolbox_Extensions/RVS_Foundation_Extensions.swift#L200">https://github.com/RiftValleySoftware/RVS_Generic_Swift_Tool...</a><p>[3] <a href="https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/Strings/Articles/formatSpecifiers.html#//apple_ref/doc/uid/TP40004265-SW2" rel="nofollow noreferrer">https://developer.apple.com/library/archive/documentation/Co...</a>
I name them by component.context.phrase.<p>There's <a href="https://cldr.unicode.org/index">https://cldr.unicode.org/index</a>.<p>In Angular I liked Transloco [0] very much.<p>For Vue I use vue-i18n; I don't think there's any alternative.<p>For Go I like go-i18n [1] when doing SSR in Go.<p>For Svelte, I'm not sure if there's a best package.<p>[0] <a href="https://github.com/ngneat/transloco">https://github.com/ngneat/transloco</a><p>[1] <a href="https://github.com/nicksnyder/go-i18n">https://github.com/nicksnyder/go-i18n</a>