For months I have been working on an open source localization solution that tackles both developer-facing and translator-facing problems. Treating translations as code completely leaves out translators, who in most cases cannot code.<p>I am working on making localization effortless via dev tools and a dedicated editor for translators. Both pillars have one common denominator: translations as data in source code. Treating translations as code would break that denominator and prevent a coherent end-to-end solution.<p>Take a look at the repository <a href="https://github.com/inlang/inlang" rel="nofollow">https://github.com/inlang/inlang</a>. The IDE extension already solves type safety, inline annotations, and (partially) extraction of hardcoded strings.
The problem I see with this is that every language would need to replicate the code & logic.<p>With data / config, the translations are recorded in one place and all consumers can get the update without code changes.<p>The big thing I've been wondering / looking for is a shared, open source translation database. Anyone have links?
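To make the data/config point concrete, here's a rough TypeScript sketch (the names and shape are made up, not any particular library): all strings live in one record keyed by locale, every consumer goes through the same lookup, and updating a translation touches no code.
<pre><code>// Translations as data: one place to edit, no per-language code to replicate.
type Locale = "en" | "de" | "fr";

const messages: Record&lt;Locale, Record&lt;string, string&gt;&gt; = {
  en: { greeting: "Hello, {name}!" },
  de: { greeting: "Hallo, {name}!" },
  fr: { greeting: "Bonjour, {name} !" },
};

function t(locale: Locale, key: string, vars: Record&lt;string, string&gt; = {}): string {
  const template = messages[locale][key] ?? messages.en[key] ?? key;
  // Naive placeholder substitution; a real library also handles plurals, escaping, etc.
  return template.replace(/\{(\w+)\}/g, (_, name) =&gt; vars[name] ?? `{${name}}`);
}

console.log(t("de", "greeting", { name: "Ada" })); // "Hallo, Ada!"
</code></pre>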
"You tasked me with translating this scene, so since you gave me a general programming language I used a buffer overflow to break out into the animation engine and animate your characters to use sign language."<p>Jokes aside I don't hate the idea and is actually quite positive to writing translation in code. I am a bit questioning of why you would need a new language for it though, why not use an existing programming language?<p>As others pointed out here the biggest downside I can see is that it would be harder to outsource.
Caveats:<p>- Community provided translations are now a remote code execution vector, and can steal your passwords instead of merely displaying rude words. You should now audit all translations up front before manually merging, instead of merely, say, locking down a writeable-by-default wiki after your first abuse occurs.<p>- Translation code is unlikely to be given a nice stable semvered sandboxed API boundary. Less of an issue for in-house translation where translators are working against the same branch as everyone else, more of an issue when outsourcing translation - when you get a dump of translations weeks/months later referencing refactored APIs, some poor fellow will need to handle the integration manually.<p>- Hot reloading and error recovery is likely an afterthought at best, for similar reasons. Translation typos are now likely to break your entire build, not just individual translation strings.<p>- Translators must now reproduce your code's build environment to preview translations.<p>(Code-based translations may still make sense for some projects/organizations despite these drawbacks, but these are some of the reasons dedicated translation DSLs encoded as "data" can make sense for other projects/organizations)
Having worked in the localization space over a decade ago, when gettext was still the industry standard, I was pleased recently to use Fluent, which I think is a better, more modern approach:<p><a href="https://projectfluent.org/" rel="nofollow">https://projectfluent.org/</a><p>It worked well for my use case but still needs more progress to be fully featured across all supported programming languages; for example, I found some more advanced features missing in the Rust implementation. Really worth checking out.
It's a neat idea but by intermixing code, presentation, and data you're going to run into a bunch of issues that the "traditional" approach avoids.<p>For one thing, we get our translations by handing a yaml file to external contractors. They don't need to squint at a file full of code to distinguish the bits of english that need translating from the bits that don't – they just have to translate the right side of every key, and there's specialized tooling to help them with this.<p>And for another, even in your toy example in the readme you've now lost a Single Source of Truth for certain presentation decisions. So now when some stakeholder comes to you and says they hate the italicization in the intro paragraph and to lose it ASAP, instead of taking the markup out of a common template that different data gets inserted into, you have to edit each language's version of the code to remove the markup (with all of the attendant ease of making errors that comes along when you lack a SPOT – easy to miss one language, etc). I'd expect these kinds of multiplication-of-edit problems to grow increasingly complex when you scale this approach beyond toy examples.<p>Basically this seems really hard to scale to large products, and doesn't play well with division of labour.
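A small TypeScript sketch of the Single Source of Truth point (hypothetical data, not from the readme): the markup lives once in a shared template and each language only supplies plain text, so dropping the italics is a one-line change instead of an edit per language.
<pre><code>// Presentation decided in exactly one place; translations stay plain strings.
const intro: Record&lt;string, string&gt; = {
  en: "Welcome to our product",
  de: "Willkommen bei unserem Produkt",
};

function renderIntro(locale: string): string {
  // Stakeholder hates the italics? Edit this single line.
  return `&lt;p&gt;&lt;i&gt;${intro[locale] ?? intro.en}&lt;/i&gt;&lt;/p&gt;`;
}
</code></pre>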
The localization library I use supports most of this. Not all, it's not a general purpose programming language of course, but it supports variables and conditionals, which is basically enough to do almost anything.<p><a href="https://formatjs.io/docs/react-intl/api#message-syntax" rel="nofollow">https://formatjs.io/docs/react-intl/api#message-syntax</a>
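For a sense of what that message syntax buys you, here is a hedged sketch using react-intl's imperative API (createIntl); the message id and string are made up, but the plural branching is plain ICU MessageFormat as described in the linked docs.
<pre><code>import { createIntl, createIntlCache } from "react-intl";

const cache = createIntlCache();
const intl = createIntl(
  {
    locale: "en",
    messages: {
      cart: "{count, plural, one {# item} other {# items}} in your cart",
    },
  },
  cache
);

// Variables and plural logic live in the message string, not in app code.
console.log(intl.formatMessage({ id: "cart" }, { count: 1 })); // "1 item in your cart"
console.log(intl.formatMessage({ id: "cart" }, { count: 3 })); // "3 items in your cart"
</code></pre>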
I'm not quite sure I agree with the title. Having access to code when you need it is probably a good thing.<p>But I think code is, in general, something to be avoided when declarative approaches are available.<p>Declarative is easier for a computer to understand; it restricts the inputs to a domain the computer can deal with.<p>You don't get the same classes of bugs with declarative. You could even do things like double-checking with machine translation and flagging anything that doesn't match for human review.<p>Plus, you don't need a programmer to do it. Security issues go away. You often achieve very good reuse, with code existing in only one place without language variants.<p>I'm sure there are great uses for this, but I have trouble thinking of even a single case where I'd prefer code to data in general.
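The machine-translation cross-check is easy to picture as a pipeline step. This is an illustrative-only TypeScript sketch: the MT call is injected (any service would do), and similarity() is a crude token-overlap stand-in, not a real metric.
<pre><code>type TranslationEntry = { key: string; source: string; translation: string };

// Crude token-overlap similarity, purely for illustration.
function similarity(a: string, b: string): number {
  const tokens = (s: string) =&gt; new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
  const ta = tokens(a);
  const tb = tokens(b);
  const shared = [...ta].filter((t) =&gt; tb.has(t)).length;
  return shared / Math.max(ta.size, tb.size, 1);
}

async function flagSuspectTranslations(
  entries: TranslationEntry[],
  targetLocale: string,
  machineTranslate: (text: string, locale: string) =&gt; Promise&lt;string&gt;,
  threshold = 0.5
): Promise&lt;TranslationEntry[]&gt; {
  const suspects: TranslationEntry[] = [];
  for (const entry of entries) {
    const machine = await machineTranslate(entry.source, targetLocale);
    // Flag entries that diverge strongly from the machine translation for human review.
    if (similarity(entry.translation, machine) &lt; threshold) suspects.push(entry);
  }
  return suspects;
}
</code></pre>
This only works because the translations are data: the checker never has to execute anything to see what a string says.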
The idea is appealing, I think, because it feels like a step toward what is surely the ultimate goal: flawless natural language generation from some semantic encoding. If you squint, these functions and arguments are the semantic encoding, and their implementations are doing their best to imitate the NLG for an extremely limited domain.<p>Of course, the problem is that implementations like this are actually stepping <i>away</i> from the very good NLG system we already have: human translators, who typically aren't coders. And the need for NLG hasn't gone away -- someone still has to hardcode these (parameterized) strings.
I worked with localizations and the main issue was that the translators didn't code, so we had to keep the localizations separate from the code, as the translators had no idea how to deal with it.
Another issue we had was that not all languages read left-to-right; some read right-to-left or top-to-bottom. And formatting that makes sense in one language sometimes doesn't make sense in another.
Languages don't follow a single common pattern, which sometimes makes it hard to automate.
We tried Google Translate, but it kept translating things into garbage, so we couldn't use that.
This idea of localization as code has significant history from Perl: <a href="https://perldoc.perl.org/Locale::Maketext::TPJ13" rel="nofollow">https://perldoc.perl.org/Locale::Maketext::TPJ13</a><p>Currently Mozilla Fluent seems like a good compromise implementation. The type checking is maybe not as advanced, but it is intended to be compatible with the tools most often used in localization, so translators can handle all the data and organize the task. It makes it very straightforward to get generated localized strings to agree in number, tense, gender, and so on.
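As a hedged sketch of what that agreement looks like in practice, here is the fluent.js runtime (@fluent/bundle) formatting a message adapted from the projectfluent.org example; the surrounding TypeScript is illustrative.
<pre><code>import { FluentBundle, FluentResource } from "@fluent/bundle";

const ftl = `
shared-photos =
    { $userName } { $photoCount -&gt;
        [one] added a new photo
       *[other] added { $photoCount } new photos
    } to { $userGender -&gt;
        [male] his stream
        [female] her stream
       *[other] their stream
    }.
`;

const bundle = new FluentBundle("en");
bundle.addResource(new FluentResource(ftl));

const msg = bundle.getMessage("shared-photos");
if (msg &amp;&amp; msg.value) {
  // Plural and gender agreement are handled inside the translation, not in app code.
  console.log(
    bundle.formatPattern(msg.value, { userName: "Anne", photoCount: 3, userGender: "female" })
  );
}
</code></pre>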
Making localized web apps is such a pain and too often an afterthought. But what if it took almost no extra effort to make the app localized from the start?<p>What if you could get static type checking, key documentation and code completion right in VS Code?<p>And what if the translations could be generated using an actual programming language, and even represent HTML markup and not just plain strings?
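A minimal TypeScript sketch of what that could look like (illustrative only, not any particular library's API): each message is a typed function, so parameters are checked and autocompleted, and a translation can return markup rather than a plain string.
<pre><code>type Messages = {
  greeting: (args: { name: string }) =&gt; string;
  intro: () =&gt; string;
};

const en: Messages = {
  greeting: ({ name }) =&gt; `Hello, ${name}!`,
  intro: () =&gt; "&lt;p&gt;&lt;i&gt;Welcome to the app.&lt;/i&gt;&lt;/p&gt;",
};

const de: Messages = {
  greeting: ({ name }) =&gt; `Hallo, ${name}!`,
  intro: () =&gt; "&lt;p&gt;&lt;i&gt;Willkommen in der App.&lt;/i&gt;&lt;/p&gt;",
};

const locales = { en, de };

// Misspelled keys or missing arguments are compile-time errors, and the editor
// offers completion and hover documentation for every message.
console.log(locales.de.greeting({ name: "Ada" }));
</code></pre>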
This is basically what I would do by exposing Velocity templates to translation users. Technically it's coding, but the scope is limited to text rendering.