TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.
© 2025 TechEcho. All rights reserved.

Indigenous engineers are using AI to preserve their culture

174 points | by kiyanwang | 3 months ago

14 comments

antics | 3 months ago
I am one of these people! I am one of a handful of people who speak my ancestral language, Kiksht. I am lucky to be uniquely well-suited to this work, as I am (as far as I know) the lone person from my tribe whose academic research background is in linguistics, NLP, and ML. (We have, e.g., linguists, but very few computational linguists.)

So far I have not had much luck getting the models to learn Kiksht grammar and morphology via in-context learning; I think a model will have to be trained on the corpus to actually work for it. This mostly makes sense, since Kiksht has functionally nothing in common with Western languages.

To illustrate the point a bit: the bulk of training data is still English, and in English the semantics of a sentence are derived mainly from the specific order in which the words appear, mostly because English lost its cases some centuries ago. Its morphology is mainly "derivational" and mainly suffixal, meaning that words can be arbitrarily complicated by adding suffixes to them. Word order is so baked into English that we sometimes insert words into sentences simply to make the word order sensible. E.g., when we say "it's raining outside", the "it's" refers to nothing at all; it is there entirely because the word order of English demands that it exist.

Kiksht, in contrast, is completely different. Its semantics are derived almost entirely from the triple-prefixal structure of (in particular) verbs. Word ordering almost does not matter. There are something like 12 tenses, and some of them require both a prefix and a reflexive suffix. Verbs are often 1 or 2 characters, and with the prefix structure, a single verb can often be a complete sentence. And so on.

I will continue working on this because I think it will eventually be of help. But right now the deep learning that has been most helpful to me has been for things like computational typology. For example, discovering the "vowel inventory" of a language is shockingly hard. Languages have somewhat consistent consonants, but discovering all the varieties of `a` that one can say in a language is very hard, and deep learning is strangely good at it.
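The vowel-inventory task antics describes can be sketched as a clustering problem over acoustic measurements. The following is a toy illustration under stated assumptions, not the commenter's actual method: it assumes vowel tokens have already been reduced to F1/F2 formant values (the synthetic "targets" below are rough textbook values for /i/, /a/, /u/), and it uses plain k-means rather than deep learning, just to show the shape of the problem.

```python
import numpy as np

def kmeans(points, k, iters=50):
    """Plain k-means. Farthest-point initialisation keeps this toy deterministic."""
    centroids = [points[0]]
    for _ in range(k - 1):
        # Seed each new centroid at the point farthest from all existing seeds.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Assign each token to its nearest centroid.
        dists = np.linalg.norm(points[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned tokens.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels

# Synthetic F1/F2 formant measurements in Hz, scattered around three vowel
# targets (roughly /i/, /a/, /u/). Illustrative values only.
rng = np.random.default_rng(1)
targets = np.array([[300.0, 2300.0], [700.0, 1200.0], [350.0, 800.0]])
tokens = np.vstack([t + rng.normal(0, 40, size=(50, 2)) for t in targets])

centroids, labels = kmeans(tokens, k=3)
print(np.round(centroids[np.argsort(centroids[:, 1])], 0))
```

The recovered centroids land near the three vowel targets. The hard part in real fieldwork, which this sketch assumes away, is producing those formant measurements from raw audio and deciding how many clusters a language actually has.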
thomasfromcdnjs | 3 months ago
I'm an Indigenous Australian and have been slowly working on this problem in my own way for a few years.

https://github.com/australia/mobtranslate.com/

In its current iteration the homepage just runs dictionaries through OpenAI. (My tribe's dictionary fits in a 100k context window.)

My old ambitions can be found somewhat here -> https://github.com/australia/mobtranslate-server

That being said, the OpenAI models do a fantastic job of translating sentences, so I've pushed my own model research further to the back. (Will try to find some examples.)

I can't speak to true preservation, as there are not many native speakers left, but in my mind that's not even all that important from a personal/cultural perspective.

If the youth who are interested in learning more about their language have a nice interface with 70-80% accurate results, and they enjoy doing/learning it, then that is a win to me. (And kind of how language evolves anyway.) (The noun replacement seems to work great, but the grammar is obviously wishy-washy.)

(At this point, I have just rushed to get my tribe's dictionary crawlable, so hopefully it will be in a few models' next training phases.)
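The "dictionary fits in the context window" approach above can be sketched roughly as follows. Everything here is a hypothetical illustration, not code from the mobtranslate.com repository: the dictionary entries are made up, the 4-characters-per-token estimate is a crude heuristic, and the actual model call is omitted since only the prompt construction is of interest.

```python
# Hypothetical sketch: stuff an entire dictionary into the prompt and ask
# the model to translate using only those entries.

CONTEXT_WINDOW_TOKENS = 100_000  # e.g. a 100k-context model

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English-like text.
    return len(text) // 4

def build_prompt(dictionary: dict[str, str], sentence: str) -> str:
    """Build a translation prompt containing the whole dictionary."""
    entries = "\n".join(f"{word}: {gloss}" for word, gloss in sorted(dictionary.items()))
    prompt = (
        "You are a translation assistant. Use ONLY this dictionary:\n"
        f"{entries}\n\n"
        f"Translate into English: {sentence}"
    )
    if estimate_tokens(prompt) > CONTEXT_WINDOW_TOKENS:
        raise ValueError("dictionary too large for the context window")
    return prompt

# Toy dictionary with invented entries, purely for illustration.
toy_dict = {"ngapa": "water", "buru": "place, country"}
prompt = build_prompt(toy_dict, "ngapa buru")
print(prompt)
```

The resulting prompt would then be sent to a chat-completion endpoint; as the commenter notes, noun substitution works well with this setup while grammar remains unreliable.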
AnotherGoodName | 3 months ago
FWIW, this is the original use case of LLMs. The whole context-awareness mechanism came about for the purpose of translation, and the famous "Attention Is All You Need" paper was from the Google translation team.

Say what you will about LLMs being overhyped, but this is the original core use case.
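The mechanism that paper introduced, scaled dot-product attention, is compact enough to sketch directly. This is the standard textbook formulation in NumPy with random toy inputs, not code from any translation system:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights  # weighted mix of values, plus the weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))  # 3 query positions, d_k = 4
K = rng.normal(size=(5, 4))  # 5 key positions
V = rng.normal(size=(5, 4))  # one value vector per key
out, w = scaled_dot_product_attention(Q, K, V)
```

Each output row is a convex combination of the value vectors, with mixing weights determined by query-key similarity; this is the "context awareness" the comment refers to.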
sdsd | 3 months ago
I've always resented the way "traditional" cultures are "preserved". It's like when people want to protect "authentic" locales from the corrupting influence of tourism. I'm grateful that my own culture is not (yet) seen primarily through this lens. Imagine the day when legislation makes exceptions for fentanyl as a traditional ritual substance of nomadic Trailer Americans.

To me the beauty of these things is in their liveliness, in the aspiration to flourish and grow, not merely to conserve a little longer, to spend one more night with terminal cancer before the inevitable.
throwaway970598 | 3 months ago
It seems like this could be incredibly fraught with danger once LLMs are involved (the story isn't exactly clear whether they are). If there are no surviving native speakers of a language (or very few), training an LLM to generate text in that language would run the risk of, e.g., transferring English grammar onto the vocabulary and causing a hybrid language to become the dominant form, because the LLM is used widely for study while the native-speaking elders are not.
joshdavham | 3 months ago
This is a fantastic use case for LLMs. Also, godspeed to these researchers! Unfortunately, there's not a lot of time left for many of these languages.
romaaeterna | 3 months ago
"His dream is to revive dying languages..."

The content of modern culture is too much for dying or ancient languages, and what you actually get is English/modern thoughtspace expressed in the lexicon of an until-now separate culture. This flood of spam destroys what was unique and interesting about the culture, and "skin-suits" it.
userbinator | 3 months ago
s/preserve/hallucinate/

The next few decades are going to be really, really weird.
deadbabe | 3 months ago
Finally, a good use case for LLMs that isn’t just trying to anthropomorphize some already solved automation problem.
Mengkudulangsat | 3 months ago
On a side note, can anyone recommend an AI tool I can use to learn a random niche language as a hobby (e.g. Toki Pona)?
tho23i423434 | 3 months ago
I wonder how useful this really is.

No doubt it's excellent for archiving, but that's not the same as "preserving" culture. If it's not alive and kicking, it's not a culture, IMO. You see this happen even with texts: once things start being written down, the actual knowledge tends to get lost (see India, for example).

This "AI to help low-resource languages" thing is a big deal in India too, but it just feels like another "jumla" for academics/techbros to make money. I mean, India has brutal/vicious policies that are out to destroy any and every language that's not English (since any such language is automatically a threat to central rule from Delhi), but pretty much no intellectual, either in India or the US, actually cares about the mass wiping-out of Indian languages by English... not even the ones who go "ree bad British man destroyed India" on Twitter all day.
tomp | 3 months ago
Terrible title. Every engineer is indigenous *somewhere*.
iamnotsure | 3 months ago
me too
reportgunner | 3 months ago
I don't understand: how can they use AI to preserve their culture when AI was never a part of it?