(with proper formatting this time)<p>Note that it is perfectly possible to have more complicated values associated with FST keys, just not in the <a href="https://docs.rs/fst/0.3.3/fst/map/index.html" rel="nofollow">https://docs.rs/fst/0.3.3/fst/map/index.html</a> implementation. FST's <i>can also be cyclic</i> – this lets you represent things you couldn't with just a hash table.<p>-----<p>Anyone who wants to play around with this should try the HFST[1] library, which lets you create compact and possibly cyclic string-to-string maps, which are closed under union, intersection, reversal, inversion, difference, composition. HFST makes it quite easy to do different operations on FST's:<p><pre><code> $ echo 'c a t 0:%+N 0:%+Sg' | hfst-regexp2fst > cat.fst
$ echo cat |hfst-lookup -q cat.fst
cat cat+N+Sg 0,000000
</code></pre>
(the default is for each arc to have input-equals-output, but you can use : to map inputs to outputs, and use % to escape special characters; 0 is epsilon/"nothing")<p><pre><code> $ echo cats |hfst-lookup -q cat.fst
cats cats+? inf
</code></pre>
only singular, so let's make one for the plural:<p><pre><code> $ echo 'c a t 0:%+N s:%+Pl' | hfst-regexp2fst > cats.fst
$ echo cats |hfst-lookup -q cats.fst
cats cat+N+Pl 0,000000
</code></pre>
and combine them:<p><pre><code> $ hfst-union -1 cat.fst -2 cats.fst >feline.fst
$ hfst-fst2strings feline.fst
cat:cat+N+Sg
cats:cat+N+Pl
</code></pre>
and make it go from analysis to form:<p><pre><code> $ hfst-invert feline.fst > ɟǝlᴉuǝ.fst
$ echo 'cat+N+Pl' | hfst-lookup -q ɟǝlᴉuǝ.fst
cat+N+Pl cats 0,000000
$ hfst-fst2strings ɟǝlᴉuǝ.fst
cat+N+Sg:cat
cat+N+Pl:cats
</code></pre>
this is what the states look like:<p><pre><code> $ hfst-fst2txt feline.fst
0 1 c c 0.000000
0 6 @0@ @0@ 0.000000
1 2 a a 0.000000
2 3 t t 0.000000
3 4 @0@ +N 0.000000
4 5 @0@ +Sg 0.000000
5 0.000000
6 7 c c 0.000000
7 8 a a 0.000000
8 9 t t 0.000000
9 10 @0@ +N 0.000000
10 11 s +Pl 0.000000
11 0.000000
</code></pre>
there's some redundancy, so we should minimize it:<p><pre><code> $ hfst-minimize feline.fst >min.fst
$ hfst-fst2txt min.fst
0 1 c c 0.000000
1 2 a a 0.000000
2 3 t t 0.000000
3 4 @0@ +N 0.000000
4 5 @0@ +Sg 0.000000
4 5 s +Pl 0.000000
5 0.000000
</code></pre>
Now let's make an FST that turns slashes into +-es and increases the weight for every slash we see (~$[ a ] means anything-but-a):<p><pre><code> $ echo '%/:%+ ~$[ %/ ]' | hfst-regexp2fst | hfst-reweight --end-states-only --addition=1 |hfst-fst2txt
0 1 / + 0.000000
1 1 + + 0.000000
1 1 @_IDENTITY_SYMBOL_@ @_IDENTITY_SYMBOL_@ 0.000000
1 1.000000
$ echo '%/:%+ ~$[ %/ ]' | hfst-regexp2fst | hfst-reweight --end-states-only --addition=1 | hfst-repeat > dir.fst
$ echo /a | hfst-lookup -q dir.fst
/a +a 1,000000
$ echo /ab/c | hfst-lookup -q dir.fst
/ab/c +ab+c 2,000000
$ echo /ab/c/d//e | hfst-lookup -q dir.fst
/ab/c/d//e +ab+c+d++e 5,000000
</code></pre>
– that's not something you can do with (just) a hash table.<p>-----<p>On Debians you can install the package `giella-sme` which gives you a 37M cyclic FST of 587060 states and 1101943 arcs which turns North Sámi word forms into analyses. North Sámi has productive compounding, so e.g. "school bus coffee" is a word that I suppose someone might say:<p><pre><code> $ echo 'skuvlabussegáfe' | hfst-lookup -q /usr/share/giella/sme/analyser-disamb-gt-desc.hfstol |head -1
skuvlabussegáfe skuvlabusse+G3+Sem/Veh+N+Err/Orth+SgNomCmp+Cmp#gáfe+Sem/Plant+N+Sg+Nom 10,000000
</code></pre>
and there's a bit of ambiguity in the analysis:<p><pre><code> $ echo 'skuvlabussegáfe' | hfst-lookup -q /usr/share/giella/sme/analyser-disamb-gt-desc.hfstol |wc -l
64
</code></pre>
[1] <a href="https://github.com/hfst/hfst/" rel="nofollow">https://github.com/hfst/hfst/</a> , using among others <a href="http://openfst.org/twiki/bin/view/FST/WebHome" rel="nofollow">http://openfst.org/twiki/bin/view/FST/WebHome</a> under the hood. `sudo apt install hfst` on Debians
Note that it is perfectly possible to have more complicated values associated with FST keys, just not in the <a href="https://docs.rs/fst/0.3.3/fst/map/index.html" rel="nofollow">https://docs.rs/fst/0.3.3/fst/map/index.html</a> implementation. FST's can also be cyclic – this lets you represent things you couldn't with just a hash table.<p>Anyone who wants to play around with this should try the HFST[1] library, which lets you create compact and possibly cyclic string-to-string maps, which are closed under union, intersection, reversal, inversion, difference, composition. HFST makes it quite easy to do different operations on FST's:<p>$ echo 'c a t 0:%+N 0:%+Sg' | hfst-regexp2fst > cat.fst
$ echo cat |hfst-lookup -q cat.fst
cat cat+N+Sg 0,000000<p>(the default is for each arc to have input-equals-output, but you can use : to map inputs to outputs, and use % to escape special characters; 0 is epsilon/"nothing")<p>$ echo cats |hfst-lookup -q cat.fst
cats cats+? inf<p>$ echo 'c a t 0:%+N s:%+Pl' | hfst-regexp2fst > cats.fst
$ echo cats |hfst-lookup -q cats.fst
cats cat+N+Pl 0,000000<p>$ hfst-union -1 cat.fst -2 cats.fst >feline.fst
$ hfst-fst2strings feline.fst
cat:cat+N+Sg
cats:cat+N+Pl<p>$ hfst-invert feline.fst > ɟǝlᴉuǝ.fst
$ echo 'cat+N+Pl' | hfst-lookup -q ɟǝlᴉuǝ.fst
cat+N+Pl cats 0,000000
$ hfst-fst2strings ɟǝlᴉuǝ.fst
cat+N+Sg:cat
cat+N+Pl:cats<p>$ hfst-fst2txt feline.fst
0 1 c c 0.000000
0 6 @0@ @0@ 0.000000
1 2 a a 0.000000
2 3 t t 0.000000
3 4 @0@ +N 0.000000
4 5 @0@ +Sg 0.000000
5 0.000000
6 7 c c 0.000000
7 8 a a 0.000000
8 9 t t 0.000000
9 10 @0@ +N 0.000000
10 11 s +Pl 0.000000
11 0.000000<p>$ hfst-minimize feline.fst >min.fst
$ hfst-fst2txt min.fst
0 1 c c 0.000000
1 2 a a 0.000000
2 3 t t 0.000000
3 4 @0@ +N 0.000000
4 5 @0@ +Sg 0.000000
4 5 s +Pl 0.000000
5 0.000000<p>Now let's make an FST that turns slashes into +-es and increases the weight for every slash we see (~$[ a ] means anything-but-a):<p>$ echo '%/:%+ ~$[ %/ ]' | hfst-regexp2fst | hfst-reweight --end-states-only --addition=1 |hfst-fst2txt
0 1 / + 0.000000
1 1 + + 0.000000
1 1 @_IDENTITY_SYMBOL_@ @_IDENTITY_SYMBOL_@ 0.000000
1 1.000000<p>$ echo '%/:%+ ~$[ %/ ]' | hfst-regexp2fst | hfst-reweight --end-states-only --addition=1 | hfst-repeat > dir.fst
$ echo /a | hfst-lookup -q dir.fst
/a +a 1,000000<p>$ echo /ab/c | hfst-lookup -q dir.fst
/ab/c +ab+c 2,000000<p>$ echo /ab/c/d//e | hfst-lookup -q dir.fst
/ab/c/d//e +ab+c+d++e 5,000000<p>On Debians you can install the package `giella-sme` which gives you a 37M cyclic FST of 587060 states and 1101943 arcs which turns North Sámi word forms into analyses. North Sámi has productive compounding, so e.g. "school bus coffee" is a word that I suppose someone might say:<p>$ echo 'skuvlabussegáfe' | hfst-lookup -q /usr/share/giella/sme/analyser-disamb-gt-desc.hfstol |head -1
skuvlabussegáfe skuvlabusse+G3+Sem/Veh+N+Err/Orth+SgNomCmp+Cmp#gáfe+Sem/Plant+N+Sg+Nom 10,000000<p>and there's a bit of ambiguity in the analysis:<p>$ echo 'skuvlabussegáfe' | hfst-lookup -q /usr/share/giella/sme/analyser-disamb-gt-desc.hfstol |wc -l
64<p>[1] <a href="https://github.com/hfst/hfst/" rel="nofollow">https://github.com/hfst/hfst/</a> , using among others <a href="http://openfst.org/twiki/bin/view/FST/WebHome" rel="nofollow">http://openfst.org/twiki/bin/view/FST/WebHome</a> under the hood. `sudo apt install hfst` on Debians