My web application makes it easier to read Japanese sentences.<p>Where does a word begin and end? (there are no spaces)
How do I pronounce the word? (phonetics are missing)
How do I look up the word in the dictionary? (you need to know how to type it, and how to deconjugate verbs)<p>We can overcome these gaps with good software.<p>MeCab (compiled to WebAssembly) provides morphological analysis: it guesses where each word starts and ends, and what kind of word it is.
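To give a flavour of what the analysis step yields, here's a minimal sketch (plain JS) of turning MeCab's standard output into structured tokens. `mecabParse` is a hypothetical name for the WASM binding, not the app's actual wrapper; the output format described in the comments is MeCab's usual one.<p><pre><code>
// MeCab emits one line per token, terminated by "EOS":
//   surface \t pos,pos1,pos2,pos3,conjType,conjForm,lemma,reading,pronunciation
function parseMecabOutput(raw) {
  return raw
    .split('\n')
    .filter((line) => line && line !== 'EOS')   // drop blank lines and the EOS marker
    .map((line) => {
      const [surface, features = ''] = line.split('\t');
      const f = features.split(',');
      return {
        surface,                  // the word as it appears in the sentence
        pos: f[0] || '',          // part of speech (品詞)
        lemma: f[6] || surface,   // dictionary (deconjugated) form, for lookups
        reading: f[7] || '',      // katakana reading, for pronunciation
      };
    });
}

// e.g. parseMecabOutput(mecabParse('吾輩は猫である'))
// -> [{ surface: '吾輩', pos: '名詞', lemma: '吾輩', reading: 'ワガハイ' }, ...]
</code></pre>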
Dictionaries are embedded for client-side searching; as a result, there is no backend.
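For a rough idea of what client-side searching can look like (a sketch under my own assumptions; the entry shape and index structure here are illustrative, not the app's actual format): the dictionary data is fetched once, decoded into an in-memory index, and queried by the dictionary form that MeCab reports.<p><pre><code>
// Entries are illustrative, e.g. { headword: '食べる', reading: 'たべる', glosses: ['to eat'] }.
// Index by both headword and reading, so a token's deconjugated form can be
// looked up entirely in the browser, with no server round-trip.
function buildIndex(entries) {
  const index = new Map();
  for (const entry of entries) {
    for (const key of [entry.headword, entry.reading]) {
      const bucket = index.get(key) || [];
      bucket.push(entry);
      index.set(key, bucket);
    }
  }
  return index;
}

const lookup = (index, lemma) => index.get(lemma) || [];
// e.g. lookup(index, '食べる') returns matching entries even when the surface form was 食べた
</code></pre>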
The application is a Progressive Web Application, so it can be saved for offline use (141MB).<p><a href="https://birchlabs.co.uk/mecab-web/" rel="nofollow">https://birchlabs.co.uk/mecab-web/</a> (Warning: 37MB webpage)<p><i>Technical notes:</i><p>There's a serious amount of dictionary included. I culled Kanjidic from 15.5MB to 0.7MB. The remaining dictionaries gzip pretty well (138MB -> 36MB).<p>Apache is configured for WebAssembly streaming compilation (i.e. serving the .wasm with the right Content-Type) and serves pre-computed gzips rather than compressing on the fly.<p>I wanted to explore whether we actually _need_ a bundler in 2019. I used @pika/web to grab libraries as ES modules.<p>HTTP/2 + gzip are used instead of a bundler. The source _is_ the distribution; old school (there's a sketch of this setup below). There's no backend, so the application can be served statically from a CDN.<p>Preact/htm/unistore are used instead of React/JSX/Redux. Together these libraries weigh <100KB.<p>Workbox is used to generate a service worker. It caches source code and assets so that the webpage can be saved as a PWA and used offline.
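Here's roughly what the bundler-free setup amounts to (a sketch; the web_modules paths and the store wiring are assumptions, not copied from the app): @pika/web writes each dependency as a single ES module, which the browser imports directly over HTTP/2.<p><pre><code>
// index.js, loaded with a script tag of type="module" — no build step.
import { h, render } from './web_modules/preact.js';
import htm from './web_modules/htm.js';
import createStore from './web_modules/unistore.js';

const html = htm.bind(h);   // htm gives JSX-like templates without a compiler
const store = createStore({ sentence: '吾輩は猫である' });

const App = ({ sentence }) => html`<main class="sentence">${sentence}</main>`;

// Re-render on every state change (unistore's Provider/connect would be the
// tidier option; kept minimal here).
const renderApp = (state) => render(html`<${App} ...${state} />`, document.body);
store.subscribe(renderApp);
renderApp(store.getState());
</code></pre>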
Offline dictionaries have been done before (e.g. apps), but this is a particularly small one, and perhaps the first to provide sentence tokenization via MeCab.<p>I'd love to hear your feedback, be it on language concerns, technology, or user experience.