I'll ask plainly what others are hinting at: is this actually a service you built yourself, or are you a proxy for something like the Google Translate API[1]?<p>If it's a service you built yourself, it's critical that you explain the hows and whys of your forecast availability and scalability numbers for your chosen architecture, given who you are competing with.<p>[1] <a href="https://developers.google.com/translate/v2/using_rest#detect-language" rel="nofollow">https://developers.google.com/translate/v2/using_rest#detect...</a>
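For reference, the detect call there is a single GET. A minimal sketch with requests, going by the linked v2 docs (YOUR_API_KEY is a placeholder):

    import requests

    # Sketch of a call to the Google Translate v2 detect endpoint;
    # YOUR_API_KEY is a placeholder, response shape per the linked docs.
    resp = requests.get(
        "https://www.googleapis.com/language/translate/v2/detect",
        params={"key": "YOUR_API_KEY", "q": "Ik hou van vette lettertypes."},
    )
    # "detections" holds one list per input string, each with candidate guesses.
    best = resp.json()["data"]["detections"][0][0]
    print(best["language"], best["isReliable"], best["confidence"])

So if the service is a proxy, it is a very thin one.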
Alternatively, people can just download langid.py[1] and do language detection locally. This is not a particularly hard problem - I think it's doable in an undergrad ML or NLP class.<p>The tricky parts are usually political - are users going to be angry if you confuse Indonesian with Malay, or so on?<p>[1] <a href="https://github.com/saffsd/langid.py" rel="nofollow">https://github.com/saffsd/langid.py</a>
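Going by the langid.py README, local detection is roughly:

    import langid

    # classify returns (language_code, score).
    lang, score = langid.classify("Ik hou van vette lettertypes.")
    print(lang, score)

    # Constraining the candidate set helps with touchy pairs
    # like Indonesian vs. Malay:
    langid.set_languages(["id", "ms"])
    print(langid.classify("Saya suka membaca buku"))

No network round trip, no API key, no 99%-uptime question.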
The design is fine, but the language used on the page itself isn't quite right.<p>I see three spelling errors in your language list:<p>- Panjabi should be Punjabi;<p>- Teligu should be Telugu;<p>- Ukraininan should be Ukrainian.<p>There are also a few grammar problems earlier in the document, and style problems (e.g. English doesn't use a space before sentence-ending punctuation marks).
Hmm, it takes 5+ seconds to get a response, and it chokes on the same test phrase as Google, thinking the Dutch sentence "Ik hou van vette lettertypes." is Norwegian...
Looks interesting. Why not have an input on the landing page where someone can try it out without even signing up? I think then people could give it a spin before they give away their email address. Otherwise, the user just has to trust your 99% figure; it might be helpful to give some data around that, even if it's only a footnote (on a corpus of x, over x period of time, etc.)<p>Also, I think it would be clearer if it said "A simple and scalable way to automatically classify text by language" instead of "A simple and scalable way to classify automatically text by language".<p>Design looks very clean though. Nice work.<p>EDIT: Also, your social media links at the bottom aren't hooked up yet.
Also, for those who would like to know how to implement a language guesser (sources + link to the paper):<p><a href="http://www.let.rug.nl/vannoord/TextCat/" rel="nofollow">http://www.let.rug.nl/vannoord/TextCat/</a><p>Python version:<p><a href="http://thomas.mangin.com/data/source/ngram.py" rel="nofollow">http://thomas.mangin.com/data/source/ngram.py</a><p>It's fun to implement and takes no more than a few hours.
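As a taste, here's a minimal sketch of the TextCat idea: ranked character n-gram profiles compared with the out-of-place distance. The toy corpora are placeholders; real profiles need far more text per language (and the paper uses n up to 5):

    from collections import Counter

    def profile(text, max_rank=300):
        # Ranked character n-gram profile (n = 1..3), TextCat-style.
        grams = Counter()
        padded = " " + text.lower() + " "
        for n in range(1, 4):
            for i in range(len(padded) - n + 1):
                grams[padded[i:i + n]] += 1
        return {g: r for r, (g, _) in enumerate(grams.most_common(max_rank))}

    def out_of_place(doc, lang):
        # Cavnar & Trenkle "out-of-place" distance; n-grams missing
        # from the language profile get the maximum penalty.
        penalty = len(lang)
        return sum(abs(rank - lang.get(gram, penalty))
                   for gram, rank in doc.items())

    # Toy corpora - placeholders, far too small for real use.
    corpora = {
        "en": "the quick brown fox jumps over the lazy dog",
        "fr": "le vif renard brun saute par-dessus le chien paresseux",
    }
    profiles = {lang: profile(text) for lang, text in corpora.items()}

    doc = profile("le chien dort")
    print(min(profiles, key=lambda lang: out_of_place(doc, profiles[lang])))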
You should also check for fully unambiguous words before falling back to trigrams. "marché" occurs only in French, whereas "mar", "arc", ... occur in lots of languages. This should drastically improve your results.
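Something like this, as a sketch - the marker-word table here is hypothetical; a real one would be mined from corpora by keeping words seen in exactly one language:

    # Hypothetical marker-word table; a real one would be mined from
    # corpora by keeping words attested in exactly one language.
    UNAMBIGUOUS = {"marché": "fr", "jedoch": "de", "aquí": "es"}

    def detect(text, trigram_fallback):
        for word in text.lower().split():
            lang = UNAMBIGUOUS.get(word.strip(".,;:!?\"'"))
            if lang:
                return lang  # one unambiguous word settles it
        return trigram_fallback(text)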
I've used detectlanguage.com[1] in the past, which seems like a very similar service to getlang.io. With both of them it is hard to know what is behind the scenes...<p>[1] <a href="http://detectlanguage.com/" rel="nofollow">http://detectlanguage.com/</a>
And it looks like they are using the following library: <a href="http://code.google.com/p/language-detection/" rel="nofollow">http://code.google.com/p/language-detection/</a> - at least the number and list of languages are very similar :-)
I wonder how this performs on short text posts like tweets. At my last gig, where we did social media text analysis, we used a few different packages (Chromium's CLD, guess-language, and our own n-gram classifier) and still had pretty low accuracy on tweets.
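If you're stacking several detectors anyway, a simple majority vote is one way to combine them. A sketch - the detector wrappers are placeholders:

    from collections import Counter

    def vote(text, detectors):
        # Majority vote across detectors; Counter.most_common is stable
        # on ties, so ties go to the first answer seen.
        votes = Counter(d(text) for d in detectors)
        return votes.most_common(1)[0][0]

    # Placeholder wrappers - e.g. lambda t: langid.classify(t)[0],
    # a CLD wrapper, and your own n-gram classifier.

It won't fix tweets by itself, but it smooths out the individual detectors' blind spots a bit.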
You guys might want to handle GET requests for the /try URL (<a href="https://getlang.io/try" rel="nofollow">https://getlang.io/try</a>) as well. Currently it returns "Server Error (500)" for GET requests.
Matthew Kirk spoke about a neural network language predictor at RubyConf a few weeks ago. Here are his slides and code: <a href="http://modulus7.com/rubyconf/" rel="nofollow">http://modulus7.com/rubyconf/</a>
I don't know why, but I can't stand the sentence "A simple and scalable way to classify automatically text by language". "Classify" and "automatically" need to switch places.
Apache Tika (<a href="http://tika.apache.org/" rel="nofollow">http://tika.apache.org/</a>) also has a language detector, although it may not be as good as CLD...
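If I recall correctly, the tika-python bindings expose it via the language module (it talks to a Tika server under the hood; treat the call names as an assumption and check the project README):

    # Assumes the tika-python package; call names from memory,
    # so verify against the README before relying on this.
    from tika import language

    print(language.from_buffer("Dit is een Nederlandse zin."))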
If I were to implement this, I'd rather use Google's Prediction API. At least with that you get a bit of control over what goes into the training data.
It's Telugu, not Teligu.
By Panjabi, do you mean Punjabi?<p>As others already mentioned, it would be good to have users try examples before signup.
How does this compare in accuracy to Chromium's Compact Language Detector?<p><a href="https://code.google.com/p/chromium-compact-language-detector/" rel="nofollow">https://code.google.com/p/chromium-compact-language-detector...</a><p><a href="https://github.com/mzsanford/cld" rel="nofollow">https://github.com/mzsanford/cld</a>
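For anyone who wants to run the comparison themselves, the Python binding's API is, if memory serves, roughly this (treat the signature as an assumption and check the binding's README):

    import cld  # binding from chromium-compact-language-detector

    # detect takes UTF-8 bytes and returns a 5-tuple (from memory;
    # verify against the binding's README).
    name, code, reliable, bytes_found, details = cld.detect(
        "Ik hou van vette lettertypes.".encode("utf-8"))
    print(code, reliable)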