Google's speech-to-text is powerful, but I'd be pretty skeptical about tying a project to it given how services like Maps have been handled recently. There are companies like Mozilla trying to build more open solutions, but to the best of my knowledge (please correct me if I'm wrong) any pre-trained services Mozilla offers will still involve connecting to their servers.<p>Maybe I'm just paranoid, but I can't imagine using a speech-to-text system I can't self-host for anything serious. It feels like we've seen example after example of why this is a bad idea -- to the point that when I hear a company like Google talk about a locked-down cloud platform as "making AI accessible to everyone" it feels almost dishonest.<p>Especially once we start talking about text-to-speech. We can already do a lot of that locally - we should be pretty hesitant about coupling new text-to-speech techniques to strategies that require us to move logic away from local devices onto the cloud.
If you want to build open-source, 100% on-device and private-by-design voice assistants that can run on a Raspberry Pi, you can take a look at what we are building at <a href="https://snips.ai" rel="nofollow">https://snips.ai</a> (disclaimer: I'm a co-founder)<p>We want to make it possible to have embedded assistants in all your objects that preserve people's privacy, and to do this with open source: <a href="https://medium.com/snips-ai/an-introduction-to-snips-nlu-the-open-source-library-behind-snips-embedded-voice-platform-b12b1a60a41a" rel="nofollow">https://medium.com/snips-ai/an-introduction-to-snips-nlu-the...</a><p>Take a look at our blog to get started in 1h: <a href="https://medium.com/snips-ai/voice-controlled-lights-with-a-raspberry-pi-and-snips-822e53d7ede6" rel="nofollow">https://medium.com/snips-ai/voice-controlled-lights-with-a-r...</a><p>It also integrates with popular home automation platforms like Home Assistant and Jeedom.
Anyone know how this relates to the Web Speech API[1]?<p>Will they ship it with Chrome to replace the existing speech synthesis API[2]? (I believe right now it just uses whatever voices are available to the device or OS, but Chrome can fall back to a server-side voice)<p>[1] <a href="https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API" rel="nofollow">https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_...</a><p>[2] <a href="https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesis" rel="nofollow">https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynth...</a>
On my machine the demo page doesn't work at <a href="https://cloud.google.com/text-to-speech/" rel="nofollow">https://cloud.google.com/text-to-speech/</a><p>I tried to get Google to fix this a long time ago and it seemed to work for a while after being offline for weeks.
A friend works for a newspaper. He records interviews.<p>We tried all the software we could find to turn the recordings (Dutch) into text, but nothing gives a helpful result.<p>I know that recording-to-text is different from speech-to-text, but even when I use OK Google the results are horrible most of the time.<p>So after all these years I am still a little skeptical.
I really want a free software implementation of text-to-speech and speech-to-text that runs on a local computer without a network connection.<p>I don't trust those cloud-based solutions.
I'd love to read what this page has to say, but somehow it managed to load with some click-grabbing Gawker-type theme? Half expecting a "100 Surprising Cloud Facts, And Number 12 Will Shock You" link to appear in that inexcusable waste of space along the bottom. <a href="https://i.imgur.com/Uk1udNo.jpg" rel="nofollow">https://i.imgur.com/Uk1udNo.jpg</a>