It's a shame that the quality of open-source text-to-speech engines is so much worse than the current commercial state of the art, as that's the most notable difference in the demo videos between this and something like Siri. Would fixing that just be a matter of recording more high-quality free sample libraries, etc., or are there fundamental technical challenges to solve?
It says it's 100% open-source, but I can't seem to find the sources for the Jasper platform (not the Jasper client).<p>Even the "compile Jasper from scratch" installation method involves downloading some binaries: <a href="http://jasperproject.github.io/documentation/software/#install-binaries" rel="nofollow">http://jasperproject.github.io/documentation/software/#insta...</a><p>edit: more specific link
I actually made this exact thing Saturday afternoon. Great job!<p>My first prototype was on an Arduino, but it eventually ran out of firmware space. So I upgraded to the RPi, and from there it was a breeze.<p>Suggestions:<p>(1) Use Wit.ai for NLP. There is some added latency, but the capabilities far outstrip Sphinx's in the long run. It's free, there's less code to maintain, and it's easier to deploy and distribute.<p>(2) Try to find a small mic so that you can put everything in a sleek package.<p>(3) Add support for Bluetooth speakers (you're on an RPi, so it's basically done for you).<p>(4) 3D print a custom case, throw some 3M tape on it, and it's ready to be wall-mounted!
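For anyone curious about suggestion (1): Wit.ai exposes a simple HTTP endpoint (GET /message with a Bearer token) that returns the parsed intent for an utterance. A minimal sketch of building such a request (the token value is a placeholder you'd get from the Wit.ai console):

```python
import urllib.parse

# Assumption: a server access token obtained from the Wit.ai console.
WIT_TOKEN = "YOUR_WIT_AI_TOKEN"

def build_wit_request(utterance, token=WIT_TOKEN):
    """Build the URL and headers for a GET to Wit.ai's /message
    endpoint, which returns the parsed intent for an utterance."""
    url = "https://api.wit.ai/message?" + urllib.parse.urlencode({"q": utterance})
    headers = {"Authorization": "Bearer " + token}
    return url, headers

url, headers = build_wit_request("turn on the living room lights")
```

You'd then fire the request with whatever HTTP client you like and dispatch on the intent in the JSON response.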
Nice project--I like the code; it's very clean and well structured. You should check out the Subversion trunk of pocketsphinx: it has support for keyword spotting built in, so you can instantly recognize the persona keyword to enable the system instead of running the transcription through pocketsphinx and hoping for the best.<p>Unfortunately the keyword spotting stuff isn't documented yet, but check out the code for my Demolition Man swear detector project, which uses it: <a href="http://hackaday.io/project/531-Demolition-Man-Verbal-Morality-Statute-Monitor" rel="nofollow">http://hackaday.io/project/531-Demolition-Man-Verbal-Moralit...</a> The important bit is the ps_set_kws function call, which takes either a text keyword or a filename with a list of keywords. Then, after processing audio, call ps_get_hyp and it will return any spotted keywords. Check out the code here in PocketSphinxKWS.h/cpp: <a href="https://github.com/tdicola/DemoManMonitor" rel="nofollow">https://github.com/tdicola/DemoManMonitor</a>
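For anyone who'd rather try this from Python than C, here's a rough sketch of the same flow. The keyword-list file format (one phrase per line with a detection threshold) is pocketsphinx's; the decoder wiring mirrors the ps_process_raw/ps_get_hyp calls described above, but the exact Python binding method names may vary by version, so treat it as a sketch:

```python
def write_kws_file(path, keywords):
    """Write a pocketsphinx keyword-list file: one phrase per line,
    followed by its detection threshold, e.g. 'jasper /1e-20/'."""
    with open(path, "w") as f:
        for phrase, threshold in keywords:
            f.write("%s /%s/\n" % (phrase, threshold))

def spot_keyword(decoder, raw_audio):
    """Feed raw 16 kHz mono PCM audio to a pocketsphinx Decoder that
    was configured with -kws pointing at a keyword-list file; mirrors
    the C-side ps_process_raw/ps_get_hyp loop."""
    decoder.start_utt()
    decoder.process_raw(raw_audio, False, False)
    decoder.end_utt()
    hyp = decoder.hyp()  # None until a keyword has been spotted
    return hyp.hypstr if hyp else None

write_kws_file("keywords.kws", [("jasper", "1e-20"), ("hello world", "1e-30")])
```

Lower thresholds make spotting stricter; you'd tune them per phrase against real recordings.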
I am looking to use this (or something similar) at the startup where I work to toggle our robot's operating modes. I have a question about the voice recognition: is it done on the Pi itself, or is there a service that Jasper taps into to perform it?
That's amazing; I'm impressed. I have no idea how voice recognition software works, but is it possible to add other non-Latin languages (e.g. Greek, Arabic, Japanese) to this open-source project?
I'm not entirely clear on why it requires a WiFi adapter. Can you not use the wired connection? Module writing looks pretty nifty, though; I can't wait to give it a try over the weekend.
Guys, could you tell me where you got the music for your demo? How much did you pay? What was the process? Did you use something like Movie Maker?<p>I'm building a tool (prototype phase) for creating trailer videos. One of the use cases would be to create the demo movie of a product.
This looks pretty good.<p>I've been working on a freetext question answering service, but more on the question answering part (as opposed to the voice recognition side).<p>Looking at the documentation[1], it appears there is no way for it to handle free-text questions ("What is the population of X?", where X is any country), since all words need to be defined in advance. Is that correct, or am I missing something?<p>[1] <a href="http://jasperproject.github.io/documentation/api/standard/" rel="nofollow">http://jasperproject.github.io/documentation/api/standard/</a>
This looks great! I'm only missing a USB microphone. I'll be sure to make this once I get my hands on one.<p>It also seems pretty trivial to set up Wolfram Alpha on here. From what it looks like, you'd just have to:
1) get a developer account at Wolfram Alpha
2) download this promising looking module: <a href="https://pypi.python.org/pypi/wolframalpha/1.0.2" rel="nofollow">https://pypi.python.org/pypi/wolframalpha/1.0.2</a>
3) integrate it into Jasper (create a module)<p>I'll be sure to try it once I get it set up.
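The steps above could come together as a module along these lines. The WORDS/isValid/handle structure follows Jasper's standard module API; the profile key name and the exact shape of the wolframalpha package's response are assumptions, so treat this as a sketch:

```python
import re

# Vocabulary Jasper compiles into its language model (assumed trigger word)
WORDS = ["CALCULATE"]
PRIORITY = 1

def isValid(text):
    """Jasper calls this to decide whether this module handles the input."""
    return bool(re.search(r'\bcalculate\b', text, re.IGNORECASE))

def handle(text, mic, profile):
    """Strip the trigger word and send the rest to Wolfram Alpha."""
    import wolframalpha  # the PyPI module linked above
    query = re.sub(r'\bcalculate\b', '', text, flags=re.IGNORECASE).strip()
    client = wolframalpha.Client(profile['wolframalpha_app_id'])  # assumed key
    result = client.query(query)
    # Assumption: the first pod echoes the input, the second holds the answer
    pods = list(result.pods)
    answer = pods[1].text if len(pods) > 1 else "I couldn't find an answer."
    mic.say(answer)
```

Drop that in Jasper's modules directory and "Calculate two plus two" should route to it.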
Serious question: Why does everyone seem to confuse speech recognition with other parts of NLP (e.g. parsing)?<p>I can understand CNN or TechCrunch getting confused, but there seems to be a universal confusion here on HN too.<p>Not ranting. It is a bit exasperating to read comments and articles addressing only speech recognition. Siri is more than that.
Excellent work! I've always wanted to see a real-world use of pocketsphinx with Python; when I looked a year ago, the documentation was lacking. The module system looks nicely extensible as well.
I had a very similar (albeit less-complete) hack a while back: <a href="https://github.com/rob-mccann/Pi-Voice" rel="nofollow">https://github.com/rob-mccann/Pi-Voice</a>
Is there a way to make this subtract any noise being output by the Pi's speaker output, so that it can still understand me if I'm playing music or watching a movie?
It would be cool to trigger commands with my smartphone.
I'd say "Open the door", my phone would send it to my Raspberry Pi, which would then open my door.
Is this possible?
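A minimal sketch of one way this could be wired up: run a small HTTP endpoint on the Pi and have the phone POST the recognized phrase to it. Everything here (port, phrase-to-command mapping, the door-opening scripts) is an assumption for illustration:

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: map recognized phrases to commands to run on the Pi.
# open_door.py / close_door.py are hypothetical GPIO scripts.
COMMANDS = {
    "open the door": ["python", "open_door.py"],
    "close the door": ["python", "close_door.py"],
}

def command_for(phrase):
    """Normalize a phrase sent from the phone and look up its command."""
    return COMMANDS.get(phrase.strip().lower())

class TriggerHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        phrase = self.rfile.read(length).decode("utf-8")
        cmd = command_for(phrase)
        if cmd:
            subprocess.Popen(cmd)  # fire and forget
            self.send_response(200)
        else:
            self.send_response(404)
        self.end_headers()

# To run on the Pi:
#   HTTPServer(("0.0.0.0", 8080), TriggerHandler).serve_forever()
```

The phone side is then just an HTTP POST of the phrase, which most voice-trigger apps can do out of the box.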
How accurate is the speech recognition (could you use it for dictation?), and how fast is it from the end of a command to the recognition/parsing of that command?
I think this is a great project! But a Raspberry Pi might be a bit overpriced/overpowered for this task; maybe something like the Arduino Yún would be a more appropriate choice. I am really hoping this movement of small GNU/Linux-based home appliances will take off and lower the price.