Tech Echo (科技回声)

11 comments

sen 12 months ago

It's really cool how people are taking these tiny cheap MCUs and making them do fun things for hobbyists. There's nothing better than a project with zero real-world use case that's done just because it was a challenge.

E.g.:

Making the CH32V003 programmable via USB: https://www.youtube.com/watch?v=j-QazXghkLY

CH32V003 "Super-Cluster": https://www.youtube.com/watch?v=lh93FayWHqw

Powering a Nixie Tube from USB with a CH32V003: https://www.youtube.com/watch?v=-4d3PgEXhdY

(A good rule in life in general is to just always watch CNLohr and Bitluni if you're into "useless but amazing hardware projects".)

jononor 12 months ago

Minor nitpick/clarification: as it stands, this does detection of a fixed, small vocabulary of words, not open-ended speech-to-text covering an entire language. This is also called speech command recognition or keyword spotting, which is already impressive and useful. General STT on this grade of hardware would be an amazing feat!

kragen 12 months ago

this is exciting! it's still at prototype stage: 'getting about 90% accuracy [distinguishing between the spoken digits 'zero' to 'nine',] with the code as it stands.'

i wonder if modern continuous optimization algorithms could yield a neural network that would do better than this mfcc approach at, perhaps, even lower computational cost

the chips seem to have gotten more expensive lately, though (11.83¢ in quantity 500), and lcsc is out of stock on the ch32v003. they only have in stock ch32v203 and up, which costs 37.5¢. https://www.lcsc.com/products/Microcontroller-Units-MCUs-MPUs-SOCs_11329.html?keyword=ch32v

digi-key, as usual, doesn't list the part at all

jononor 12 months ago

Really nice project! Great care is taken in optimizing the audio feature extraction, very cool to see. I am working on a very similar project [1], using the Puya PY32. I opted for that chip over the CH32 since it has DMA (which simplifies efficient ADC input at audio rates) and 1 kB more RAM, for a couple of cents more. I have already written about some of the hardware constraints on low-cost audio, and am getting to the audio DSP/ML in the coming months.

[1] https://hackaday.io/project/194511-1-dollar-tinyml

buescher 12 months ago

I wonder how this performs compared to the "voice recognition" VCP200 chip sold by Radio Shack in the eighties (maybe early nineties?). https://21stdigitalhome.blogspot.com/2013/06/vcp200-voice-recognition-ic.html

It would also be interesting to know if Voice Control Products ever had a real design win.

I gather the VCP200 was a mask-programmed M6804 microcontroller. The M6804 was a strange and obscure beast, apparently a cost-reduced, internally serial ("1-bit"), partial reimplementation of the M6805, which was one of the first Motorola 8-bit microcontrollers based on the 6800. Max bus speed of 2.75MHz, with an instruction cycle time of 44 microseconds. 32 bytes of RAM and 1K mask-programmed ROM. No ADC. http://www.bitsavers.org/components/motorola/6804/M6804_MCU_Manual_Sep85.pdf

One should be able to do better with just about any modern microcontroller. Then again, for all I know the VCP200 was not fit for even the modest tasks (looks like toy/novelty/hobbyist) it was marketed for back then.
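For a rough sense of scale, the 44 microsecond instruction cycle quoted above can be turned into a per-sample compute budget. This is back-of-the-envelope arithmetic only, not a benchmark. The 8 kHz sample rate is an assumed, typical speech rate (and purely hypothetical for the VCP200, which had no ADC), the 48 MHz figure assumes a CH32V003 running at its nominal maximum clock, and raw clock cycles are not the same thing as instructions. The function names are made up for illustration.

```python
# Back-of-the-envelope compute budget per audio sample.
# All constants below are assumptions except the 44 us figure,
# which is taken from the comment above.

M6804_INSTRUCTION_CYCLE_S = 44e-6   # instruction cycle time quoted in the comment
CH32V003_CLOCK_HZ = 48_000_000      # assumed: nominal maximum core clock
ASSUMED_SAMPLE_RATE_HZ = 8_000      # assumed: a common rate for speech audio


def instructions_per_second(cycle_time_s: float) -> float:
    """Instruction rate implied by a fixed instruction cycle time."""
    return 1.0 / cycle_time_s


def budget_per_sample(events_per_second: float, sample_rate_hz: float) -> float:
    """How many instructions (or clock cycles) fit between two audio samples."""
    return events_per_second / sample_rate_hz


if __name__ == "__main__":
    m6804_ips = instructions_per_second(M6804_INSTRUCTION_CYCLE_S)
    print(f"M6804: {m6804_ips:,.0f} instructions/s")
    print(f"M6804: {budget_per_sample(m6804_ips, ASSUMED_SAMPLE_RATE_HZ):.2f} "
          f"instructions per sample")
    print(f"CH32V003: {budget_per_sample(CH32V003_CLOCK_HZ, ASSUMED_SAMPLE_RATE_HZ):,.0f} "
          f"clock cycles per sample")
```

Under these assumptions the M6804 gets fewer than three instructions per sample, against a few thousand clock cycles per sample on the CH32V003, which is at least consistent with the guess that a modern part should do better.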

hales 12 months ago

Is there a recorded demo? Reading about speech-to-text is different from hearing it.

watersb 12 months ago

About 10 years ago, I used a basic flip phone, vendor locked to a $15/month Verizon plan.

The Walmart page for a similar device is still up at https://www.walmart.com/ip/Verizon-Wireless-Samsung-Gusto-3-128MB-Prepaid-Smartphone-Black/36771424

Among other things, it had limited speech recognition -- you could say "Call" followed by a name, and it would match that against the address book on device.

We live in strange times.

londons_explore 12 months ago

Projects like this really open the doors to coin-sized devices which can record months of audio from a tiny battery.

You can imagine employers who might want a record of everything said on their premises, for example.

londons_explore 12 months ago

If you uploaded some training data somewhere, perhaps with some links to simulators, you might get a crowd of people code-golfing this to maximize accuracy.

countvonbalzac 12 months ago

What's the minimum spec chip you will need to run the smallest whisper model (looks like that's 39M parameters)?
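As a rough way to frame the question, the storage needed for the weights alone can be estimated from the parameter count. The sketch below is only that: it uses the 39M figure from the comment, ignores activations, audio buffers, and code size, and the quantization widths are illustrative choices rather than anything Whisper ships with. The 16 KB flash and 2 KB SRAM figures for the CH32V003 are stated as assumptions for comparison, and the function names are hypothetical.

```python
# Weight-storage estimate for a 39M-parameter model at several precisions.
# Weights only: activations, buffers, and program code are ignored.

PARAMS = 39_000_000                 # parameter count quoted in the comment
CH32V003_FLASH_BYTES = 16 * 1024    # assumed flash size, for comparison only
CH32V003_SRAM_BYTES = 2 * 1024      # assumed SRAM size, for comparison only


def weight_bytes(params: int, bits_per_weight: int) -> int:
    """Bytes needed to store `params` weights at the given bit width."""
    return params * bits_per_weight // 8


def footprint_table(params: int = PARAMS) -> dict[int, int]:
    """Map each illustrative bit width to the resulting weight storage in bytes."""
    return {bits: weight_bytes(params, bits) for bits in (32, 16, 8, 4)}


if __name__ == "__main__":
    for bits, size in footprint_table().items():
        ratio = size / CH32V003_FLASH_BYTES
        print(f"{bits:>2}-bit weights: {size / 1e6:6.1f} MB "
              f"(about {ratio:,.0f}x the assumed flash)")
```

Even at an aggressive 4 bits per weight this comes to 19.5 MB, so under these assumptions the answer is a chip with tens of megabytes of storage plus working RAM for activations, which is well outside this class of microcontroller.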

pcdoodle 12 months ago

90% accuracy on 10 digits is pretty disappointing but cool project.
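One way to put a number on that: if each digit is recognized correctly 90% of the time, the chance of getting a whole multi-digit sequence right falls off quickly. The sketch below assumes errors on each digit are independent, which is a simplification and not something the project reports. The function name is hypothetical.

```python
# Probability of recognizing an entire digit sequence correctly,
# assuming each digit is an independent trial (a simplifying assumption).

def sequence_accuracy(per_digit_accuracy: float, length: int) -> float:
    """Chance that all `length` digits are recognized correctly."""
    return per_digit_accuracy ** length


if __name__ == "__main__":
    for n in (1, 4, 10):
        print(f"{n:>2} digits: {sequence_accuracy(0.9, n):.1%}")
```

Under that assumption a 4-digit code comes through intact about 66% of the time, and a 10-digit number only about 35% of the time.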

Simple Speech-to-Text on the '10 Cents' CH32V003 Microcontroller
