科技回声

6 comments

Someone, 7 months ago

FTA: "One major issue when programming these devices in C is that every function call consumes RAM for the return stack and function parameters. This is unavoidable"

It's not completely unavoidable: don't use function parameters (globals are your friends on these CPUs). You can't avoid having a return stack, but you can make as few function calls as possible (ideally zero, but you may have to write functions to fit things into ROM).

> "To solve this, I flattened the inference code"

I think that's "make as few function calls as possible".

> and implemented the inner loop in assembly to optimize variable usage.

That _should_ only make a difference for memory usage if your C compiler isn't perfect (but of course, it never is, certainly on CPUs like this one, which is a poor fit for C).
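To make the "globals instead of parameters" and "flattened" ideas concrete, here is a minimal sketch in portable C. The names (`g_in`, `g_w`, `g_out`, `layer_flat`) and the layer sizes are hypothetical and not taken from the article; whether this actually saves RAM depends on the compiler and the target.

```c
#include <stdint.h>

/* Hypothetical sizes, chosen only for illustration. */
#define N_IN  4
#define N_OUT 2

/* All working data is statically allocated, so no call has to
   pass arguments. */
static int8_t  g_in[N_IN];
static int8_t  g_w[N_OUT][N_IN];
static int16_t g_out[N_OUT];

/* Flattened layer: a single function with no parameters and no
   helper calls. The activation (ReLU) is written inline instead of
   being a separate function. */
void layer_flat(void)
{
    uint8_t o, i;

    for (o = 0; o < N_OUT; o++) {
        int16_t acc = 0;
        for (i = 0; i < N_IN; i++) {
            acc += (int16_t)g_w[o][i] * g_in[i];
        }
        g_out[o] = (acc > 0) ? acc : 0; /* inline ReLU */
    }
}
```

The caller fills `g_in` and `g_w`, calls `layer_flat()`, and reads `g_out`. The only call left is the one into `layer_flat` itself, which costs a single return address.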

Lerc, 7 months ago

I feel like, to really get to the level of hypothetically useful, it should be able to take the samples from an input source.

I wonder if you could do it on the full 28*28 by never holding the full image in memory at once, just treating it as an input stream. Say, a 1D convolution on each line as it comes in, to turn a [1,28] into a [3,7]; buffer two lines of the [3,7] = 42 values. Then, once three results of the third line's convolution have been produced ([3,3] = 9), start performing a 2D convolution using the first two lines [2,3,:3], replacing the data at the start (as it has already been processed).
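A rough sketch of the streaming idea, under several assumptions: the 1D convolution uses a kernel width and stride of 4 (so 28 inputs give 7 outputs), the three kernels in `k1d` are made-up placeholders, and the second stage is simplified to an unweighted vertical sum over three lines rather than the in-place 2D convolution described above. The function name `push_line` is hypothetical, and the buffers are not sized to fit any particular device.

```c
#include <stdint.h>
#include <string.h>

#define W  28 /* input line width, from the comment */
#define CH 3  /* channels after the 1D convolution */
#define OW 7  /* outputs per channel per line */
#define K  4  /* assumed kernel width and stride: 28 / 7 */

/* Placeholder kernels; real weights would come from training. */
static const int8_t k1d[CH][K] = {
    {  1,  1,  1,  1 },
    {  1, -1,  1, -1 },
    { -1,  0,  0,  1 },
};

/* Only two reduced lines are kept: 2 * 3 * 7 = 42 values. */
static int16_t hist[2][CH][OW];
static uint8_t lines_seen;

/* Consume one 28-pixel line; the caller may discard it afterwards.
   Returns 1 and fills out[][] once three lines are available,
   otherwise returns 0. */
int push_line(const uint8_t line[W], int16_t out[CH][OW])
{
    int16_t cur[CH][OW];
    int c, x, k;
    int ready = (lines_seen >= 2);

    /* Stage 1: strided 1D convolution, [1,28] -> [3,7]. */
    for (c = 0; c < CH; c++) {
        for (x = 0; x < OW; x++) {
            int16_t acc = 0;
            for (k = 0; k < K; k++) {
                acc += (int16_t)k1d[c][k] * line[x * K + k];
            }
            cur[c][x] = acc;
        }
    }

    /* Stage 2 (stand-in): combine the two buffered lines with the
       current one. A real 2D convolution would apply weights here. */
    if (ready) {
        for (c = 0; c < CH; c++) {
            for (x = 0; x < OW; x++) {
                out[c][x] = hist[0][c][x] + hist[1][c][x] + cur[c][x];
            }
        }
    }

    /* Slide the two-line window forward. */
    memcpy(hist[0], hist[1], sizeof hist[0]);
    memcpy(hist[1], cur, sizeof cur);
    if (lines_seen < 2) {
        lines_seen++;
    }
    return ready;
}
```

Feeding the 28 lines of an image one at a time would produce 26 output rows, and at no point is more than one raw line plus the two reduced lines held in memory.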

malwrar, 7 months ago

Super interesting! I wish TFA had found some way to measure the PMS150C implementation the headline brags about, but even the PFS154 (2x mem, 3x price) version is super neat! Interesting to see how the net in particular is built at such a small scale. I also wish they had included performance numbers like they do in their linked CH32V003 post. I'm wondering how quick these MCUs are compared to each other and to e.g. OP's PC, and how hot they get under sustained load.

pjmlp, 7 months ago

As a proof of concept, it is quite cool.

However, for going into production with something like this, maybe writing everything in assembly, and not just some parts, would be much better.

But after a quick search, it seems the macro assembler story for RISC-V isn't that great.

magicalhippo, 7 months ago

Fun to see neural nets pushed to such extremes, really enjoyed the post.

> The smallest models had to be trained without data augmentation, as they would not converge otherwise.

Was this also the case for the 2-bit model you ended up with?

amelius, 7 months ago

This challenges only the memory of the MCU, not the speed.

And it is a bit disappointing that they didn't finish the project by adding an 8x8 pixel camera and a 7-segment display.

Implementing neural networks on the "3 cent" 8-bit microcontroller
