We have been working on a book on deep learning efficiency techniques such as quantization, pruning, and distillation, for both server-side and on-device (smartphones, IoT, etc.) applications. We earlier released the first four chapters for anyone to read for free.

We now have a new chapter on sparsity and clustering, two advanced compression techniques that can reduce your model's footprint (size, latency, etc.) while retaining its accuracy. You can read the five chapters released so far and work through the accompanying codelabs, provided as Jupyter notebooks.

We hope our readers can make their models 4-20x smaller, faster, and better in quality. We would truly appreciate any comments or feedback.

Book: efficientdlbook.com
Feedback: hello@efficientdlbook.com
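For anyone curious what the two techniques in the new chapter look like in code, here is a minimal NumPy sketch (illustrative only, not taken from the book's codelabs): magnitude pruning zeroes out the smallest weights to induce sparsity, and weight clustering replaces individual weights with a small set of shared centroids found by a simple k-means loop. The function names and parameters are our own for this example.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (sparsity)."""
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # Threshold at the k-th smallest absolute value over the flattened tensor.
    threshold = np.sort(np.abs(weights), axis=None)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def cluster_weights(weights, n_clusters=4, n_iter=20):
    """Replace weights with shared centroids via a basic k-means (clustering)."""
    flat = weights.ravel()
    # Initialize centroids evenly across the weight range.
    centroids = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iter):
        assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            members = flat[assign == c]
            if members.size:
                centroids[c] = members.mean()
    assign = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
    return centroids[assign].reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8)).astype(np.float32)
pruned = magnitude_prune(w, sparsity=0.5)
clustered = cluster_weights(w, n_clusters=4)
print("zeros after pruning:", np.count_nonzero(pruned == 0))
print("unique values after clustering:", np.unique(clustered).size)
```

A sparse tensor compresses well because the zeros need not be stored explicitly, and a clustered tensor can be stored as a small codebook plus per-weight indices; the chapter covers how to apply these during or after training without losing accuracy.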