I've been working on Scaffoldly since 2020 to simplify AWS Lambda deployments. I recently discovered that you can run Hugging Face models on Lambda efficiently by using EFS to cache the model files. Here's what's interesting:<p><pre><code> - Uses EFS for model file persistence
- Pre-downloads models after deployment for faster cold starts
- Cold start: ~20s (model loading), warm requests: 5-20s (CPU inference)
- Fully automated container builds and deployment
- Works with private/gated models via HF_TOKEN
</code></pre>
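To make the caching idea concrete, here is a minimal sketch of the general pattern, not Scaffoldly's actual implementation. It assumes the EFS access point is mounted at `/mnt/efs` (a hypothetical path; Lambda requires EFS mounts to live under `/mnt/`), and the helper names `configure_hf_cache` and `get_model` are made up for illustration. `HF_HOME` is the standard environment variable the Hugging Face libraries read to locate their cache.

```python
import os
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional

# Holds the loaded model between invocations of a warm Lambda container.
_MODEL: Optional[Any] = None


def configure_hf_cache(efs_mount: str = "/mnt/efs") -> Path:
    """Point the Hugging Face cache at EFS, falling back to the temp dir.

    `efs_mount` is an assumed mount path; adjust it to your own setup.
    Call this before importing transformers or huggingface_hub, because
    those libraries read HF_HOME when they are first imported.
    """
    mount = Path(efs_mount)
    if mount.is_dir() and os.access(mount, os.W_OK):
        # Persistent: survives cold starts and is shared across containers.
        cache_dir = mount / "hf-cache"
    else:
        # Ephemeral: files are lost when the container is recycled.
        cache_dir = Path(tempfile.gettempdir()) / "hf-cache"
    cache_dir.mkdir(parents=True, exist_ok=True)
    os.environ["HF_HOME"] = str(cache_dir)
    return cache_dir


def get_model(loader: Callable[[], Any]) -> Any:
    """Load the model once per container and reuse it on warm requests."""
    global _MODEL
    if _MODEL is None:
        _MODEL = loader()  # the slow step, paid only on a cold start
    return _MODEL
```

In a handler you would call `configure_hf_cache()` first, then something like `get_model(lambda: pipeline("text-generation", model=...))` with `pipeline` imported from `transformers`. With the files already on EFS, a cold start only pays for loading the model into memory rather than downloading it again.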
Example deployment:<p><pre><code> npx scaffoldly create app --template python-huggingface
cd python-huggingface && npx scaffoldly deploy
</code></pre>
Scaffoldly is open source, and I'd welcome any feedback and contributions from the community!<p><a href="https://github.com/scaffoldly/scaffoldly">https://github.com/scaffoldly/scaffoldly</a><p><a href="https://github.com/scaffoldly/scaffoldly-examples/tree/python-huggingface">https://github.com/scaffoldly/scaffoldly-examples/tree/pytho...</a>