Hey there, great work. I used Cortex as an inspiration when designing chantilly: https://github.com/creme-ml/chantilly, which is a much less ambitious solution tailored towards online machine learning models. Keep up the good work.
This certainly looks like a cleaner way to deploy an ML model than SageMaker. Couple of questions:

* Is this really meant for more intensive model inference applications that need a cluster? For a lot of my models, a cluster feels like overkill.

* A lot of the ML deployment tools (Cortex, SageMaker, etc.) don't seem to rely on first pushing changes to version control and then deploying from there. Is there any reason for this? I can't come up with a reason why this shouldn't be the default. For example, this is how Heroku works for web apps (and this is a web app at the end of the day).
The name Cortex is already in use by the scalable Prometheus storage backend: https://github.com/cortexproject/cortex
One of the things that has deterred me from SageMaker is how expensive it can be for a side project. Real-time endpoints start at $40-$50 per month, which would be a bit too much for a low-budget project on the side. I love the idea of using an open-source alternative, but I noticed that all of the AWS resources Cortex requires would add up to even more. Do you have any tips on how to keep a model deployed cheaply for a side project using Cortex? I'd be fine with a little bit of latency on the first request, similar to how Heroku's free dynos work.
Why would I use this over deploying the model to a Lambda function, aside from Lambda's lack of GPU support? (Not trying to be confrontational; I genuinely don't know.) Won't Lambda functions scale as needed? How does this compare cost-wise?
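For reference, the Lambda route described above usually looks something like the sketch below: a handler that pulls a pickled scikit-learn model from S3 on cold start, caches it across warm invocations, and serves predictions behind API Gateway. The bucket name, object key, and request payload shape are made-up placeholders, not anything prescribed by Cortex or AWS.

```python
# Minimal sketch of a model-serving Lambda (hypothetical bucket/key/payload).
import json
import pickle

import boto3

s3 = boto3.client("s3")
_model = None  # cached across warm invocations of the same container


def _load_model():
    """Download and unpickle the model once per container (cold start)."""
    global _model
    if _model is None:
        local_path = "/tmp/model.pkl"
        s3.download_file("my-model-bucket", "models/model.pkl", local_path)
        with open(local_path, "rb") as f:
            _model = pickle.load(f)
    return _model


def handler(event, context):
    """API Gateway proxy handler: expects {"features": [[...], ...]} in the body."""
    model = _load_model()
    body = json.loads(event.get("body") or "{}")
    preds = model.predict(body["features"]).tolist()
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"predictions": preds}),
    }
```

The trade-offs are the ones hinted at in the question: Lambda scales to zero and bills per request, but cold starts for heavyweight frameworks, deployment package size limits, and the lack of GPUs are where a cluster-based deployment starts to pay off.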