Is this able to support more than 50 requests per second? Are there any benchmarks on the performance overhead of the underlying web server/routing that handles the requests?
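To make the question concrete, this is roughly the load test I'd run against it. The endpoint URL and payload here are made up, just a sketch:

```python
# Rough client-side throughput check (hypothetical endpoint and payload).
import time
import concurrent.futures
import urllib.request

URL = "https://example.com/models/my-model/predict"  # hypothetical
PAYLOAD = b'{"features": [1.0, 2.0, 3.0]}'           # hypothetical

def one_request(_):
    req = urllib.request.Request(
        URL, data=PAYLOAD, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()

N = 500  # total requests to send
start = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    list(pool.map(one_request, range(N)))
elapsed = time.time() - start
print(f"{N / elapsed:.1f} requests/second")
```

Even numbers from a crude script like this would help, since it includes the routing/serialization overhead and not just raw model latency.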
Looks interesting! How about models that require dictionaries - e.g. tf-idf, which needs a learned vocabulary to convert text into a feature vector? Does it allow for some preprocessing?
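For concreteness, the kind of model I have in mind is something like this sklearn pipeline, where the tf-idf vocabulary learned at training time has to ship alongside the classifier (sketch only; the data is illustrative):

```python
# The tf-idf vectorizer's vocabulary must travel with the classifier,
# e.g. by bundling both into a single sklearn Pipeline.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["great product", "terrible service", "love it", "awful experience"]
labels = [1, 0, 1, 0]

model = Pipeline([
    ("tfidf", TfidfVectorizer()),   # learns the term -> index dictionary
    ("clf", LogisticRegression()),
])
model.fit(texts, labels)

# At serving time the raw string must pass through that same vocabulary:
print(model.predict(["great service"]))
```

If the service only accepts a bare classifier that takes numeric vectors, the client would have to reimplement the vectorization step, which is where this usually breaks down.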
I see that you accept models up to 1 GB. Inference time for models of that size is likely to be high on CPUs. Do you use GPUs to speed up inference for deep learning models?
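To illustrate the latency concern, a rough PyTorch sketch with a hypothetical model near that size; the layer dimensions are made up:

```python
# Illustrative only: timing a single forward pass on CPU vs. GPU
# for a model with roughly 0.5 GB of weights.
import time
import torch
import torch.nn as nn

model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)])  # ~0.5 GB fp32
x = torch.randn(1, 4096)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()
x = x.to(device)

with torch.no_grad():
    start = time.time()
    model(x)
    if device == "cuda":
        torch.cuda.synchronize()  # wait for GPU kernels to actually finish
    print(f"single inference on {device}: {(time.time() - start) * 1000:.1f} ms")
```

On CPU, per-request latency for models in this range can easily dominate whatever the web layer adds, so whether GPUs are in the serving path matters a lot.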