Or, do your inference using an AVX-512 CPU:

https://NN-512.com (open source, free software, no dependencies)

With batch size 1, NN-512 is easily 2x faster than TensorFlow, doing 27 ResNet50 inferences per second on a c5.xlarge instance. For less common networks, like DenseNet or ResNeXt, the performance gap is wider.

Even if you allow TensorFlow a larger ResNet50 batch size, NN-512 is still easily 1.3x faster.

If you need a few dozen inferences per second per server, this is the cheapest way to get them. And you're not depending on a proprietary solution whose parent company could go out of business within a year.

If you need Transformers instead of convolutions, Fabrice Bellard's LibNC is a good option: https://bellard.org/libnc/
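
For context, NN-512 is a compiler: you feed it a text description of the network and it emits dependency-free C99 that you build like any other C file. Here's a rough sketch of what a batch-size-1 driver for generated ResNet50 code might look like; note that the header name and the Create/Infer/Destroy entry points are hypothetical placeholders, the real names are whatever the generated header declares:

    /* Hypothetical driver for NN-512-generated code. The type and
       function names (ResNet50Net, ResNet50Create, ResNet50Infer,
       ResNet50Destroy) are placeholders, not NN-512's actual API;
       check the generated header for the real entry points. */
    #include <stdio.h>
    #include <stdlib.h>
    #include "ResNet50.h"  /* hypothetical: header emitted by NN-512 */

    int main(void) {
        /* Batch size 1: one 224x224 RGB image as float32. */
        float *in = calloc(3 * 224 * 224, sizeof *in);
        float out[1000];  /* 1000 ImageNet class scores */

        /* ... fill `in` with a preprocessed image ... */

        ResNet50Net *net = ResNet50Create("resnet50.weights");  /* hypothetical */
        ResNet50Infer(net, in, out);                            /* hypothetical */

        /* Argmax over the class scores. */
        int best = 0;
        for (int i = 1; i < 1000; i++)
            if (out[i] > out[best]) best = i;
        printf("class %d, score %f\n", best, out[best]);

        ResNet50Destroy(net);  /* hypothetical */
        free(in);
        return 0;
    }

Since the generated code is plain C99 with no dependencies, building is just compiling this driver together with the generated .c file, with something like gcc -O3 -march=native on an AVX-512 machine.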