A less-hyped inference engine with INT8/FP16 inference support on both CPU and GPU (CUDA).<p>Supported models:
GPT-2, GPT-J, GPT-NeoX, OPT, BLOOM, LLAMA, T5, WHISPER<p>(I found this library while researching alternatives to Triton/FasterTransformer for Tabby: <a href="https://github.com/TabbyML/tabby">https://github.com/TabbyML/tabby</a>)