2 points by wsxiaoys about 2 years ago

1 comment

wsxiaoys about 2 years ago

A less-hyped inference engine that supports INT8/FP16 inference on both CPU and GPU (CUDA).

Supported models: GPT-2, GPT-J, GPT-NeoX, OPT, BLOOM, LLaMA, T5, Whisper

(Found this library while researching alternatives to Triton/FasterTransformer for Tabby: https://github.com/TabbyML/tabby)

CTranslate2: An efficient inference engine for Transformer models
