2 points by danielcampos93 over 3 years ago

1 comment

Combining multiple sparsification methods improves the performance of the Hugging Face BERT base model (uncased) on CPUs. We combine distillation with both unstructured pruning and structured layer dropping. This “compound sparsification” technique yields a BERT model that is up to 14x faster and 4.1x smaller on CPUs, depending on accuracy constraints.

Faster and smaller Hugging Face BERT on CPU via compound sparsification

