This paper introduces JaxPruner, an open-source JAX-based pruning and sp...
We study the problem of efficient generative inference for Transformer m...
We present a fully on-device and streaming Speech-To-Speech (STS) conver...
Sparsity has become one of the promising methods to compress and acceler...
Large language models have been shown to achieve remarkable performance ...
Reducing the latency and model size has always been a significant resear...
Quantization has become a popular technique to compress neural networks ...