Abstract: This paper addresses the significant challenge of executing inference tasks involving General Matrix Multiplication (GEMM) in deep neural networks (DNNs) on resource-constrained edge systems.
Abstract: Exponentially growing model size drives the continued success of deep learning, but it brings prohibitive computation and memory costs. From the algorithm perspective, model ...