Abstract: Efficient representation of sparse matrices is critical for reducing memory usage and improving performance in hardware-accelerated computing systems. This letter presents memory-efficient ...
Abstract: We consider the distributed memory parallel multiplication of a sparse matrix by a dense matrix (SpMM). The dense matrix is often a collection of dense vectors. Standard implementations will ...
Sparse general matrix-matrix multiplication (SpGEMM) is fundamental to numerous scientific applications. Traditional hash-based approaches fail to strike a trade-off between reducing hash collisions ...
Since our sparse attention is implemented by FlexAttention, we recommend conducting a warm-up inference first, as subsequent inferences will perform better in terms of speed. To better demonstrate the ...