Abstract: In modern machine learning models like Transformers, matrix multiplication dominates most computation. Specific hardware often uses large-scale PE arrays, such as systolic arrays, to ...