Ernst, Dominik and Hager, Georg and Thies, Jonas and Wellein, Gerhard (2020) Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs. In: 13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019, 120 (43). Springer. PPAM 2019, 2019-09-08 - 2019-09-11, Bialystok, Poland. doi: 10.1007/978-3-030-43229-4_43. ISBN 978-303043221-8. ISSN 0302-9743.
This is the latest version of this item.
Abstract
General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS libraries are best optimized for square matrices but often show bad performance for tall & skinny matrices, which are much taller than wide. Nvidia's current CUBLAS implementation delivers only a fraction of the potential performance (as given by the roofline model) in this case. We describe the challenges and key properties of an implementation that can achieve perfect performance. We further evaluate different approaches of parallelization and thread distribution, and devise a flexible, configurable mapping scheme. A code generation approach enables a simultaneously flexible and specialized implementation with autotuning. This results in perfect performance for a large range of matrix sizes in the domain of interest, and at least 2/3 of maximum performance for the rest on an Nvidia Volta GPGPU.
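The roofline bound mentioned in the abstract can be illustrated with a small calculation. This is a minimal sketch, not the authors' code: it assumes the product C = AᵀB of two tall & skinny double-precision matrices (A of size K×M, B of size K×N, K much larger than M and N), assumes each input element is loaded from memory exactly once, and neglects traffic for the small result. The function names and the hardware figures in the usage lines are illustrative assumptions, not values taken from this record.

```python
def tsmm_intensity(m: int, n: int, bytes_per_element: int = 8) -> float:
    """Arithmetic intensity (flop/byte) of C = A^T B for tall & skinny A, B.

    Assumption: per row of A (m entries) and B (n entries) the kernel performs
    2*m*n flops and loads m + n elements once; the m x n result is neglected.
    """
    return (2.0 * m * n) / (bytes_per_element * (m + n))


def roofline_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline limit: the smaller of the compute peak and intensity * bandwidth."""
    return min(peak_gflops, intensity * bandwidth_gbs)


# Illustrative round numbers (assumed, not measured values from the paper).
PEAK_GFLOPS = 7000.0    # assumed double-precision peak in GFlop/s
BANDWIDTH_GBS = 800.0   # assumed attainable memory bandwidth in GB/s

for width in (4, 64, 128):
    i = tsmm_intensity(width, width)
    limit = roofline_gflops(i, PEAK_GFLOPS, BANDWIDTH_GBS)
    print(f"M = N = {width:3d}: {i:5.2f} flop/byte, roofline limit {limit:7.1f} GFlop/s")
```

Under these assumptions, narrow matrices are limited by memory bandwidth, and the limit only reaches the compute peak once the matrices become wide enough, which is why the width of the skinny dimension determines what "perfect performance" means for a given size.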
Item URL in elib: | https://elib.dlr.de/130199/ |
---|---|
Document Type: | Conference or Workshop Item (Speech) |
Title: | Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs |
Authors: | Ernst, Dominik; Hager, Georg; Thies, Jonas; Wellein, Gerhard |
Date: | 2020 |
Journal or Publication Title: | 13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019 |
Refereed publication: | Yes |
Open Access: | Yes |
Gold Open Access: | No |
In SCOPUS: | Yes |
In ISI Web of Science: | No |
Volume: | 120 |
DOI: | 10.1007/978-3-030-43229-4_43 |
Publisher: | Springer |
Series Name: | Lecture Notes in Computer Science |
ISSN: | 0302-9743 |
ISBN: | 978-303043221-8 |
Status: | Published |
Keywords: | linear algebra, high performance computing, Graphics Processing Units, memory-bounded operations |
Event Title: | PPAM 2019 |
Event Location: | Bialystok, Poland |
Event Type: | International Conference |
Event Start Date: | 8 September 2019 |
Event End Date: | 11 September 2019 |
HGF - Research field: | Aeronautics, Space and Transport |
HGF - Program: | Space |
HGF - Program Themes: | Space System Technology |
DLR - Research area: | Space |
DLR - Program: | R SY - Space System Technology |
DLR - Research theme (Project): | R - Project SISTEC (old) |
Location: | Köln-Porz |
Institutes and Institutions: | Institute of Simulation and Software Technology |
Deposited By: | Thies, Jonas |
Deposited On: | 21 Nov 2019 09:24 |
Last Modified: | 24 Apr 2024 20:33 |
Available Versions of this Item
- Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs. (deposited 21 Nov 2019 09:24) [Currently Displayed]