
Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs

Ernst, Dominik and Hager, Georg and Thies, Jonas and Wellein, Gerhard (2020) Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs. In: 13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019, 120 (43). Springer. PPAM 2019, 2019-09-08 - 2019-09-11, Bialystok, Poland. doi: 10.1007/978-3-030-43229-4_43. ISBN 978-303043221-8. ISSN 0302-9743.

This is the latest version of this item.


Abstract

General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS libraries are optimized primarily for square matrices. They often perform poorly for tall & skinny matrices, i.e., matrices that are much taller than they are wide. In this case, Nvidia's current CUBLAS implementation delivers only a fraction of the potential performance given by the roofline model. We describe the challenges and key properties of an implementation that can achieve perfect performance, i.e., performance at the roofline limit. We further evaluate different approaches to parallelization and thread distribution, and devise a flexible, configurable mapping scheme. A code generation approach enables an implementation that is flexible and specialized at the same time, and it supports autotuning. This results in perfect performance for a large range of matrix sizes in the domain of interest, and at least 2/3 of maximum performance for the remaining sizes on an Nvidia Volta GPGPU.
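The following Python sketch makes the roofline argument concrete. It estimates the attainable performance of a tall & skinny product and also gives a simple reference implementation.

It rests on several assumptions:

  • The operation is C = AᵀB, with A of size K × M and B of size K × N, K much larger than M and N, in double precision. The abstract does not name the exact operation, so this choice is an assumption.
  • The helper names (`arithmetic_intensity`, `roofline_bound`, `at_b_chunked`) and the machine parameters are hypothetical.
  • The chunked reference splits the long dimension into independent partial sums followed by a reduction. This is one common way to parallelize such kernels on GPUs. It is not necessarily the mapping scheme described in the paper.

```python
"""Roofline sketch for a tall & skinny product C = A^T B.

Assumptions (illustrative, not taken from the paper): A is K x M and B is
K x N with K much larger than M and N, all values are double precision,
and the small M x N result C is written only once, so its memory traffic
is neglected next to streaming A and B.
"""

BYTES_PER_DOUBLE = 8


def arithmetic_intensity(m: int, n: int) -> float:
    """Flops per byte for C = A^T B under the streaming assumption above.

    Each of the K rows contributes 2*m*n flops (a multiply and an add per
    entry of C) and requires loading m + n doubles (one row of A and one
    row of B), so K cancels out of the ratio.
    """
    return (2.0 * m * n) / (BYTES_PER_DOUBLE * (m + n))


def roofline_bound(m: int, n: int, peak_flops: float, bandwidth: float) -> float:
    """Roofline limit in flop/s: min(peak, intensity * memory bandwidth)."""
    return min(peak_flops, arithmetic_intensity(m, n) * bandwidth)


def at_b_chunked(a, b, num_chunks):
    """Reference C = A^T B for row-major nested lists A (K x M), B (K x N).

    The long dimension K is split into num_chunks contiguous ranges; each
    range yields a partial M x N sum (a stand-in for one GPU thread block),
    and the partial sums are added up in a final reduction step.
    """
    k, m, n = len(a), len(a[0]), len(b[0])
    bounds = [k * i // num_chunks for i in range(num_chunks + 1)]
    c = [[0.0] * n for _ in range(m)]
    for lo, hi in zip(bounds, bounds[1:]):
        partial = [[0.0] * n for _ in range(m)]
        for row in range(lo, hi):
            for i in range(m):
                a_ri = a[row][i]
                for j in range(n):
                    partial[i][j] += a_ri * b[row][j]
        # Final reduction: accumulate this chunk's partial result into C.
        for i in range(m):
            for j in range(n):
                c[i][j] += partial[i][j]
    return c


if __name__ == "__main__":
    # Hypothetical machine parameters, roughly Volta-class; replace them
    # with measured peak and bandwidth values for a real GPU.
    PEAK = 7.0e12  # double-precision peak in flop/s (assumed)
    BW = 800.0e9   # attainable memory bandwidth in byte/s (assumed)
    for size in (1, 2, 4, 8, 16, 32, 64, 128):
        p = roofline_bound(size, size, PEAK, BW)
        kind = "memory" if p < PEAK else "compute"
        print(f"M = N = {size:3d}: {p / 1e9:8.1f} Gflop/s ({kind}-bound)")

    a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # K = 3, M = 2
    b = [[1.0], [1.0], [1.0]]                 # K = 3, N = 1
    print(at_b_chunked(a, b, num_chunks=2))   # [[9.0], [12.0]]
```

Under these assumptions the arithmetic intensity for M = N is M/8 flop/byte. With the assumed machine numbers, the product therefore stays memory-bound up to roughly M = N = 70. This is why memory bandwidth, rather than the peak flop rate, sets the achievable performance for skinny matrices.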

Item URL in elib: https://elib.dlr.de/130199/
Document Type: Conference or Workshop Item (Speech)
Title: Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs
Authors:
  • Ernst, Dominik
  • Hager, Georg
  • Thies, Jonas
  • Wellein, Gerhard (Erlangen Regional Computing Center)
Date: 2020
Journal or Publication Title: 13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019
Refereed publication: Yes
Open Access: Yes
Gold Open Access: No
In SCOPUS: Yes
In ISI Web of Science: No
Volume: 120
DOI: 10.1007/978-3-030-43229-4_43
Publisher: Springer
Series Name: Lecture Notes in Computer Science
ISSN: 0302-9743
ISBN: 978-303043221-8
Status: Published
Keywords: linear algebra, high performance computing, Graphics Processing Units, memory-bounded operations
Event Title: PPAM 2019
Event Location: Bialystok, Poland
Event Type: International Conference
Event Start Date: 8 September 2019
Event End Date: 11 September 2019
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program Themes: Space System Technology
DLR - Research area: Space
DLR - Program: R SY - Space System Technology
DLR - Research theme (Project): R - Project SISTEC (old)
Location: Köln-Porz
Institutes and Institutions: Institute of Simulation and Software Technology
Deposited By: Thies, Jonas
Deposited On: 21 Nov 2019 09:24
Last Modified: 24 Apr 2024 20:33

Available Versions of this Item

  • Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs. (deposited 21 Nov 2019 09:24) [Currently Displayed]
