Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs

Ernst, Dominik and Hager, Georg and Thies, Jonas and Wellein, Gerhard (2020) Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs. In: Lecture Notes in Computer Science. Springer. PPAM 2019, 8.-11. Sept. 2019, Bialystok, Poland.

Abstract

General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS libraries are best optimized for square matrices but often show bad performance for tall & skinny matrices, which are much taller than wide. Nvidia's current CUBLAS implementation delivers only a fraction of the potential performance (as given by the roofline model) in this case. We describe the challenges and key properties of an implementation that can achieve perfect performance. We further evaluate different approaches of parallelization and thread distribution, and devise a flexible, configurable mapping scheme. A code generation approach enables a simultaneously flexible and specialized implementation with autotuning. This results in perfect performance for a large range of matrix sizes in the domain of interest, and at least 2/3 of maximum performance for the rest on an Nvidia Volta GPGPU.
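
The roofline bound mentioned in the abstract can be illustrated with a small back-of-the-envelope sketch. It is not taken from the paper: it assumes the product C = A^T B of two tall & skinny double-precision matrices (A is K x m, B is K x n, with K much larger than m and n), neglects the traffic for the small result C, and uses rough, illustrative V100 figures (about 7 TFlop/s double-precision peak, about 900 GB/s nominal memory bandwidth). The function names and constants are hypothetical.

```python
def tsmm_intensity(m: int, n: int, bytes_per_elem: int = 8) -> float:
    """Arithmetic intensity (flop/byte) of C = A^T B, A: K x m, B: K x n.

    Assumption: each of the K rows costs 2*m*n flops and loads m + n
    elements once; reads and writes of the small m x n result are ignored.
    """
    return (2.0 * m * n) / ((m + n) * bytes_per_elem)


def roofline(intensity: float, peak_flops: float, mem_bw: float) -> float:
    """Roofline limit in flop/s: the lower of the compute and memory ceilings."""
    return min(peak_flops, intensity * mem_bw)


# Illustrative hardware numbers (assumptions, not measurements from the paper).
PEAK_FLOPS = 7.0e12  # double-precision peak, flop/s
MEM_BW = 9.0e11      # nominal memory bandwidth, byte/s

if __name__ == "__main__":
    for width in (1, 4, 16, 64):
        i = tsmm_intensity(width, width)
        p = roofline(i, PEAK_FLOPS, MEM_BW)
        bound = "memory" if p < PEAK_FLOPS else "compute"
        print(f"m = n = {width:3d}: {i:6.3f} flop/byte, "
              f"limit {p / 1e12:5.2f} TFlop/s ({bound}-bound)")
```

Under these assumptions the intensity for square small dimensions (m = n) grows linearly with the width, so narrow matrices sit on the memory-bandwidth ceiling of the roofline and only fairly wide ones approach the compute ceiling.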

Item URL in elib: https://elib.dlr.de/130199/
Document Type: Conference or Workshop Item (Speech)
Title: Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs
Authors:
Authors | Institution or Email of Authors | Authors ORCID iD
Ernst, Dominik | Dominik.Ernst (at) fau.de | UNSPECIFIED
Hager, Georg | Georg.Hager (at) fau.de | UNSPECIFIED
Thies, Jonas | Jonas.Thies (at) dlr.de | UNSPECIFIED
Wellein, Gerhard | Erlangen Regional Computing Center | UNSPECIFIED
Date: 2020
Journal or Publication Title: Lecture Notes in Computer Science
Refereed publication: Yes
Open Access: Yes
Gold Open Access: No
In SCOPUS: No
In ISI Web of Science: Yes
Publisher: Springer
Status: Accepted
Keywords: linear algebra, high performance computing, Graphics Processing Units, memory-bounded operations
Event Title: PPAM 2019
Event Location: Bialystok, Poland
Event Type: International Conference
Event Dates: 8.-11. Sept. 2019
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program Themes: Space Technology
DLR - Research area: Space
DLR - Program: R SY - Space System Technology
DLR - Research theme (Project): R - Project SISTEC
Location: Köln-Porz
Institutes and Institutions: Institute of Simulation and Software Technology
Deposited By: Thies, Jonas
Deposited On: 21 Nov 2019 09:24
Last Modified: 21 Nov 2019 09:24
