
Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs

Ernst, Dominik and Hager, Georg and Thies, Jonas and Wellein, Gerhard (2020) Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs. In: 13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019, 120 (43). Springer. PPAM 2019, 8-11 Sept. 2019, Bialystok, Poland. ISBN 978-3-030-43221-8, ISSN 0302-9743

This is the latest version of this item.

PDF (380 kB)

Abstract

General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS libraries are optimized primarily for square matrices but often perform poorly for tall & skinny matrices, i.e., matrices that are much taller than they are wide. Nvidia's current cuBLAS implementation delivers only a fraction of the potential performance (as given by the roofline model) in this case. We describe the challenges and key properties of an implementation that can achieve perfect performance, i.e., performance at the roofline limit. We further evaluate different approaches to parallelization and thread distribution, and devise a flexible, configurable mapping scheme. A code generation approach yields an implementation that is simultaneously flexible and specialized, and that supports autotuning. This results in perfect performance for a large range of matrix sizes in the domain of interest, and at least 2/3 of the maximum performance for the remaining sizes on an Nvidia Volta GPGPU.
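The following Python sketch illustrates the roofline argument from the abstract by estimating an upper performance bound for a tall & skinny product. It assumes, for illustration only, the operation C = AᵀB with A of size K×M and B of size K×N (K ≫ M, N) in double precision, where A and B are streamed from memory exactly once and the small M×N result is negligible. The abstract does not state the exact operation, and the names Machine, intensity, roofline_bound and ridge_width are hypothetical. Under these assumptions the kernel performs 2KMN flops on 8K(M+N) bytes, so its arithmetic intensity, MN/(4(M+N)) flop/byte, does not depend on K.

```python
"""Roofline upper bound for a tall & skinny product C = A^T B (illustrative sketch)."""

from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine model; substitute measured values for real use."""
    peak_gflops: float    # double-precision peak performance, GFLOP/s
    bandwidth_gbs: float  # attainable main-memory bandwidth, GB/s


def intensity(m: int, n: int, bytes_per_elem: int = 8) -> float:
    """Arithmetic intensity (flop/byte) of C = A^T B with A: K x M, B: K x N.

    flops = 2*K*M*N and bytes = bytes_per_elem*K*(M + N): A and B are read
    once and the small M x N result is neglected, so K cancels out.
    """
    return 2.0 * m * n / (bytes_per_elem * (m + n))


def roofline_bound(m: int, n: int, machine: Machine, bytes_per_elem: int = 8) -> float:
    """Attainable GFLOP/s = min(peak, intensity * bandwidth)."""
    return min(machine.peak_gflops,
               intensity(m, n, bytes_per_elem) * machine.bandwidth_gbs)


def ridge_width(machine: Machine, bytes_per_elem: int = 8) -> float:
    """Width M = N above which the kernel becomes compute-bound.

    For M = N the intensity reduces to M / bytes_per_elem.
    """
    return bytes_per_elem * machine.peak_gflops / machine.bandwidth_gbs


if __name__ == "__main__":
    # Assumed, roughly nominal V100 data-sheet figures; not results from the paper.
    v100 = Machine(peak_gflops=7800.0, bandwidth_gbs=900.0)
    for w in (1, 4, 16, 32, 64, 128):
        print(f"M = N = {w:3d}: I = {intensity(w, w):6.3f} flop/B, "
              f"bound = {roofline_bound(w, w, v100):7.1f} GFLOP/s")
    print(f"ridge at M = N ~ {ridge_width(v100):.1f}")
```

With the assumed figures (7800 GFLOP/s peak, 900 GB/s bandwidth), this simple model places the ridge point near M = N = 69. Narrower matrices are memory-bound, so the relevant target in that regime is the bandwidth limit rather than the floating-point peak.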

Item URL in elib: https://elib.dlr.de/130199/
Document Type: Conference or Workshop Item (Speech)
Title: Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs
Authors:
  • Ernst, Dominik (Dominik.Ernst (at) fau.de; ORCID iD: unspecified)
  • Hager, Georg (Georg.Hager (at) fau.de; ORCID iD: unspecified)
  • Thies, Jonas (Jonas.Thies (at) dlr.de; ORCID iD: unspecified)
  • Wellein, Gerhard (Erlangen Regional Computing Center; ORCID iD: unspecified)
Date: 2020
Journal or Publication Title: 13th International Conference on Parallel Processing and Applied Mathematics, PPAM 2019
Refereed publication: Yes
Open Access: Yes
Gold Open Access: No
In SCOPUS: Yes
In ISI Web of Science: No
Volume: 120
Publisher: Springer
Series Name: Lecture Notes in Computer Science
ISSN: 0302-9743
ISBN: 978-3-030-43221-8
Status: Published
Keywords: linear algebra, high performance computing, Graphics Processing Units, memory-bound operations
Event Title: PPAM 2019
Event Location: Bialystok, Poland
Event Type: International Conference
Event Dates: 8-11 Sept. 2019
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program Themes: Space Technology
DLR - Research area: Space
DLR - Program: R SY - Space System Technology
DLR - Research theme (Project): R - SISTEC project
Location: Köln-Porz
Institutes and Institutions: Institute of Simulation and Software Technology
Deposited By: Thies, Jonas
Deposited On: 21 Nov 2019 09:24
Last Modified: 26 Nov 2020 09:30

Available Versions of this Item

  • Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs. (deposited 21 Nov 2019 09:24) [Currently Displayed]

Copyright © 2008-2017 German Aerospace Center (DLR). All rights reserved.