Ehrlich, Alexander (2023) A Performance Comparison between GPU Frameworks on MultiSAR. Master's thesis, Universität Würzburg.
PDF (9 MB), accessible only within DLR
Abstract
In the field of high-performance computing, GPUs play an important role. However, to use them for general-purpose processing, one must choose an API that provides this functionality, and neither the choice itself nor the optimal use of such an API is a trivial task. In this thesis, the programming models CUDA, OpenCL, OpenACC and SYCL are compared with each other. CUDA and OpenCL are examples of low-level APIs, while OpenACC and SYCL are considered higher level. They are compared not only in terms of runtime, but also in terms of memory usage, accuracy of the results and portability. Where the respective API supports it, the comparison covers multiple memory allocation types. These configurations are also tested with microbenchmarks, but the main application is MultiSAR, a C++ program used by the German Aerospace Center (DLR) for processing radar data. Since MultiSAR currently runs on only a single CPU thread, additional changes are made to the original code to enable more efficient use of a GPU's resources. These changes are not limited to the code executed at runtime: the compilation via CMake also required changes. Because support for certain C++ features and libraries varies between the APIs, the individual implementations vary as well, so the comparison is not fully direct and fair. Evaluations show a runtime improvement of more than 40x over the original runtime in certain configurations. In terms of kernel execution time, CUDA, OpenACC and SYCL performed similarly, with SYCL providing the fastest configuration by a slight margin. In terms of total execution time, the OpenACC version is the best by a wide margin, likely because of further optimizations in functions that were not modified for the other programming models. This comes at the cost of accuracy: in the worst case, the deviation from the original results reaches the second digit after the decimal point. The OpenACC implementation performs particularly poorly here, as it introduced additional errors.
Difficulties surrounding OpenCL led to the conclusion that it is an unsuitable API for the purposes of MultiSAR, at least on the test system used. Regarding memory allocation types, the fastest is the traditional approach of allocating memory on the GPU and explicitly copying data to it. However, managed memory is recommended as the initial choice, because of its better portability and only slightly worse runtime.
| Field | Value |
|---|---|
| elib URL of the entry | https://elib.dlr.de/201198/ |
| Document type | University thesis (Master's thesis) |
| Title | A Performance Comparison between GPU Frameworks on MultiSAR |
| Authors | Ehrlich, Alexander |
| Date | 10 November 2023 |
| Peer-reviewed publication | No |
| Open Access | No |
| Gold Open Access | No |
| In SCOPUS | No |
| In ISI Web of Science | No |
| Number of pages | 110 |
| Status | published |
| Keywords | GPU, GPU Frameworks, MultiSAR |
| Institution | Universität Würzburg |
| Department | Institut für Informatik (Institute of Computer Science) |
| HGF research field | Aeronautics, Space and Transport |
| HGF program | Space |
| HGF program topic | Earth Observation |
| DLR focus area | Space |
| DLR research area | R EO - Earth Observation |
| DLR sub-area (project, undertaking) | R - Remote Sensing and Geoscience |
| Location | Oberpfaffenhofen |
| Institutes & facilities | Deutsches Fernerkundungsdatenzentrum (German Remote Sensing Data Center) > Dynamik der Landoberfläche (Land Surface Dynamics) |
| Deposited by | Huber, Martin |
| Deposited on | 11 Jan 2024 09:19 |
| Last modified | 11 Jan 2024 09:19 |