Monte Carlo Averaging for Uncertainty Estimation in Neural Networks

Njieutcheu Tassi, Cedrique Rovile (2024) Monte Carlo Averaging for Uncertainty Estimation in Neural Networks. Dissertation, Technische Universität München.

PDF - accessible only within DLR
5 MB

Official URL: https://mediatum.ub.tum.de/1694757

Abstract

Although neural networks have been used for pattern classification for decades, convolutional neural networks (CNNs) have become increasingly important over the past several years. In particular, CNNs are used in automated scenarios such as traffic sign recognition and disease classification. However, they still suffer from overfitting and a lack of robustness to undesired inputs. Hence, they can generate overconfident false predictions (FPs), which can be dangerous and costly, especially in safety- and/or mission-critical applications. For example, overconfident FPs can (1) cause collisions in robotic applications, (2) prompt false treatments in medical applications, or (3) increase costs in financial applications. These significant consequences limit the use of CNNs in the aforementioned fields even though their technological potential is of great interest. To overcome these limitations and encourage the widespread use of CNNs in safety- and/or mission-critical applications, we aim to prevent FPs by improving the separability between true predictions (TPs) and FPs. To achieve this, we force the degree of confidence (measuring uncertainty) to be high for TPs and low for FPs. This is based on the hypothesis that if the confidence is high for TPs and low for FPs, then TPs and FPs can be well separated by a confidence threshold. Therefore, the research questions are as follows: (1) Which method forces the degree of confidence to be high for TPs and low for FPs? (2) Under what circumstances does the method work? (3) At what cost does the method maintain a low confidence for FPs and a high confidence for TPs? To address the first question, we develop a method called Monte Carlo averaging (MCA) and compare it with related methods, namely the baseline (a single CNN), Monte Carlo dropout (MCD), an ensemble of CNNs, and mixture of Monte Carlo dropout (MMCD).
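The threshold hypothesis above can be made concrete with a small sketch. It assumes that the degree of confidence is the maximum softmax probability (a common choice, not necessarily the measure used in the dissertation) and that both TPs and FPs are present; the function names and the toy probabilities are hypothetical. Besides the fraction of TPs kept and FPs rejected at a fixed threshold, it computes the area under the ROC curve (AUROC) as one threshold-free summary of separability.

```python
# Sketch: separating true predictions (TPs) from false predictions (FPs)
# with a confidence threshold. Assumption: confidence = max softmax probability.

def confidences_by_outcome(prob_rows, labels):
    """Split per-sample confidences into TP and FP lists."""
    tp_conf, fp_conf = [], []
    for probs, label in zip(prob_rows, labels):
        pred = max(range(len(probs)), key=probs.__getitem__)  # predicted class
        (tp_conf if pred == label else fp_conf).append(probs[pred])
    return tp_conf, fp_conf

def threshold_split(tp_conf, fp_conf, threshold):
    """Fraction of TPs kept (conf >= threshold) and of FPs rejected (conf < threshold)."""
    tp_kept = sum(c >= threshold for c in tp_conf) / len(tp_conf)
    fp_rejected = sum(c < threshold for c in fp_conf) / len(fp_conf)
    return tp_kept, fp_rejected

def separability_auroc(tp_conf, fp_conf):
    """Probability that a random TP is more confident than a random FP
    (ties count one half); this equals the AUROC for TP-vs-FP detection."""
    wins = 0.0
    for t in tp_conf:
        for f in fp_conf:
            if t > f:
                wins += 1.0
            elif t == f:
                wins += 0.5
    return wins / (len(tp_conf) * len(fp_conf))

# Toy softmax outputs for four samples (hypothetical numbers).
probs = [[0.95, 0.03, 0.02], [0.10, 0.85, 0.05], [0.40, 0.35, 0.25], [0.30, 0.45, 0.25]]
labels = [0, 1, 1, 0]
tp_conf, fp_conf = confidences_by_outcome(probs, labels)
print(tp_conf, fp_conf)                        # [0.95, 0.85] [0.4, 0.45]
print(threshold_split(tp_conf, fp_conf, 0.6))  # (1.0, 1.0)
print(separability_auroc(tp_conf, fp_conf))    # 1.0
```

With real data the two confidence distributions usually overlap, so no single threshold keeps every TP and rejects every FP; the closer the AUROC is to 1, the better a threshold can separate them.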
To answer the second question, we evaluate the developed and related methods on four datasets of varying difficulty and on different CNN architectures. Further, we investigate the impact of applying logit instead of probability averaging in the developed and related methods, as well as the impact of reducing the strength of regularization during training. To address the third question, we evaluate the ability of the developed and related methods to separate TPs and FPs and examine the classification accuracy, calibration error, and inference time. Experimental results show improvements for the developed MCA and the state-of-the-art MMCD compared with the other related methods (baseline, MCD, and ensemble of CNNs). Specifically, like MMCD, the developed MCA can preserve the accuracy of the underlying ensemble, which may exceed the baseline accuracy, whereas MCD could only preserve the baseline accuracy. Both MMCD and MCA improve the separability of TPs and FPs at the cost of a higher calibration error and a longer inference time. However, applying logit instead of probability averaging in MCA and related methods, or reducing the strength of regularization, decreases the calibration error at the cost of reduced separability of TPs and FPs. Hence, there is a tradeoff between improving calibration and improving the separability of TPs and FPs. Although the performance of all methods depends heavily on the dataset and/or architecture, MCD and MMCD are more sensitive to both. Overall, we developed MCA to force the degree of confidence to be high for TPs and low for FPs in order to improve the separability of TPs and FPs.
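The abstract contrasts probability averaging with logit averaging over the outputs of several members, where a member is either a CNN of an ensemble or one stochastic forward pass of MC dropout. The sketch below shows only these two generic aggregation rules on hypothetical logits; it is not the MCA method itself, whose details are not given in this abstract, and the function names are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax of one logit vector."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def probability_averaging(member_logits):
    """Softmax each member's logits, then average the probabilities."""
    probs = [softmax(z) for z in member_logits]
    n = len(probs)
    return [sum(p[k] for p in probs) / n for k in range(len(probs[0]))]

def logit_averaging(member_logits):
    """Average the logits across members, then apply one softmax."""
    n = len(member_logits)
    mean_logits = [sum(z[k] for z in member_logits) / n
                   for k in range(len(member_logits[0]))]
    return softmax(mean_logits)

# Hypothetical logits from three members (ensemble CNNs or MC dropout passes)
# that disagree on a two-class input.
member_logits = [[4.0, 0.0], [0.0, 4.0], [4.0, 0.0]]
p_avg = probability_averaging(member_logits)
l_avg = logit_averaging(member_logits)
print(round(p_avg[0], 3), round(l_avg[0], 3))  # 0.661 0.791
```

In this toy case, probability averaging gives the top class a lower confidence (about 0.66) than logit averaging (about 0.79), so the members' disagreement shows up more strongly in the averaged probabilities. The example only illustrates the two rules and does not reproduce the dissertation's experiments.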
Compared with the state-of-the-art MMCD, the developed MCA is more than four times faster, has the same purpose and underlying principle, and shows similar or sometimes better performance. Therefore, we suggest using MCA instead of MMCD for applications that require separability of TPs and FPs and where the computational budget is limited. MCA may also be advantageous in other fields of machine learning that require uncertainty estimates, such as active or reinforcement learning. Moreover, MCA is preferable in the field of explainable artificial intelligence, which explores the role of uncertainty in explaining predictions and increasing the social acceptance of CNN-based decision-making systems. Finally, MCA opens new perspectives for fusing the features of ensemble members.

elib URL of this entry:

https://elib.dlr.de/209882/

Document type:

University thesis (dissertation)

Title:

Monte Carlo Averaging for Uncertainty Estimation in Neural Networks

Authors:

Authors	Institution or e-mail address	Author ORCID iD	ORCID Put Code
Njieutcheu Tassi, Cedrique Rovile	Cedrique.NjieutcheuTassi (at) dlr.de	NOT SPECIFIED	NOT SPECIFIED

Date:

29 October 2024

Published in:

mediaTUM Universitätsbibliothek Technische Universität München

Open Access:

No

Number of pages:

141

Status:

published

Keywords:

Machine learning; Deep learning; Classification; Convolutional neural network; Ensemble; Bayesian neural network; Monte Carlo dropout; Mixture of Monte Carlo dropout; Confidence calibration; Uncertainty quantification; Uncertainty estimation; Separating true predictions and false predictions; Regularization strength; Logit averaging

Institution:

Technische Universität München

Department:

TUM School of Computation, Information and Technology

HGF - Research field:

not assigned

HGF - Program:

not assigned

HGF - Program topic:

not assigned

DLR - Focus area:

Digitalization

DLR - Research area:

D IAS - Innovative Autonomous Systems

DLR - Subfield (project, undertaking):

D - SKIAS

Location:

Berlin-Adlershof

Institutes & facilities:

Institut für Optische Sensorsysteme
Institut für Optische Sensorsysteme > Echtzeit-Datenprozessierung

Deposited by:

Irmisch, Patrick

Deposited on:

03 Dec 2024 09:17

Last modified:

19 Dec 2024 12:21
