Nikolaou, Nikolaos and Cea, D and Valizadeh, Mahyar and Dallavalle, Marco and Staab, Jeroen and Piraud, M and Peters, A. and Schneider, Alexandra and Taubenböck, Hannes and Wolf, Kathrin (2025) A machine learning framework for modeling the associations between environmental factors and health: an application in the German National Cohort. ISES-ISEE 2025, 2025-08-17 - 2025-08-20, Atlanta, USA.
This archive cannot provide the full text.
Abstract
Objective: Human health has been associated with individual characteristics, environmental exposures, and socio-economic and neighborhood settings, but their interplay is not adequately understood. We aimed to build a machine learning (ML) pipeline to identify the environmental, socio-economic and individual factors that drive health outcomes, using hypertension as a case study.
Material and Methods: The ML pipeline rests on three main pillars: data extract/transform/load (feature selection, imputation of missing values), modeling (hyperparameter optimization), and explainability (permutation feature importance). For our use case, we included health data from the baseline examination of the population-based German National Cohort (NAKO), conducted between 2014 and 2019 in 16 study regions across Germany. We assigned environmental exposures (e.g., air pollution, air temperature, noise, greenness) and neighborhood factors (e.g., urbanization, deprivation) based on the participants' residential addresses. To identify the main drivers of hypertension, we compared a traditional regression approach (Logistic Regression) with multiple ML methods: neighbor-based methods (K-Nearest Neighbor), statistical learning (Support Vector Machine), ensemble learning (Random Forest, XGBoost) and neural networks.
Results: Of 204,752 participants included in our analysis, 41.2% were classified as hypertensive. Most models performed well, with comparable accuracy ranging from 0.69 (K-Nearest Neighbor) to 0.73 (XGBoost) on our test set. The different approaches identified similar factors as the main drivers of hypertension, with the highest feature importance attributed to individual characteristics (age, body mass index, and sex). SHapley Additive exPlanations and sub-group analyses also identified environmental and neighborhood variables (minimum air temperature, noise and deprivation index), ranking after the primary individual factors.
Conclusion: Our results indicate some variation in performance across models, and that a guided application is needed if evidence is to be generated beyond major drivers of disease such as age and sex. The ML pipeline for binary health outcomes is to be made openly accessible soon, and we also plan to extend it to continuous outcomes.
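The three pillars named in the abstract (imputation, hyperparameter optimization, permutation feature importance) can be illustrated with a minimal sketch. This is not the authors' pipeline, which is not yet public: the use of scikit-learn, the synthetic data, the `feature_*` names, the median imputation, and the small parameter grids are all assumptions made for illustration, and only two of the model families mentioned are included.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a binary health outcome (NOT cohort data).
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Knock out about 5% of the values so the imputation step has work to do.
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models with deliberately small, illustrative hyperparameter grids.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"model__C": [0.1, 1.0, 10.0]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"model__n_estimators": [50, 100], "model__max_depth": [3, None]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Pillar 1: transform step (median imputation, then scaling).
    pipe = Pipeline(
        [
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
            ("model", estimator),
        ]
    )
    # Pillar 2: hyperparameter optimization by cross-validated grid search.
    search = GridSearchCV(pipe, grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)

    # Pillar 3: permutation feature importance on the held-out test set.
    pfi = permutation_importance(
        search.best_estimator_, X_test, y_test,
        n_repeats=5, random_state=0, scoring="accuracy",
    )
    order = np.argsort(pfi.importances_mean)[::-1]
    results[name] = {
        "accuracy": search.score(X_test, y_test),
        "ranking": [feature_names[i] for i in order],
    }
    print(name, round(results[name]["accuracy"], 3), results[name]["ranking"][:3])
```

Comparing the `ranking` lists across models mirrors the comparison described in the Results: different model families may agree on the top drivers while differing further down the list.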
| elib URL of this record: | https://elib.dlr.de/219234/ |
|---|---|
| Document Type: | Conference or Workshop Item (Poster) |
| Title: | A machine learning framework for modeling the associations between environmental factors and health: an application in the German National Cohort |
| Authors: | Nikolaou, Nikolaos; Cea, D; Valizadeh, Mahyar; Dallavalle, Marco; Staab, Jeroen; Piraud, M; Peters, A.; Schneider, Alexandra; Taubenböck, Hannes; Wolf, Kathrin |
| Date: | 18 August 2025 |
| Refereed publication: | No |
| Open Access: | No |
| Gold Open Access: | No |
| In SCOPUS: | No |
| In ISI Web of Science: | No |
| Status: | Published |
| Keywords: | Built environment, Environmental epidemiology, External exposome, Modeling, Socio-economical factors |
| Event Title: | ISES-ISEE 2025 |
| Event Location: | Atlanta, USA |
| Event Type: | international Conference |
| Event Start Date: | 17 August 2025 |
| Event End Date: | 20 August 2025 |
| HGF - Research field: | Aeronautics, Space and Transport |
| HGF - Program: | Space |
| HGF - Program Themes: | Earth Observation |
| DLR - Research area: | Space |
| DLR - Program: | R EO - Earth Observation |
| DLR - Research theme (Project): | R - Remote Sensing and Geo Research |
| Location: | Oberpfaffenhofen |
| Institutes and Institutions: | German Remote Sensing Data Center > Geo Risks and Civil Security |
| Deposited By: | Schöpfer, Dr. Elisabeth |
| Deposited On: | 20 Nov 2025 10:29 |
| Last Modified: | 20 Nov 2025 10:29 |