d'Angelo, Pablo and Karlshöfer, Paul and Heiden, Uta (2025) Scalable and Energy Efficient Compositing of Sentinel-2 Time Series. ESA Living Planet Symposium, 2025-06-23 - 2025-06-27, Vienna, Austria.
Abstract
As Earth observation data archives continue to grow thanks to long-term missions such as the Sentinels, scalable data processing is a key requirement for increasingly complex analysis workflows. At the same time, the growth in data and computing resources increases energy consumption and thus the carbon footprint of data analysis.
By extracting the visible bare surface of agricultural fields after harvesting and ploughing, multispectral observations of the soil surface can be obtained from Sentinel-2 time series data at 20 m resolution. A complete bare surface reflectance composite can only be obtained from a multi-year time series, which is acceptable because soil properties change slowly. In the CUP4SOIL project, several soil parameters such as soil organic carbon, pH and bulk density are estimated using digital soil modelling (DSM), and the bare surface reflectance composites provide additional information to the traditionally used DSM covariates.
The SCMaP compositing process detects pixels with bare surfaces based on a spectral index and regionally varying thresholds. During compositing, robust statistical outlier detection is used to remove cloud, snow and haze pixels, and reflectance and statistical data are calculated for both bare and non-bare surfaces. Each pixel stack in the time series is processed independently, resulting in a massively parallel reduction operation with no spatial dependencies. This setting is typical of temporal compositing algorithms, which usually reduce along the time and spectral dimensions with little or no spatial influence. Many existing products depend on time series analysis of Sentinel data [1, 2]. Efficient computation decreases both the environmental footprint and the cost of processing, and is thus of prime interest. This requires both an efficient implementation of the algorithms and a compute platform that offers the required compute and data resources.
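The per-pixel reduction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SCMaP implementation: the abstract names neither the spectral index, the thresholds, nor the outlier statistic, so the NDVI-like `bare_index`, the fixed `threshold` and the median/MAD filter used here are hypothetical stand-ins.

```python
from statistics import median


def bare_index(red, nir):
    """Hypothetical stand-in index (NDVI-like): low values suggest little vegetation."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def inlier_mask(values, k=3.0):
    """Flag values within k scaled MADs of the median (assumed outlier rule)."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    limit = k * 1.4826 * mad
    return [abs(v - med) <= limit for v in values]


def mean_spectrum(observations):
    """Band-wise mean of a list of equal-length reflectance tuples."""
    if not observations:
        return None
    n = len(observations)
    return tuple(sum(o[b] for o in observations) / n
                 for b in range(len(observations[0])))


def composite_pixel(stack, threshold=0.25):
    """Reduce one pixel's time series of (red, nir) reflectances.

    Step 1 drops bright outliers (e.g. clouds) using the red band,
    step 2 splits the remaining dates into bare and non-bare,
    step 3 averages each group. No neighbouring pixel is consulted.
    """
    mask = inlier_mask([red for red, _ in stack])
    clean = [obs for obs, ok in zip(stack, mask) if ok]
    bare = [obs for obs in clean if bare_index(*obs) < threshold]
    other = [obs for obs in clean if bare_index(*obs) >= threshold]
    return {
        "bare": mean_spectrum(bare),
        "non_bare": mean_spectrum(other),
        "n_bare": len(bare),
        "n_clear": len(clean),
    }


# Made-up time series for one pixel: three bare-soil dates, two vegetated
# dates and one very bright (cloud-like) date.
stack = [(0.20, 0.25), (0.22, 0.27), (0.05, 0.45),
         (0.06, 0.50), (0.90, 0.92), (0.21, 0.26)]
result = composite_pixel(stack)
```

Because `composite_pixel` depends only on its own stack, a whole tile can be processed by mapping the function over all pixel stacks in any order, which is the property that makes the reduction massively parallel. In this example the cloud-like observation would pass the index test on its own, so removing outliers before classification matters.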
While the embarrassingly parallel nature of this task offers high scalability potential, high efficiency can only be achieved by tailoring the algorithms to the performance characteristics of the employed hardware platform.
Method
The core SCMaP algorithm is implemented in a C++ application that is called from Python code responsible for product discovery and data format processing. The use of containers and modular input interfaces allows the process to be easily adapted to different data archives and to run in cloud or HPC environments. The experiments are performed on the Terrabyte HPC platform of LRZ and DLR [3], which provides ~50 PB of GPFS storage and 271 CPU compute nodes with 40 cores and 1 TB of RAM each. These nodes are completely fanless machines, cooled with a highly efficient hot water cooling system. Using the SCMaP application, we explore several implementations and optimisations on the Terrabyte compute platform.
The algorithm allows for multiple levels of parallelisation, as data dependencies are limited to the temporal and spectral axes. Spatially, neighbouring pixels are independent. Thus, each SLURM task processes one tile of the Sentinel-2 tiling grid, and OpenMP parallelises the pixel computations within each task. We are investigating reordering the input data axes to improve cache coherence and align with data access patterns. Concurrent task execution on compute nodes is analysed to assess how memory allocation, task density and data request rates affect I/O complexity and file system load.
The Sentinel-2 tiling grid results in spatial tiles of 100 × 100 km for a given date, and a standard Level 2A Sentinel-2 product stores each of the 10 bands used in a separate image file. As each SLURM task processes one Sentinel-2 tile, and thus reads from 1000 to 10000 input files, parallel I/O and a larger I/O chunk size were essential for high scalability of the process.
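The idea behind reordering the input axes can be illustrated with flat buffers and explicit index arithmetic. The abstract only states that axis reordering is being investigated; the two layouts below (time, band, pixel as read from per-band image files, and pixel, time, band for the reduction) are assumptions chosen for illustration, and the function names are hypothetical.

```python
from array import array


def reorder_to_pixel_major(flat, n_t, n_b, n_p):
    """Copy a (time, band, pixel) buffer into (pixel, time, band) order.

    In the input, one band image of one date is contiguous. In the output,
    all dates and bands of one pixel are contiguous, so the per-pixel
    reduction reads one compact block instead of striding through memory.
    """
    out = array("d", [0.0]) * len(flat)
    for t in range(n_t):
        for b in range(n_b):
            src = (t * n_b + b) * n_p          # start of this band image
            for p in range(n_p):
                out[(p * n_t + t) * n_b + b] = flat[src + p]
    return out


def pixel_stack(pixel_major, p, n_t, n_b):
    """Return the contiguous time x band block of pixel p."""
    size = n_t * n_b
    return pixel_major[p * size:(p + 1) * size]


# Tiny synthetic cube: the value 100*t + 10*b + p encodes its own coordinates.
n_t, n_b, n_p = 2, 3, 4
flat = array("d", (100 * t + 10 * b + p
                   for t in range(n_t)
                   for b in range(n_b)
                   for p in range(n_p)))
pixel_major = reorder_to_pixel_major(flat, n_t, n_b, n_p)
stack = pixel_stack(pixel_major, 1, n_t, n_b)
```

In a compiled implementation the same index mapping decides whether a thread working on one pixel touches a single cache-friendly block or one element from each of hundreds of band images. Whether the copy pays off depends on the hardware, which is why it is treated as something to measure rather than a given.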
In addition, we compare the performance and decompression overheads of several common file formats (cloud-optimised GeoTIFF, JPEG 2000). We further investigate the energy consumption of the compositing tasks and compare the energy efficiency of different processing and data storage setups.
Conclusions
With the current optimisations, a state-of-the-art bare surface reflectance composite for the whole of Europe can be computed from ~500 TB of Sentinel-2 L2A input data in 4 h 08 min using 40 CPU nodes on the Terrabyte HPC platform. The complete process, including scheduling, input data reading, compositing and output product formatting, runs at a sustained input data rate of ~200 Gbit/s and used 26 kWh of electrical energy. Reprocessing EU-wide 5-yearly Sentinel-2 bare surface composites after algorithmic updates is thus reduced to an overnight batch job.
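The reported figures allow a few derived quantities to be worked out. The short calculation below is a back-of-the-envelope sketch: it assumes that the runtime reads as 4 hours 8 minutes, that the 26 kWh covers all 40 nodes for the whole run, and that the approximate input size of 500 TB is exact. The abstract does not state what the energy measurement includes, so the per-node value is indicative only.

```python
# Figures reported in the abstract (input size is approximate there).
runtime_h = 4 + 8 / 60        # 4 h 08 min, assumed reading of "4:08 hours"
energy_kwh = 26.0             # reported electrical energy
input_tb = 500.0              # reported as ~500 TB
nodes = 40                    # CPU nodes used

node_hours = nodes * runtime_h                    # compute budget of the run
avg_power_kw = energy_kwh / runtime_h             # mean draw over the run
avg_power_per_node_w = avg_power_kw * 1000 / nodes
energy_per_tb_wh = energy_kwh * 1000 / input_tb   # energy per TB of input

print(f"node-hours:           {node_hours:.1f}")
print(f"mean power (total):   {avg_power_kw:.2f} kW")
print(f"mean power per node:  {avg_power_per_node_w:.0f} W")
print(f"energy per input TB:  {energy_per_tb_wh:.0f} Wh")
```

Under these assumptions the run costs roughly 52 Wh per terabyte of input, a convenient unit for comparing the processing and storage setups mentioned above.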
| elib URL of the record: | https://elib.dlr.de/218324/ |
|---|---|
| Document type: | Conference contribution (talk) |
| Title: | Scalable and Energy Efficient Compositing of Sentinel-2 Time Series |
| Authors: | d'Angelo, Pablo; Karlshöfer, Paul; Heiden, Uta |
| Date: | 26 June 2025 |
| Peer-reviewed publication: | No |
| Open Access: | Yes |
| Gold Open Access: | No |
| In SCOPUS: | No |
| In ISI Web of Science: | No |
| Status: | published |
| Keywords: | HPC, energy efficiency, Sentinel-2, compositing, energy usage, SCMaP |
| Event title: | ESA Living Planet Symposium |
| Event location: | Vienna, Austria |
| Event type: | international conference |
| Event start: | 23 June 2025 |
| Event end: | 27 June 2025 |
| Organiser: | ESA |
| HGF - Research field: | Aeronautics, Space and Transport |
| HGF - Programme: | Space |
| HGF - Programme topic: | Earth Observation |
| DLR - Focus area: | Space |
| DLR - Research area: | R EO - Earth Observation |
| DLR - Sub-area (project): | R - Optical Remote Sensing |
| Location: | Oberpfaffenhofen |
| Institutes & facilities: | Institut für Methodik der Fernerkundung > Photogrammetrie und Bildanalyse; Institut für Methodik der Fernerkundung > Abbildende Spektroskopie |
| Deposited by: | d'Angelo, Dr. Pablo |
| Deposited on: | 06 Nov 2025 13:15 |
| Last modified: | 06 Nov 2025 13:15 |