The New Volcanic Ash Satellite Retrieval VACOS Using MSG/SEVIRI and Artificial Neural Networks: 1. Development

Volcanic ash clouds are a threat to air traffic security and, thus, can have significant societal and financial impact. Therefore, the detection and monitoring of volcanic ash clouds to enhance the safety of air traffic is of central importance. This work presents the development of the new retrieval algorithm VACOS (Volcanic Ash Cloud properties Obtained from SEVIRI) which is based on artificial neural networks, the thermal channels of the geostationary sensor MSG/SEVIRI and auxiliary data from a numerical weather prediction model. It derives a pixel classification as well as cloud top height, effective particle radius and, indirectly, the mass column concentration of volcanic ash clouds during day and night. A large set of realistic one-dimensional radiative transfer calculations for typical atmospheric conditions with and without generic volcanic ash clouds is performed to create the training dataset. The atmospheric states are derived from ECMWF data to cover the typical diurnal, annual and interannual variability. The dependence of the surface emissivity on surface type and viewing zenith angle is considered. An extensive dataset of volcanic ash optical properties is used, derived for a wide range of microphysical properties and refractive indices of various petrological compositions, including different silica contents and glass-to-crystal ratios; this constitutes a major innovation of this retrieval. The resulting ash-free radiative transfer calculations at a specific time compare well with corresponding SEVIRI measurements, considering the individual pixel deviations as well as the overall brightness temperature distributions. Atmospheric gas profiles and sea surface emissivities are reproduced with a high agreement, whereas cloudy cases can show large deviations on a single pixel basis (with 95th percentiles of the absolute deviations >30 K), mostly due to different cloud properties in model and reality. Land surfaces lead to large deviations for both the single pixel comparison (with median absolute deviations >3 K) and more importantly the brightness temperature distributions, most likely due to imprecise skin temperatures. The new method enables volcanic ash-related scientific investigations as well as aviation security-related applications.


Introduction
Large, explosive volcanic eruptions might happen relatively infrequently [1], but their emissions can have massive impacts: volcanic ash can significantly interfere with critical, ground-based infrastructure [2] and can damage aircraft or even cause engine failure [3]. Aviation incidents have been reported more than 1000 km from the volcanic ash source [4], as potentially hazardous ash concentrations might not be visually observable by flight crews [5]. In the case of the eruption of Eyjafjallajökull in 2010, major parts of the European airspace were closed for extended periods of time [6], leading to estimated economic losses of US$1.7 billion for the aviation industry [7].
To mitigate the impact of future eruptions, satellite remote-sensing methods have been developed to monitor volcanic ash clouds, using both polar orbiting and geostationary passive optical imagers (e.g., [8]). Their results can be used to directly assess whether an airspace is safe to be traversed by jet planes (according to thresholds stated by the International Civil Aviation Organization (ICAO) [9]), to investigate aerosol-cloud interactions [10] or to calibrate/validate volcanic ash transport and dispersion models as applied by the Volcanic Ash Advisory Centers [11][12][13]. The latter are the main providers of information on atmospheric contamination by volcanic ash in the case of an eruption [9].
Active remote sensing instruments such as lidars provide highly resolved vertical profiles of the aerosol load [14]. However, lidars have a limited spatial and temporal coverage, e.g., the instrument aboard the polar orbiting Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) has a small footprint of 90 m × 335 m and a 16-day repeat cycle [15]. Ground instruments are fixed and airborne measurements are performed only in exceptions (e.g., [6,[16][17][18][19][20][21]). Therefore, none of those instruments is able to provide global data as are necessary in the case of volcanic ash monitoring. In this respect, geostationary radiometers, especially of the second generation, come in handy: such instruments not only provide near-global coverage, but their infrared channels also allow operation during day and night. Examples are the Geostationary Operational Environmental Satellites (GOES) covering North and South America, the Meteosat Second Generation (MSG) satellites for Europe and Africa and Himawari for Eastern Asia and Australia.
The difference between the brightness temperature at 11 µm (BT 11 ) and 12 µm (BT 12 ), in short BTD 11−12 , can be negative for clouds consisting of small volcanic ash particles (smaller than~5 µm [22,23]) and mixtures of volcanic ash and sulfuric acid (H 2 SO 4 [24,25]), whereas for ice clouds BTD 11−12 tends to be positive [26]. Therefore, this quantity is often used for volcanic ash detection. However, volcanic ash can be hidden by water or ice either within the volcanic ash cloud or as separated clouds, as well as water vapor, especially if the ash cloud itself is at low altitude [11,27]. Furthermore, large ash particles or opaque plumes do not lead to a negative BTD 11−12 [11,22]. False alarms might be produced by mineral dust aerosol [28], which has similar spectral properties as volcanic ash [11,29,30], or by non-vegetated, quartz-rich surfaces due to their emissivity [11,28].
As a consequence, more sophisticated detection schemes have been proposed: to correct for the presence of water vapor, a BTD 11−12 threshold depending on BT 11 has been suggested where the exact function depends on the atmospheric conditions [27]; this resulted in retrievals of larger contaminated areas. Multiple threshold tests were proposed, incorporating, for instance, also BT 8.7 measurements (e.g., [31,32]) or simulated clear-sky brightness temperatures BT 11 and BT 12 from numerical weather predictions (e.g., [31]). Furthermore, it was shown that reflectances in the visible and the near-infrared as well as their ratio can further help to separate volcanic ash clouds from water and ice clouds, especially for optically thick plumes for which BTD 11−12 tends to vanish (e.g., [33][34][35]).
As radiance measurements are affected not only by the volcanic ash cloud but also by the atmospheric state, other meteorological clouds and the surface properties, it was suggested to derive quantities that are closer linked to the target cloud's properties. An example is the ratio of effective absorption optical depths at different wavelengths, called β ratio, which can be approximately expressed by single scattering properties (e.g., [8,36,37]). For the calculation of β ratios, clear sky properties have to be determined by radiative transfer calculations. The combination of multiple β ratios of different infrared channels is a good discriminator of volcanic and meteorological clouds [37]. A high spectral resolution can allow for new detection schemes, either directly based on the functional behavior of the brightness temperature spectra, thereby also enabling the separation of volcanic ash from mineral dust (e.g., [29,30]), or by performing singular vector decompositions with some vectors representing clear sky conditions, whereas others describe the volcanic ash influence. A linear decomposition of a measurement with respect to these basis vectors then reveals whether volcanic ash is present or not [38]. An alternative approach is to detect sulfur dioxide (SO 2 ) as a proxy for volcanic ash as both are often emitted simultaneously [39]. Although both detections can show reasonable spatial agreement, they might differ in some cases [40], sometimes with more than 80% of the volcanic ash remaining undetected [41]. This might be rooted in the presence of distinct volcanic ash and SO 2 layers that get separated due to vertical wind shear [42].
To retrieve microphysical (especially the effective particle radius r eff ) and macrophysical properties (optical depth τ and mass column concentration m col ), brightness temperatures (usually BT 11 and BTD 11−12 or similar) have been precalculated for generic atmospheric settings including only a volcanic ash layer and used as look-up tables (e.g., [22,33,43]). More complex atmospheres were assumed to correct for water vapor, based on either measurements of the surrounding of the volcanic ash clouds or radiative transfer calculations (e.g., [27,43]). The optimal estimation method aims to minimize a cost function (here principally an uncertainty weighted difference between an atmospheric state vector and an a priori assumption, as well as an observation vector and corresponding estimates), usually iteratively for non-linear problems, incorporating radiative transfer calculations [37,44]. For example, Francis et al. [31] applied this approach to retrieve pixelwise the ash layer pressure, mass loading m col and effective radius r eff based on observations at 10.8 µm, 12 µm and 13.4 µm, whereas Pavolonis et al. [37] used the same observations to determine different β ratios, emissivities and temperatures, later on converting these to microphysical properties. Retrievals can also be performed by making use of the surrounding ash-free area; e.g., Pugnaghi et al. [45] interpolated the radiances across a volcanic ash plume between the edges to obtain an ash-free image. Combining the radiances measured with and without ash, the transmittance of the ash plume could be calculated for different wavelengths. Finally, the effective radius r eff and the optical depth τ were determined from the transmittances using conversions from radiative transfer calculations. There are further methods to determine the volcanic ash cloud top height z top . The brightness temperature of opaque parts of a cloud can be assumed to approximately correspond to the ambient temperature. A nearby temperature profile (e.g., using a numerical weather prediction or a radiosonde measurement) can be applied to convert the brightness temperature into an altitude [33,43,46]. During daytime, the difference in the cloud position as seen by the satellite and the sun induced cloud shadow can be used to geometrically calculate z top [33,43]. Stereoscopic instruments allow inferring z top from the spatial shift between the projection of a cloud in images retrieved under different viewing angles [33,38,43,47]. The carbon dioxide (CO 2 ) slicing method compares multiple channels around the CO 2 absorption feature which have weighting functions peaking at different heights [48,49].
A different approach is the application of artificial neural networks (ANNs), which can be considered as universal approximators for unknown functions [50]. Based on initial research in the 1940s, this method has gained much attention and has significantly advanced in recent decades [51]. It has been used for prediction, functional approximation and classification tasks for numerous problems of atmospheric sciences [52]. With respect to satellite remote sensing, some examples are the retrieval of properties of water clouds [53,54], ice clouds [55,56], ozone profiles [57], volcanic SO 2 [58][59][60] and surface reflectivity [61]. Often, the utilized training datasets either consist of collocated measurements of different instruments [54][55][56] or are created using radiative transfer calculations [53,57,[59][60][61]. One of the major advantages of ANNs is that, once they are trained, they are fast in application compared to other methods using time-consuming radiative transfer calculations during the retrieval. Gray and Bennartz [62] trained two ANNs for the detection of volcanic ash and SO 2 -rich ash, respectively, using Moderate Resolution Imaging Spectroradiometer (MODIS, e.g., [63]) measurements. The training data were composed of MODIS images of different volcanic eruptions with the target classification performed based on Hybrid Single Particle Lagrangian Integrated Trajectory (HYSPLIT) simulations of the volcanic emissions. The input data consisted of brightness temperature (differences) of channels between 7.3 µm and 12 µm. Picchiani et al. [64] trained separated ANNs for volcanic ash detection and ash mass loading m col retrieval from MODIS measurements. The training data consisted of MODIS images from Etna eruptions with the target classification performed by applying the BTD 11−12 < 0 criterion, whereas the target mass loading was determined by a look-up table approach. The channels centered at 7.3 µm, 11 µm and 12 µm were used as input. A more detailed classification ANN was trained by Picchiani et al. [65], labeling ash above sea and above meteorological clouds, meteorological clouds themselves and sea, ice and land surfaces using MODIS data. The training data consisted of MODIS images from the Eyjafjallajökull 2010 and Grimsvötn 2011 eruptions with the target classes derived from the BTD 11−12 < 0 criterion, the MODIS land/sea mask, cloud products and BT 2.13 (band 7) for detection of ice surfaces. The input features include 14 MODIS channels in the visible and infrared and a land/sea mask. The method was further developed by Piscini et al. [66], training individual ANNs for the retrieval of the mass load m col , the effective radius r eff , the optical depth at 11 µm (τ 11 ) and the SO 2 column concentration using MODIS observations. Training data were MODIS images from the Eyjafjallajökull 2010 eruption with the target values determined by other retrievals based on radiative transfer calculations, similar to [64]. Initially, all MODIS channels were used as input features, with a pruning procedure performed after the training to find the most important inputs. The ANN ansatz by Picchiani et al. [64] and Piscini et al. [66] was compared to the look-up table and the volcanic plume removal procedure, finding that the look-up table method can be more accurate, but the ANN approach can be less sensitive to perturbations in the satellite measurements [67]. Zhu et al. [68] developed a method to retrieve volcanic cloud top heights z top combining a stacked denoising autoencoder for feature extraction followed by a least squares support vector regression to derive z top . The training data consisted of collocated Spinning Enhanced Visible and Infrared Imager (SEVIRI, aboard MSG [69]) brightness temperatures and Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP, aboard CALIPSO [15]) derived z top for volcanic ash clouds from the Eyjafjallajökull 2010 and Puyehue-Cordón Caulle 2011 eruptions. Adding vertical temperature profiles from European Centre for Medium-Range Weather Forecasts (ECMWF) simulations further improved the retrieval performance.
The aforementioned volcanic ash algorithms used real satellite images as training data, with target values coming either from other retrievals, other sensors or trajectory models. In contrast, the algorithm Volcanic Ash Detection Utilizing Geostationary Satellites (VADUGS), applying a single ANN for the retrieval of m col and z top , was based on a fully simulated training dataset. The input data consisted of the thermal infrared channels of MSG/SEVIRI and auxiliary data such as a land/sea mask, the skin temperature and the viewing zenith angle [70,71]. The ANN architecture followed the development of the cirrus cloud retrieval Cirrus Optical Properties derived from CALIOP and SEVIRI Algorithm during Day and Night (COCS, [55]), which was trained with collocated CALIOP and SEVIRI measurements to retrieve cirrus optical depths and cloud top heights. Since 2015, VADUGS runs operationally at the German weather service [72]. Although it produces reasonable results upon visual inspection, a validation against simulated samples has shown that the retrievals are reliable in certain subsets of the test dataset, but not in general [71]. An intercomparison of satellite products exhibited overall low correlations between the retrieval of m col by VADUGS and other algorithms and found a strong underestimation of z top when compared with CALIOP results [73]. In addition, note that VADUGS was developed focusing on the Eyjafjallajökull 2010 eruption as only the refractive index of the corresponding volcanic ash was used for the training data [71]. However, the refractive index can vary significantly for different volcanic ashes [74,75] and retrievals are sensitive to it [22,76,77]. Technically, potential improvements can be derived from the development of Cirrus Properties from SEVIRI (CiPS [78,79]), which is the successor of COCS. It is based on a similar training dataset but uses a new ANN architecture and training procedure, additional input features and updated CALIOP data. CiPS exhibited a better performance compared to COCS and retrieved additional quantities, e.g., the ice water path.
Building upon VADUGS, a new algorithm called Volcanic Ash Cloud properties Obtained from SEVIRI (VACOS) is developed and described in two papers ( Figure 1). In Part 1 (this paper), a training dataset consisting of simulated MSG/SEVIRI measurements is created using modeled atmospheric profiles and a climatology of the surface emissivity.
A parameterization of the sea surface emissivity is applied that depends on the viewing zenith angle and wind speed. To cover the variability of volcanic ash clouds, the extensive set of volcanic ash refractive indices and optical properties from Piontek et al. [23] is used together with a wide range of possible ash cloud top heights, geometrical thicknesses and mass concentrations. This constitutes a major advantage compared to the previously mentioned ANN-based volcanic ash retrievals, which were trained on satellite images of only one [64,66], two [65,68] or seven [62] volcanic eruptions or used only a single volcanic ash type [71]. Methodologically, we train four separated ANNs for the classification and the retrieval of the optical depth at 10.8 µm (τ 10.8 ), ash cloud top height (z top ) and effective particle radius (r eff ) of the volcanic ash clouds. The ANNs have individual input features and training datasets. Part 2 [80] contains a validation of the retrievals with respect to simulated test datasets, a sensitivity study of the algorithms with respect to the vertical mass profile of volcanic ash layers, case studies comparing the results of the new retrievals with independent lidar and in situ measurements as well as model results and an analysis of the working principles of the ANNs.  [23], (yellow) radiative transfer calculations to compose a training dataset and (blue) training of different ANNs (both in this paper) and (green) validation against simulated test data and other independent measurements and model results [80].
The rest of this paper is organized as follows. We introduce the observation instrument MSG/SEVIRI (Section 2) and the predecessor retrieval VADUGS, including a short, general description of ANNs (Section 3). Next, the training dataset is sketched including its analysis (Section 4), followed by the description of the new ANNs, their input features and their training (Section 5) as well as their application (Section 6). Finally, we give a conclusion and an outlook (Section 7).

MSG/SEVIRI
VACOS is tailored for the Spinning Enhanced Visible and Infrared Imager (SEVIRI) carried by the geostationary Meteosat Second Generation (MSG) satellites. SEVIRI is a passive 12-channel imager, measuring radiation in the visible and infrared part of the spectrum. The radiances in the different thermal channels are converted to brightness temperatures (BT). For the retrieval, we consider only the seven infrared channels such that it can be applied during day and night. Three of them are window channels (centered at 8.7 µm, 10.8 µm and 12 µm), two are strongly sensitive to water vapor (H 2 O, 6.2 µm and 7.3 µm) and another two (9.7 µm and 13.4 µm) to ozone (O 3 ) and carbon dioxide (CO 2 ), respectively. The temporal resolution of SEVIRI is 15 min for the full disc and 5 min in rapid scan mode, which covers mainly Europe. The spatial resolution is 3 km at nadir. The channels are rather broad (spectral bands up to 2 µm) and instrument specific [69]. Figure 2 shows a red-green-blue composite of the SEVIRI disc. There are multiple MSG satellites deployed at different coordinates. In the following, we focus on Meteosat-9/MSG2. From 11 April 2007 to 21 January 2013, it was located at 0°E as the primary operational satellite and covered the prominent eruptions of Eyjafjallajökull 2010, Grimsvötn 2011 and the volcanic ash clouds of Puyehue-Cordón Caulle 2011. From 9 April 2013 to 20 March 2018, it provided the rapid scanning service at 9.5°E. As of 29 June 2020, it is located at 3.5°E as a back-up spacecraft [81,82]. Note that other current or future imagers aboard geostationary satellites have similar spectral channels, e.g., the Advanced Baseline Imager on GOES-R, the Advanced Himawari Imager on Himawari-8/9, the Advanced Meteorological Imager on GEO-KOMPSAT-2A, the Advanced Geosynchronous Radiation Imager on Fengyun-4A or the Flexible Combined Imager on the Meteosat Third Generation satellites [83][84][85][86]. Thus, the method described here can in principle be extended to those as well.

VADUGS
The algorithm VADUGS (Volcanic Ash Detection Utilizing Geostationary Satellites) allows pixelwise retrieval of volcanic ash cloud properties using SEVIRI measurements and ANNs [71]. ANNs have been developed based on biological insights on the behavior of the human brain. The feed-forward configuration is made up of multiple layers, with the first one (called input layer) consisting of the input features, and the last one the output layer. In between is an arbitrary number of so-called hidden layers. Hidden and output layers consist of so-called neurons. Those are (usually non-linear) functions receiving the weighted sum of the results of the previous layer's neurons (or input features in the case of the first hidden layer) as an argument. The weights between all pairs of neurons of successive layers are different. They are chosen such that the n input features are (approximately) mapped to the corresponding m target values; thus, an ANN is a function mapping R n → R m . Using the backpropagation algorithm, the weights are determined in an iterative procedure (called training) by changing their values such that the loss function (a metric quantifying the difference between the output of the ANN for a set of input data samples and the associated target outputs) is minimized. Thus, a training dataset is needed for which the target values are known for all samples. The loss function evaluated on a separate validation dataset is monitored during training to prevent overfitting, i.e., to avoid learning the noise of the training dataset [51,52,87].
VADUGS is a single ANN with one hidden layer with 600 neurons. The input data consist of the infrared brightness temperatures measured by SEVIRI, the skin temperature from ECMWF and a land/sea mask and the viewing zenith angle. The output layer gives m col and z top . Radiative transfer simulations were performed to calculate the brightness temperatures for generic atmospheric settings, leading to a dataset of properly tagged samples used for the training. For the simulations, realistic atmospheric conditions were chosen based on ECMWF reanalysis data; to cover seasonal variations, 12 UTC of the 15th day for the months February 2010 to January 2011 was considered. Meteorological cloud layers were incorporated based on the layer-resolved cloud fractions given by ECMWF and parameterized as either liquid or ice water cloud (see also Section 4.1.2). Single homogeneous volcanic ash layers were simulated using the complex refractive index of ash from the Eyjafjallajökull eruption 2010, spherical and spheroidal particle shapes and two different lognormal particle size distributions [71].

Training Dataset
This section covers the creation of the new training dataset, including a description of the input data of the radiative transfer calculations (Section 4.1) and the calculations themselves (Section 4.2), a validation of the ash-free case (Section 4.3) and the selection of training, validation and test subsets (Section 4.4).

Input Data
In the following, we describe different input data for the radiative transfer calculations, their variability and the settings. More specifically, we discuss the surface emissivity (Section 4.1.1), the vertical profiles of atmospheric clouds and gases (Section 4.1.2) and the volcanic ash clouds (Section 4.1.3).

Surface Emissivity
For the surface emissivity, we use data from Zhou et al. [88][89][90]. Those were calculated using measurements of the polar-orbiting IASI instrument over ten years (2007-06 to 2017-05), covering the full globe. The emissivities were averaged over the ten years and for each month. The final spatial resolution is 0.25°and the spectral resolution is 0.25 cm −1 for 645 to 2760 cm −1 (roughly the wavelength range 3.6 to 15.5 µm). For sea surfaces, the emissivity exhibits also a strong dependence on the viewing zenith angle θ vza and the wind speed w ws : an increase of θ vza reduces the emissivity, whereas an increase of w ws reduces the emissivity at small θ vza but increases it at large θ vza [91][92][93]. The impact can be on the order of 10%. Here, θ vza is determined from the geographic coordinates for MSG2, whereas the wind speed w ws = √ U 2 + V 2 is based on the horizontal wind speeds at 10 m above the surface, U and V, as given by ECMWF (Section 4.1.2). We use the calculations by Masuda [94] which incorporate the surface-emitted surface-reflected radiation into the sea surface emissivity for different wavelengths λ (3.7 µm, 11 µm and 12 µm), θ vza (0 to 85°) and w ws (0 to 15 m s −1 ). We divide the calculated emissivities by the value for θ vza = w ws = 0. Then, for each wavelength, a function of the form is fitted, with f describing the reduction of the emissivity relative to the case θ vza = w ws = 0 at λ and g and h being polynomials of sixth degree. g describes mainly the dependence on θ vza , h is the correction due to w ws and f is interpolated among the three wavelengths and constantly extrapolated beyond. For sea surfaces, the IASI-measured emissivities are multiplied by f as the data of Zhou et al. [88][89][90] do not include the dependence on θ vza and w ws explicitly. Note that Masuda [94] considered w ws at a height of 12.5 m. However, the difference to w ws derived from ECMWF ERA5 data is assumed to be negligible. During application, θ vza > 85°and w ws > 15 m s −1 are set to these limiting values. Similar to water, the emissivity of land surfaces decreases with increasing θ vza . However, the relations depend strongly on the soil type and the wavelength, and the results vary between different experiments. For instance, significant decreases of the emissivity have been observed for sand: Labed and Stoll [95] found changes of~6% for wavelengths of 10.6 µm and 12 µm between θ vza = 0°and 80°; Snyder et al. [96] reported differences up to~4% for 8 to 10 µm and~2% for 10 to 14 µm between θ vza = 10°and 53°; Sobrino and Cuenca [97] measured decreases of~3% between θ vza = 0°and 65°for a spectral band at 8 to 14 µm; Cuenca and Sobrino [98] found a reduction of~5.8% between θ vza = 0°and 60°f or a channel covering 8.2 to 9.2 µm; McAtee et al. [99] indicated a decrease by up to~8% comparing θ vza = 0°to 70°for the spectral range of 8 to 12 µm; García-Santos et al. [100] got a difference larger than 10% between θ vza = 0°and 70°for a spectral band at 8.2 to 8.7 µm. On the other hand, Sobrino and Cuenca [97] did not find a dependency of emissivity on θ vza for grass and Snyder et al. [96] mostly less than 1% for a sample of compost with grass and leaves. In addition, for θ vza up to 30°, the changes in the emissivity are mostly negligible [97,98,100]. Therefore, we neglect the dependence of the emissivity on θ vza for land surface in the following.

Atmospheric Data
Every radiative transfer calculation needs an atmospheric state as input. We use ECMWF ERA5 reanalysis data [101], in particular the skin temperature, temperature profile, logarithm of surface pressure, 10 m U and V wind components, specific humidity, ozone mass mixing ratio, fraction of cloud cover, specific cloud liquid and ice water contents and land/sea mask. Additionally, the total column water, water vapor and ozone are included in the training dataset; those quantities are not needed for the radiative transfer calculations, but they are used as input features for the ANNs.
Data of three arbitrary, recent years are collected: 2010, 2013 and 2015. For each year, the 15th day of each month is considered. Compared to the data used for VADUGS, we have an increased vertical resolution (137 instead of 91 model levels) and temporal resolution (1 h instead of only 12 UTC [102]). Figure 3 shows exemplarily the variability of the skin temperature. The daily mean skin temperature exhibits an annual variability of~20 K in central Europe, whereas skin temperature within a single day might vary about 10 K in central Europe but~40 K in Northern Africa. This stresses the necessity to cover the full yearly as well as daily variability of the atmospheric state in a sufficient temporal resolution. Figure 3c shows the differences in daily mean skin temperature between 2010 and 2015 for a single day. These can lead to temperature differences of roughly −10 to 10 K, which is why we base our calculations on data of three different years. Furthermore, the high temporal resolution allows to capture the full daily cycle of the atmospheric properties. Figure 4 shows that a coarser resolution of, e.g., 6 h might miss a part of the skin temperature variability. When considering a location close to a longitude of 0°E, the 6 h resolution can reproduce the minima and maxima in the daily course of the skin temperature. However, when considering a larger longitude, this might change as the sun is in zenith at a different time with respect to UTC. For instance, at 20°N, 30°E, the local minimum is~3 K lower than the temperature at 0 UTC, while, at 20°N, 55°E, the local maximum is~3 K higher than the temperature at 12 UTC. Thus, the hourly resolution helps to create a training dataset that enables the resulting retrieval to work at all longitudes at all times of day, as required by a geostationary sensor.
Based on the ECMWF data, the atmospheric state is composed similarly to the method of Bugliaro et al. [71], i.e., vertical temperature profile, skin temperature, wind speed at 10 m altitude and densities of gaseous water (H 2 O) and ozone (O 3 ). Oxygen (O 2 ) and carbon dioxide (CO 2 ) are derived from the air density using constant mixing ratios of 0.20948 [103] and 0.0004 [104], respectively, whereas, for nitrogen dioxide (NO 2 ), the mixing ratios stem from a chemical transport model with 72 model levels and a latitudinal and longitudinal resolution of 2°and 2.5°, respectively, simulating November 2012 to October 2013; the daily average of the 15th of each month was used [105,106]. Those five gases are required to perform corresponding radiative transfer calculations [107]; especially H 2 O, CO 2 and O 3 have strong absorption features in the thermal infrared MSG/SEVIRI channels [108].  Meteorological clouds are extracted from ECMWF data as well; however, the maximum random overlap rule of ECMWF cannot be implemented in a one-dimensional (1D) radiative transfer model. Thus, using the cloud fractions for each layer, a random set of 1D clouds is created. No partial cloudiness is considered and vertically adjacent cloud layers are assumed to overlap as much as possible (for details, see [71]). For liquid water clouds, the parameterizations by Bugliaro et al. [109] and Hu and Stamnes [110] are used to create the r eff profiles and the optical properties, respectively. For ice water clouds, the parameterization by Wyser [111] and the rough-aggregate habit [107] with the parameterization by Heymsfield et al. [112], Yang et al. [113], Baum et al. [114] are applied for r eff and the optical properties, respectively. Note that the composed atmospheres are fully consistent, i.e., the vertical temperature and gas profiles match the cloud profiles (i.e., humidity saturation at the correct altitudes). The atmospheres, in turn, match the surface emissivities and the viewing zenith angles. This distinguishes our approach from radiative transfer calculations by Krebs et al. [115] or Vázquez-Navarro et al. [116], who also created comprehensive simulated datasets of MSG/SEVIRI observations, but combined atmospheric profiles with random cloud layers, constant surface emissivities and arbitrary viewing zenith angles. In the case of the VADUGS training data, atmospheric profiles and clouds were consistent, but not the viewing zenith angles [71].

Volcanic Ash Clouds
Volcanic ash clouds exhibit a significant amount of variability. The volcanic ash cloud top height z top depends on the intensity of the eruption: whereas weak eruptions emit ash only up to a few hundred meters [117], affecting mainly the direct surrounding of the vent, Eyjafjallajökull 2010 injected volcanic ash at heights of 3 to 10 km above sea level [118], Puyehue-Cordón Caulle 2011 injected ash up to 15 km high [38,119]. Even heights > 25 km are possible [120,121], although much rarer [1]. The cloud height also depends on the atmospheric conditions [122] and changes during the ash cloud's lifecycle, e.g., the height decreases due to gravitational settling [10]. Thus, in the following, z top ∈ [0.3, 18] km is considered.
The vertical mass profile can be quite complicated, especially for aged clouds, e.g., with a non-uniform distribution and multiple layers [6,18,123]. As SEVIRI offers only limited possibilities for sounding, we consider only the simplest profile of a single, homogeneous layer (as in [71,124]). In addition, the vertical extent z ext shows a large variability from some hundred meters up to some kilometers [6,18]. Marenco et al. [18] proposed z ext = √ 2 m col, meas /c max, meas with m col, meas being the measured mass load and c max, meas the measured peak mass concentration, arguing that this leads (together with the assumption of a homogeneous layer with the concentration c being c max, meas / √ 2) to a good representation of real ash clouds for radiative transfer. Another approach based on plume rise calculations for stable stratified atmospheres suggests z ext = 0.4z top for the depth [125,126] and has been applied to volcanic ash cloud retrievals [33,43,127]. The latter relation is also assumed for VACOS. Thus, after choosing z top , the vertical extent is z ext ∈ [100 m, 0.4z top ].
For the mass volume concentration c, typical values depend again on the eruption strength and the ash cloud's lifecycle, as sedimentation and dispersion may lead to a thinning of the cloud. Przedpelski and Casadevall [128]  z ext ]; for a typical cloud thickness of z ext = 1 km, this would cover mass volume concentrations up to 30 mg m −3 , covering all three contamination regimes according to ICAO.
Volcanic ashes themselves can also differ significantly with respect to chemical composition, particle size and shape. Here, we consider the comprehensive set of optical properties covering the variability of all three properties described by Piontek et al. [23]. The refractive indices of volcanic ashes, as shown in Figure 5, were calculated by averaging the refractive indices of different components of volcanic ash (i.e., minerals and glasses) according to typical petrological compositions; the last depends on the silica content x s , which was varied from 45 to 75 wt.%, and the ratio between volcanic glass and minerals, f glass , which varied between x s /100 wt.% and 1. Focusing on distal ash, the porosity of volcanic ash [21] is neglected.
The microphysical properties were chosen based on a literature review: a log-normal particle size distribution (Equation (7) in [23]) was assumed with r eff ∈ {0.6, 1.8, 3, 4.5, 6} and geometric standard deviations s ∈ {1.5, 2.0}. Pro-and oblate spheroids were assumed in equal parts, with the aspect ratio following a modified log-normal distribution (Equation (9) in [23]) with median aspect ratio 0 = 1.5 and a spread σ ar = 0.45. Using Mie theory and the T-matrix method [129], the optical properties were derived for wavelengths of 5 to 15 µm, as shown in Figure 6.  To exclude outliers as potentially unphysical volcanic ashes, only a selection of ash types is used for the training below (see Figure 6). Therefore, the standard deviation σ(r eff , s) of the mass extinction coefficient at 10.8 µm, k 10.8 , of all volcanic ashes is calculated. Next, the mass extinction coefficient k Eyja (r eff , s) at 10.8 µm for Eyjafjallajökull ash from Deguine et al. [75] is determined for the same microphysical properties r eff , s and shape [23]. Finally, we keep only those volcanic ashes that are 1σ close to the Eyjafjallakökull ash, i.e., that fulfill k 10.8 − k Eyja (r eff , s) ≤ σ(r eff , s). Overall, 57% of the ash types pass this test. Figure 6 shows that for r eff = 0.6 µm mostly ashes of low x s are excluded, whereas for r eff = 1.8 µm mainly high x s are dismissed. For the other r eff , the selection is relatively balanced with respect to x s .

Radiative Transfer Calculations
To create the input files of the radiative transfer calculation, the algorithm RTSIM [71] randomly picks uniformly distributed times among those covered by the ECMWF data (Section 4.1.2), coordinates with respect to the SEVIRI disc and compiles the corresponding surface properties and atmospheric, cloud and ash profiles; meteorological clouds can be created in~51% of the cases. For each set of input parameters, four calculations are performed if possible: clear-sky conditions; only meteorological clouds; only volcanic ash clouds; meteorological and volcanic ash clouds. The ash cloud parameters (i.e., z top , c, τ 10.8 and r eff ) and the simulated brightness temperatures together enter the training data.
1D radiative transfer calculations of the thermal infrared brightness temperatures as measured by SEVIRI are performed using libRadtran version 2.0.3 [130,131] and the Cversion of the Discrete Ordinate Radiative Transfer Solver (DISORT [132,133]) with 16 streams. The Cluster for Advanced Research in Aerospace (CARA) of the Deutsches Zentrum für Luftund Raumfahrt (DLR) is used, allowing the calculation of 1000 samples with 7 simulated brightness temperatures each on a single node within approximately 100 s.
To account for gas absorption, a method by Buehler et al. [134] is used in the implementation by Gasteiger et al. [135], called REPTRAN. It performs radiative transfer simulations at representative wavelengths within a given spectral interval (on average 3 and typically <10) and calculates a weighted sum of them as an approximation of the integral of the top of atmosphere radiance over a satellite channel's spectral response function/a narrow spectral band. The representative wavelengths and the weights were determined such that the approximation for the integrated top of atmosphere radiance has an error < 1%, using a training dataset of simulated top of atmosphere radiances with a high spectral resolution covering a large variety of atmospheric states. Four different parameterizations are available: channel (optimized for SEVIRI's spectral channels), coarse (band width of 15 cm −1 ), medium (5 cm −1 ) and fine (1 cm −1 ). channel uses the least number of spectral sampling points and is fastest. However, Gasteiger et al. [135] pointed out that the applicability of the parameterization might cease if a significant spectral variability is introduced which has not been considered in their training dataset. For instance, surface emissivity was assumed wavelength-independent by Gasteiger et al. [135]. In our simulations, surface emissivity shows a strong spectral variability, especially for sand [88]. Furthermore, the refractive index of the volcanic ash, which has imaginary values between 0 and about 1.4 ( Figure 5), was assumed wavelength-independent and between 0.001 and 0.1 by Gasteiger et al. [135].
To select accurate parameterizations, we performed test calculations of the brightness temperatures for the REPTRAN modes channel, coarse, medium and fine. Cases with and without meteorological clouds/volcanic ash were considered with the ash cloud parameters as described in Section 4.1.3 and an example ash with the refractive index of Eyjafjallajökull ash from Deguine et al. [75], a log-normal size distribution with r eff = 0.6 µm, s = 1.5 and the previously described shape distribution. In total, for each parameterization, 500 atmospheric states were simulated with each up to four cloud states as described above. The differences to the fine calculations are shown in Figure 7, assuming that those represent the most accurate approximation to a line-by-line calculation. This is supported by the fact that the spread decreases when considering a higher-resolution approximation (except for BT 9.7 ). Median differences for channel are the largest for all channels but BT 9.7 . This might reflect the fact that BT 9.7 is sensitive to ozone, which is mainly present in the stratosphere and, thus, might hide the impact of the additional spectral variability due to the surface emissivity and the volcanic ash refractive index. Outliers reach absolute differences >2 K (not shown). In some cases (e.g., BT 6.2 and BT 10.8 ), the differences between coarse and medium are rather small, such that the lower resolution parameterization is applied. Overall, we conclude to use the channel parameterization for BT 9.7 ; coarse for BT 6.2 , BT 10.8 and BT 12.0 ; medium for BT 7.3 , BT 8.7 and BT 13.4 . The considered test dataset is described in the text. The boxplot shows the median, first and third quartile (box) and the 5th and 95th percentile (whiskers).

Test of the Ash-Free Training Data
The presented method is not expected to reproduce observations on a single pixel basis as, for example, spatial resolution is too coarse, averaged surface emissivities are used and the ECMWF model might not represent reality, especially clouds, accurately enough. However, the aim of the setup is to create a dataset that statistically approximates the reality. To validate this, 49,701 simulations without ash for 15 July 2015 at 12:00 UTC were performed, randomly scattered over the SEVIRI disc and compared with the corresponding SEVIRI measurements (see Figure 2). If RTSIM created no clouds in the atmosphere, the cloud-free simulation was used, otherwise the simulation containing clouds. The distributions of the simulated and the corresponding measured brightness temperatures should be similar, and thereby would indicate that RTSIM creates atmospheric profiles and libRadtran derives brightness temperatures that generally approximate reality, although individual samples might deviate from the measurements. Thus, simulations can then be viewed as a strong training dataset. Figure 8 shows a two-dimensional histogram for the full dataset of measured against simulated BT 10.8 . Most samples are located close to the identity, with slightly more points above the identity than below, i.e., the simulation tends to overestimate the brightness temperature. Single points show large differences up to about 80 K between simulation and measurements, probably when the simulation is cloud-free and reality shows a high cold ice cloud. Figure 9 shows the median and the 95th percentile of the absolute difference between the simulated and the measured brightness temperatures. Different subsets are considered: (a) all samples as well as land and sea samples for clear conditions; (b) clear and cloudy samples for sea surfaces; (c) viewing zenith angle θ vza separated by 40°and 55°for clear conditions over sea; (d) viewing zenith angle θ vza separated by 40°and 55°for clear conditions over land. Clear conditions are determined by the fact that no cloud layers were created by RTSIM and that the total cloud cover by ECMWF is below 25%; of course, observations might still contain meteorological clouds. The statistical distributions of the measured and simulated brightness temperatures (and brightness temperature differences BTD 8.7−10.8 and BTD 10.8−12.0 ) are shown as histograms in Figure 10. Figure 11 shows as an example the distribution of BT 10.8 for the subsets of clear sky land/sea samples and clear/cloudy samples of sea surfaces.    Atmospheric gases, meteorological clouds and the surface properties are the main aspects that determine the quality of the simulations. Water vapor is mainly visible in BT 6.2 and BT 7.3 , with the latter being sensitive at least down to the mid-troposphere [69]. Their brightness temperature distributions in Figure 10 show a good agreement, and, even on a single pixel basis, the deviations are small, with median absolute deviations mostly below 1 K. The effect of H 2 O on the atmospheric window channels BT 8.7 , BT 10.8 and BT 12 is small [108,136]. However, the single pixel deviations of the window channels are larger than those of BT 6.2 and BT 7.3 when considering only clear sky samples, indicating that other effects dominate the former. O 3 and CO 2 mainly impact BT 9.7 and BT 13.4 , respectively. Their distributions in Figure 10 generally agree well up to a small peak on the right side that is not fully reproduced. However, as this minor peak is also present in the distributions of the window channels, it can be expected to stem from the surface properties, particularly from land surfaces that produce multiple peaks. The single pixel median absolute deviation for all samples is generally around 2 K.
Meteorological clouds impact the atmospheric window channels. Individual pixel deviations are remarkable: the median absolute deviation of cloudy samples is~6 K, with the 95th percentile of the absolute deviation even beyond 30 K. This indicates that the largest deviations in Figure 8 could be caused by the occurrence of meteorological clouds, for instance, if they are present in reality but missing in the simulation, or if the there are significant differences in the cloud top heights. The size of the smaller deviations might be related to inaccuracies in the cloud properties, e.g., the cloud top height, the liquid and ice water content derived from the ECMWF model or r eff derived from parameterizations. In addition, clouds are described differently in the 1D radiative transfer calculations than in nature, which has an impact especially in the case of partial cloudiness, and a random element is applied for the creation of the cloud layers [71]. However, the resulting brightness temperature distributions in Figures 10 and 11 agree with the SEVIRI measured distributions.
The surface properties (emissivity and skin temperature) influence the atmospheric window channels as well. As pointed out above, their brightness temperature distributions show a good agreement ( Figure 10). The distributions of BT 10.8 for sea surfaces and clear sky pixels roughly agree, whereas for land surfaces two peaks of similar height are visible with the right flank of the simulated distribution shifted towards lower temperatures (Figure 11a). The single pixel comparison exhibits generally low median absolute deviations (<2 K) for clear sky sea surfaces, but the deviation is larger for land than for sea. Considering the θ vza -dependence for sea surfaces, the median absolute deviation is largest for 55°< θ vza , which might be related to the strong θ vza -dependence of the water surface emissivity for large viewing zenith angles. On the contrary, for land surfaces, the median absolute deviation is largest for θ vza < 40°. This seems reasonable as the θ vza -dependence is smaller for land surfaces than for water surfaces. Furthermore, a higher θ vza leads to a larger gas column along the optical path, thereby effectively hiding deviations due to inaccurate surface properties.
The surface emissivity is a climatology over 10 years, whereas the actual emissivity in the present scene might slightly deviate, e.g., due to wetter or dryer surfaces, more or less vegetation, etc. [89]. Could this cause the deviations observed for land surfaces? Neglecting all atmospheric effects, we can estimate the deviation of the surface emissivity corresponding to the deviation in the simulated brightness temperatures using Planck's law. For the wavelength λ, let the measured brightness temperature BT λ, m be related to an emissivity λ, m and the simulated one BT λ, s to λ, s . Their ratio r is with c = 0.0145 m K [108]. For BT λ, m between 263 K and 303 K and BT λ, m − BT λ, s = ±4 K, the difference |1 − r| is ca. 0.1, 0.08 and 0.07 for λ of 8.7 µm, 10.8 µm and 12 µm, respectively. For BT 8.7 , such deviations are possible, as the spread in emissivities of typical surfaces is large in the corresponding spectral regime (sand has emissivities down to 0.7, whereas water and vegetated surfaces have values close to 1). Around 10.8 µm and 12 µm, the differences in typical surface emissivities are <0.05 [88,89]. Therefore, it seems unlikely that an error in the surface emissivity is the single cause for the differences in the brightness temperatures above land surfaces, as their median absolute deviation in the three window channels are similar.
Another possible reason might be inaccuracies in the skin temperature. For instance, it is known that ECMWF underestimates skin temperatures for land at daytime and overestimates during night: Trigo et al. [137] found errors of the order of 2 to 5 K, especially for semiarid regions (North Africa, Sahara and Namibia), and Johannsen et al. [138] found in parts underestimations in the order of 5 K for the Iberian peninsula when comparing ECMWF data with satellite retrievals. The ECMWF data and the surface emissivity have a spatial resolution of 0.25°, corresponding to roughly 28 km at the SEVIRI sub-satellite point. However, SEVIRI itself has a resolution of 3 km at nadir. Due to this difference, small-scale features and sudden changes in the surface type (e.g., at coastlines) can lead to inaccuracies.
To sum up, the comparison indicates that gas profiles are reproduced correctly on a single pixel basis, similar to sea surfaces in the absence of clouds. For cloudy samples, the measured and simulated brightness temperature distributions agree well, but considering individual pixels we find notable deviations. Land surfaces lead to deviations for both, single pixels (<4 K) and the brightness temperature distributions. Especially the good agreement of the distributions of BTD 8.7−10.8 and BTD 10.8−12.0 highlights that relative deviations within single spectra are small, with only a minor positive bias for BTD 10.8−12.0 . Thus, we conclude that the simulations can be used as a training dataset.

Training, Validation and Test Data
We performed simulations for~30 million samples. Then, two selections are made: First, for ash-loaded samples, only those that have BTD 10.8−12 < 0 are kept. This threshold criterion is typical for volcanic ash detection and has been used for VADUGS as well [71]. It reduces the amount of samples with overlapping signals from meteorological and volcanic ash clouds, thereby effectively making the classification task slightly simpler for the ANNs. Second, the selection of ash types similar to Eyjafjallajökull ash is applied to reduce the complexity of the training datasets. From the remaining data, two sets are formed: the full dataset (Dataset A) and a subset containing only the ash-loaded samples (Dataset B). Dataset A is used for the ANNs for classification and the retrieval of τ 10.8 , i.e., the algorithms that are applied to all satellite measurements. Dataset B is used for the ANNs of z top and r eff , which are only applied to ash-loaded pixels. Dataset A and B are randomly grouped into a training (70%), a validation (20%) and a test (10%) dataset, as shown in Table 1. Training and validation datasets are used for the training of the ANNs; the test dataset is used to characterize the final algorithms in Piontek et al. [80]. Distributions of the target values in the training datasets are sketched in Figure 12. Note that they are not uniformly distributed due to the selections performed as well as the usage of different volcanic ash types.

Training of the ANNs
For the ANNs, TensorFlow version 1.14.0 [139,140] and Keras version 2.3.1 [141] are used. Individual ANNs are trained for the classification, the retrieval of volcanic ashinduced optical depths at 10.8 µm (τ 10.8 ), cloud top heights (z top , in meters) and effective particle radii (r eff , in micrometers). The classification ANN differentiates four categories: clear sky; only meteorological clouds; only volcanic ash clouds; both meteorological and volcanic ash clouds.
VADUGS directly retrieved m col , whereas VACOS derives τ 10.8 as this quantity is more closely related to the observational data as SEVIRI measures radiances. It can be converted into m col using k 10.8 . The wavelength 10.8 µm was chosen (as in [66]) as it corresponds to one of the SEVIRI channels, is in the atmospheric window, is less influenced by H 2 O and volcanic SO 2 emissions (compared to 8.7 µm, see [29]) and experiences relatively large extinctions (compared to 12 µm). The ANNs for the classification and τ 10.8 are trained with the full training dataset (Dataset A in Table 1), whereas the ANNs for z top and r eff are trained only with ash-containing samples (Dataset B). The input features (in Table 2) contain the seven infrared brightness temperatures from SEVIRI, including BT 10.8 and BT 12 that are often used for volcanic ash detection [24,25], as well as BT 8.7 [31]. Prata and Grant [33] showed that BTD 8.7-12 can be even more negative than BTD 10.8-12 . As water vapor can hide the volcanic ash [27,43], BT 6.2 and BT 7.3 are included, which are sensitive to water vapor [69]. Similarly, BT 9.7 and BT 13.4 are included to treat O 3 and CO 2 [69]. From ECMWF, the skin temperature is included as a reference for the temperature profile, as well as estimates of the total column water, water vapor and ozone to account for the influence of gases and meteorological clouds on the satellite measurements and thereby to extract the impact of the volcanic ash. Latitude and longitude allow the ANNs to learn the geography to some extent and latitudinal dependencies of the atmospheric profile. The land/sea-mask partly encodes the very different emissivities [88,89] as well as differences of the atmosphere and cloud layers above land and sea. Day of year and hour of day are included to consider seasonal and diurnal variations in the atmospheric properties, respectively. Their sine and cosine are used to avoid discontinuities (e.g., between 31 December and 1 January or 24:00 and 0:00) and make the encoding unambiguous [78]. The satellite viewing zenith angle is included to correct slant observations, leading to longer optical paths through the atmosphere for higher θ vza . τ 10.8 is included as an input feature for the ANNs of z top and r eff to make use of the previously retrieved information. Finally, the "clear" brightness temperatures BT 8.7, clr , BT 10.8, clr and BT 12, clr are given as input. Those correspond to the brightness temperatures that would be measured in absence of the volcanic ash clouds (but with meteorological clouds if present) to quantify the surroundings [27,31,45].  × × Input and output data are standardized. As τ 10.8 is the result of a previous retrieval and BT λ, clr is estimated from the surrounding satellite measurements for application to real data (Section 6), those values are prone to errors. Therefore, the ANNs for z top and r eff apply a Gaussian noise with standard deviation of 0.1 on the input layer during training. Each ANN consists of three hidden layers with 100 neurons each. This choice is motivated by Strandgren et al. [78], who investigated different ANN structures when developing CiPS, i.e., a retrieval similar to VACOS but for cirrus clouds. They found the best performance for their most complex ANN, having three hidden layers with 64 neurons each. The output layers of our ANNs have a single neuron for regressions or four neurons for the classification. Note that the ANNs now have roughly twice as many trainable parameters as VADUGS: For example, an ANN with 19 inputs, 3 × 100 hidden neurons and a single output has 22,301 parameters, whereas the VADUGS architecture with 17 inputs, 1 × 600 hidden neurons and 2 outputs has 12,002 free parameters. The hidden layers use the hyperbolic tangent as activation function, while the output layer uses a linear function for regressions and a softmax function for classification [142]. The last allows the classification ANN to produce a normalized four-dimensional output vector, where each component can be roughly interpreted as the probability of the corresponding category. All neurons use bias neurons and are initialized by a LeCun normal distribution [87]. The mean squared error is utilized as loss function for regressions and the categorical cross entropy for classifications [142]. For the training of classification, z top and r eff retrievals, all samples are weighted equally (with 1) when calculating the total loss of a batch of samples. For the ANN of τ 10.8 , the following sample weighting with respect to the true τ 10.8 is applied to increase the importance of the few samples with small τ 10.8 (e.g., less than 3% of the samples in the training subset of Dataset A have τ 10.8 < 0.6) in comparison to the many ash-free and very ash-loaded samples: The weight for τ 10.8 ≤ 0.001 is introduced to reduce the focus on very low concentrations that might be hardly retrievable anyway. With the weights for 0.001 < τ 10.8 ≤ 0.5 the focus on usual concentrations is increased. To avoid a general overestimation due to the many samples with large τ 10.8 (see Figure 12), lower weights for τ 10.8 > 0.5 are applied.
The numerical values are determined by testing different settings. The Nadam (short for Nesterov-accelerated Adaptive Moment Estimation) algorithm is applied during the training with recommended parameters (learning rate = 0.001, β 1 = 0.9 and β 2 = 0.999 [143,144]), with batch sizes of 1000. For regressions, the learning rate is reduced by a factor 100 every 500 epochs, with 2000 epochs in total, reaching a minimum of the loss function evaluated on the validation dataset, with the loss function remaining constant for several hundred epochs. For the classification ANN, the learning rate is reduced once after 500 epochs and training is stopped after 60,000 epochs, as accuracy (assuming a threshold of 0.5 for the four categories) calculated for the validation dataset decreases by <0.065% in the last 10,000 epochs.
Reality (i.e., atmospheric and volcanic ash properties) is very complex and variable and has only been approximated by the simulated data. A possible pitfall of this approach is that ANNs might overfit with respect to both the simulated training and the validation data. For example, some of the first ANNs we created here were trained successfully considering the simulated datasets, but revealed weaknesses when applied to real satellite data, such as misinterpretation of meteorological clouds as volcanic ash or wrong retrievals for thin ash clouds. To overcome these issues, we applied different selections with respect to the simulated datasets when training the final ANNs and introduced Gaussian noise layers and sample weightings as outlined above.

Notes on the Application
To perform the VACOS retrieval, the ANNs are applied in two steps: First, the classification ANN is used to detect volcanic ash. Alternatively, τ 10.8 is retrieved to find ash clouds; reasonable thresholds are found in [80]. Second, r eff and z top are retrieved for all ash-containing pixels. In order to apply the ANNs, the input features as given in Table 2 have to be composed. For the training, τ 10.8 and BT λ, clr at wavelength λ are simulated and, therefore, exact. However, for the application on satellite data, the input feature τ 10.8 is obtained from the retrieval result of the corresponding ANN, whereas BT λ, clr is estimated from the SEVIRI images. We assume that the ash clouds are spatially limited and the highest BT λ in the close surrounding of a specific pixel corresponds to the value that would be measured in absence of the volcanic ash cloud [115]. Therefore, for each pixel, the maximum BT λ within a radius of 12 pixels is determined, denoted BT 12 px λ ; similar pixel areas were considered by Krebs et al. [115]. Assuming a pixel size of 3 km, a surrounding of at least 36 km would be considered, which is sufficient for ash plumes close to the volcano [19]. At greater distance, ash clouds can become wider [6]; thus, an additional step is implemented. The SEVIRI disc is split in 10 × 10 boxes and the maximum of BT 12 px λ of all presumably ash-free pixels, i.e., pixels with is determined within each box, called BT ref λ . Now, for each pixel in the box, the brightness temperature difference is checked: if Equation (3) is not fulfilled, which indicates that volcanic ash still influences the measurement, the replacement is conducted. This last step is repeated twice in case Equation (3) remains unfulfilled. The final BT 12 px λ is used as an approximation for BT λ, clr . A uniform filter of size 5 × 5 pixels is applied for both BT λ, clr and the retrieved τ 10.8 (see [80]). Similarly, the total column water vapor and total column water are taken from an external source, e.g., a model, and might be arbitrarily wrong. However, the training dataset includes for a specific atmospheric profile samples with and without meteorological clouds (if those are theoretically possible due to the ECMWF data), both with the same total column quantities. Thus, the ANNs learn a certain robustness with respect to inaccuracies of the total column water and total column water vapor. This is different for the total column ozone and skin temperature, which are also obtained externally. The preferred source is ECMWF ERA5 as this was also used to create the training data.
The results of the ANNs can be treated in different ways. The classification result can either be used directly or a binary ash flag (i.e., ash or no ash) can be calculated by adding the probabilities of the two ash-free and the two ash-containing categories, respectively. Using the retrieved z top and z ext = 0.4z top [125,126], one can derive an estimate for the geometrical thickness [127]. The retrieved τ 10.8 can be converted into m col using k 10.8 ; typical values for different x s and r eff are given in Table 3 as derived from all data in Figure 6. Generally, k 10.8 increases with increasing x s when r eff = const., except for ashes with a r eff = 0.6 µm, where the opposite is the case. For x s = const., k 10.8 is largest in the case r eff = 1.8 µm and decreases with increasing r eff . In practice, to determine k 10.8 , one needs to know r eff and x s . The former can be approximated by satellite retrievals, the age of an ash cloud or the distance to its source. The latter can be estimated using previous knowledge about the volcanic source or laboratory analyses of volcanic ash samples when considering past eruptions. If none of this information is given, one can simply assume k 10.8 = 200 m 2 kg −1 which produces roughly the mean m col considering the extremal values 140 m 2 kg −1 and 328 m 2 kg −1 of the mean k 10.8 in Table 3. For the visible spectrum, mass extinction coefficients of 690 m 2 kg −1 at λ = 532 nm [145] and 637 m 2 kg −1 for λ = 355 nm [146] were found for Eyjafjallajökull volcanic ash. As the refractive index exhibits only a small variability in the visible spectrum with respect to the chemical composition [75,147], these mass extinction coefficient estimates can also be assumed to be representative for other types of volcanic ash. Thus, the optical depth of a volcanic ash cloud is roughly 3.3 times larger in the visible spectrum when comparing to a typical k 10.8 of 200 m 2 kg −1 at 10.8 µm. Finally, combining m col and z ext allows estimating the average mass volume concentration c.
VACOS can be applied quickly: processing a full MSG/SEVIRI image (3712 × 3712 pixels) on an ordinary desktop personal computer (using a single core of an Intel Core i5-6600 CPU at 3.30 GHz) with an off-the-shelf, uncustomized version of TensorFlow, each ANN needs~70 s, excluding input/output processing. Performance could be increased by restricting the retrieval, e.g., only τ 10.8 could be retrieved for the full disc to detect volcanic ash, whereas z top and r eff could be determined for ash-contaminated pixels only. Thus, it is possible to use VACOS operationally.

Conclusions
The new retrieval algorithm VACOS (Volcanic Ash Cloud properties Obtained from SE-VIRI) is introduced. It derives a pixel classification, cloud top height, effective particle radius and (indirectly) the mass column concentration, each of which is done individually by a shallow artificial neural network. The artificial neural networks receive seven brightness temperatures of the infrared channels of the geostationary instrument MSG/SEVIRI as well as auxiliary data from ECMWF. Using MSG/SEVIRI allows for a comparably high temporal and spatial resolution of the retrievals. Focusing on the infrared spectrum allows the application at day and night. After the time-consuming creation of a radiative transfer simulated training dataset and the training itself, artificial neural networks are fast [64,78], have good generalization skills [52] and a high robustness with respect to perturbations [67,68]. The training dataset is of main importance for the development of artificial neural networks. Here, we perform one-dimensional radiative transfer calculations for a large set of typical atmospheric conditions with and without generic volcanic ash clouds. The radiative transfer's input data are described and the central aspects discussed, in particular pointing out the strong dependence of surface emissivities on the surface type and the viewing zenith angle, the significant variability of the atmospheric state between different years and the need for a high temporal resolution to also cover its diurnal variability. A special focus is put on the representation of the volcanic ash clouds. Macrophysical properties are reviewed, and microphysical and optical properties are received from Piontek et al. [23]. The usage of a large set of refractive indices representing different volcanic ash types with respect to their silica content and glass-to-crystal ratio is a major difference to most other artificial neural network-based volcanic ash retrievals using passive imagers: they either rely on a single or a handful of volcanic ash refractive indices or use training datasets consisting of only a few different volcanic ash clouds. We perform a validation of the ash-free radiative transfer calculations by comparing those with real MSG/SEVIRI measurements for a specific scene. An overall agreement of the statistical distributions of the brightness temperatures is found, showing that the composed atmospheric and surface data are representative for the real world. Comparing simulations and measurements on a single pixel basis, we find indications that atmospheric gas profiles and sea surface emissivities are reproduced with a high agreement. For cloudy samples, the measured and simulated brightness temperature distributions agree, but considering individual pixels significant deviations are found (with 95th percentiles of the absolute deviations >30 K), most likely introduced by a random element in the implementation of the maximum-random overlap configuration in a one-dimensional atmosphere and possible different locations of clouds in the model and reality. Land surfaces lead to large deviations, for single pixels (with median absolute deviations >3 K) as well as for the brightness temperature distributions, likely due to inaccurate skin temperatures. Finally, we describe the architecture of the different artificial neural networks and their technical setup, input features, training and application.
Our work can be extended in different directions: the validation of the ash-free simulations shows that the representation of land surfaces is not fully realistic and could be improved, e.g., by using other model results or satellite retrieved skin temperatures. Volcanic ash plumes close to the vent often carry SO 2 [39,42] or ice water [148][149][150], both of which are neglected for reasons of simplicity. Other aerosols are also noteworthy, especially mineral dust which has similar refractive indices as volcanic ash due to their silica contents [11]. Including both in the training data, it might be possible to train an algorithm to separate volcanic ash from dust. The artificial neural networks could be improved by using additional input data, e.g., model-based vertical temperature profiles [68] or extremal or average brightness temperatures of the surrounding of a pixel [78]. The latter would require the simulation of extended areas, i.e., images of volcanic ash clouds in different atmospheric and surface settings, which would also enable the use of convolutional neural networks for image recognition [151]. As other geostationary passive imagers have similar spectral channels as MSG/SEVIRI [69,83,85,86], the algorithm might be transferable to these instruments [152]. A thorough validation of VACOS is presented in a companion paper [80]. An operational application by the German weather service (DWD) is ongoing.
To conclude, the new volcanic ash retrieval allows detecting and monitoring volcanic ash clouds above Europe, Africa and the Atlantic with high spatial and temporal resolution, enabling volcanic ash-related scientific investigations as well as aviation securityrelated applications.  Data Availability Statement: Data presented in this study is available on request from the corresponding author.