MACS-Mar: a real-time remote sensing system for maritime security applications

The modular aerial camera system (MACS) is a development platform for optical remote sensing concepts, algorithms and special environments. For real-time services for maritime security (EMSec joint project), a new multi-sensor configuration, MACS-Mar, was realized. It consists of four co-aligned sensor heads covering the visible RGB, near-infrared (NIR, 700–950 nm), hyperspectral (HS, 450–900 nm) and thermal infrared (TIR, 7.5–14 µm) spectral ranges, a mid-cost navigation system, a processing unit and two data links. On-board image projection, cropping of redundant data and compression enable the instant generation of direct-georeferenced high-resolution image mosaics as well as the automatic detection, vectorization and annotation of floating objects on the water surface. The results were transmitted over distances of up to 50 km in real-time via narrowband and broadband data links and were visualized in a maritime situation awareness system. For the automatic on-board detection of floating objects, a segmentation and classification workflow based on RGB, NIR and TIR information was developed and tested. In the experiment, the completeness of the object detection was 95% and the correctness 53%. The overestimation of the number of objects was mostly caused by the bright backwash of ships; the further refinement using water homogeneity in the TIR, as implemented in the workflow, could not be carried out due to problems with the TIR sensor, otherwise distinctly better results would have been expected. The absolute positional accuracy of the projected real-time imagery was 2 m without post-processing of images or navigation data; the relative measurement accuracy of distances is in the range of the image resolution, which is about 12 cm for RGB imagery in the EMSec experiment.


Introduction
Remote sensing methods have been used in maritime scenarios for many years with different scopes that can be attributed to maritime security and safety [1]. Passive optical sensors in multi-spectral or hyperspectral configurations are widely used for the monitoring of large-scale ecological issues like algal blooms, coral reef studies, or the analysis of sediment transport in estuaries [2,3]. The inclusion of thermal infrared allows for additional applications like monitoring thermal plumes of warm water discharges caused by power plants [4,5]. With the constant improvement of spatial resolution, ship detection is now also possible from satellite-based passive optical systems [6,7]. Radar and especially synthetic aperture radar (SAR) have been studied for sea state monitoring [8,9], oil spill [10] and ship detection [11,12], especially exploiting the benefits of a satellite platform regarding the vast area of interest. Also, satellite-based receivers for the 'Automatic Identification System' (AIS) are under study and in experimental use [13,14].
All those sensors and methods have also been tested or applied on airborne platforms [15]. Especially security-related applications benefit from the feasible higher spatial resolutions, combinations of sensors [16] and the merging with information from ground-based sensors or sensor networks [17]. Therefore, an extensive suite of instruments and methods is available for gathering information about the maritime environment.
Several of these remote sensing methods are applied today in a regular manner. The German Navy operates a pollution control aircraft mainly for oil spill detection [18], and several national search-and-rescue operations use helicopters equipped with multi-sensor reconnaissance payloads [19]. Also, in Germany, the main agencies with maritime security tasks have created a joint 'Maritime Safety and Security Center of the Federal Government and the Coastal States' in which the information gathered by the contributing partners is shared [20].
Nonetheless, remote sensing is only scarcely and sporadically applied for maritime security challenges. Patrolling extended areas with a plane or assigning singular missions to sensor-equipped helicopters does not amount to constant, multiscale situation awareness. Relatively high effort is necessary to sustain the aforementioned solutions, especially given the comparatively low risk of incidents. This high effort is a limiting factor for the establishment of a persistent and comprehensive maritime monitoring system. Currently, systems are being developed to assist operators by introducing more intelligent sensors. Such sensors can be carried by remotely piloted aircraft flying at medium and high altitudes for long endurance (future MALE and HALE platforms), providing more cost-effective, multiscale surveillance capabilities.
Information in the maritime environment is shared predominantly by direct voice communication between participants. A unified view of the situation for every partner is all but impossible. With the number of marine incidents and casualties reported to the European Maritime Safety Agency rising from about 1,300 in 2011 to about 3,300 in 2016 [21], a combination of diverse methods to enable robust maritime situation awareness over an extended time-frame is deemed necessary, at least for regions of particular interest.
The joint project 'Echtzeitdienste für die Maritime Sicherheit (EMSec)' ('Real-Time Services for Maritime Security') proposed such an approach. A focus of the project was to assist prospective decision-making users by providing only important information. Each part of the project had to support this objective by delivering pre-processed and partly interpreted data.
The objective within the project that is discussed in this paper was the development of a special airborne camera system, including processing and data deployment, which had to meet several user-defined requirements. The main aspects were to deliver 1. a high-resolution true-color overview of a confined area (georeferenced image mosaic), 2. automatically detected and annotated objects on the water surface (vector information), 3. and, as a secondary goal, areas of automatically detected water pollution.
Every product had to be provided in real-time to an existing ground-based central situation awareness system and its human-machine interface.
It was required to work automatically or under remote control for the prospective use in a small, cost-efficient aircraft or unmanned aerial system. The area of operation was planned to be a maritime environment up to 12 nautical miles off the coastline. A further technical aim was the transmission of the georeferenced data products from the aircraft to a ground station over a distance of up to 50 km.

MACS-Mar instrument
The MACS camera system enables the fast and easy development of novel aerial camera concepts for special applications [24,25]. Multiple passive optical sensors can be combined to acquire the relevant information (Fig. 1). The sensors and their respective fields of view can be adjusted to specific use-cases. All sensors and their optics are calibrated geometrically and radiometrically. The mechanical design must be rigid to allow for a precise co-registration of the images taken by all sensors of the respective configuration. To efficiently evaluate such a configuration, an approach for combined photogrammetric processing of multiple sensor heads has been developed [26].
To match the specific requirements of the EMSec project and following the investigation of preliminary work [27], four optical sensor heads acquiring wavelengths from 400 nm to 14 µm were chosen (Table 1). The high-resolution RGB imagery is used for interpretation by a human operator. Additionally, NIR and thermal IR sensors are implemented for automatic object extraction. The hyperspectral imager is incorporated for the detection of water pollution. Small objects in water, such as persons, show an observable size of an estimated 1 m × 0.5 m. To detect such objects reliably, at least a few adjoined pixels (approximately 10–20, depending on the object and illumination conditions) have to be covered, which yields a required ground sampling distance (GSD) of less than 15 cm. The focal length of the RGB sensor module has been selected to achieve a GSD of 12.1 cm at an operational altitude of 820 m above sea level. This delivers a reasonable GSD/swath width relation in combination with the ability to detect small objects. Figure 1 shows the MACS-Mar remote sensing system including both narrowband and broadband data downlinks.
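The GSD relation behind this design choice follows directly from the camera geometry. The sketch below assumes a pixel pitch of 5.5 µm and a focal length of 37.3 mm; these two values are illustrative, as the paper does not state them, but they reproduce the quoted 12.1 cm GSD at 820 m and the 0.37 m GSD at 2500 m.

```python
def gsd(altitude_m, focal_length_m, pixel_pitch_m):
    """Ground sampling distance of a nadir-looking frame camera:
    the on-ground footprint of a single pixel."""
    return altitude_m * pixel_pitch_m / focal_length_m

# Assumed sensor parameters (illustrative, not stated in the paper):
PITCH = 5.5e-6   # 5.5 um pixel pitch
FOCAL = 0.0373   # 37.3 mm focal length

print(round(gsd(820, FOCAL, PITCH), 3))   # ~0.121 m, the RGB GSD quoted above
print(round(gsd(2500, FOCAL, PITCH), 2))  # ~0.37 m at surveillance altitude
```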
Image processing and recording is done by a desktop-class embedded computer. This computational power is necessary to allow the simultaneous recording of various sensor data, online georeferencing and map projection of those data, and the implementation of suitable real-time image classification algorithms. In this way, various higher-level geoinformation can be generated automatically during operation. The automatic detection of objects of interest is based on the co-registered image map and executed in real-time. As every pixel of the created maps has a reliable coordinate and time designation, the same applies to every detected object. By sending only detected objects to the ground station, the amount of data to be transmitted is reduced and the amount of information to be examined by an operator decreases.
The sensor system is controlled from a ground-based mission control center through a 9600 bit per second narrowband radio link. The operator is able to monitor system health, change the configuration and receive classification results. The current position of the aircraft and the footprints of the images taken are shown continuously on a scalable moving map. Over a more powerful air-to-ground link providing a data rate of 5–10 Mbps, seamlessly cropped images are transmitted in full geometric and radiometric resolution (Fig. 2).
The visual information can be directly interpreted by humans. In addition, object detection algorithms are applicable. Different sensors and lenses can be used, allowing task-specific footprints and ground resolutions. A ground sampling distance of up to 3 cm is achievable depending on the flight altitude and optical configuration. Within the map, distances and areas can be determined, e.g., the length of a vessel or the extent of oil contamination areas. Using calibrated camera modules and the unique assignment of position and attitude at any time, all aerial images can be projected onto an existing digital elevation model. While a higher-resolution model (e.g. Shuttle Radar Topography Mission, SRTM-1 or SRTM-3) is applied in coastal regions, only a geoid, for example the Earth Gravitational Model 1996 (EGM96), is used in offshore applications. This allows the fast generation of an image map under consideration of the deviation between the WGS84 reference ellipsoid and mean sea level.
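For the offshore case, where the projection surface degenerates to a horizontal plane at mean sea level, the core of the projection can be sketched as a simple ray-plane intersection in a local ENU frame. This is a minimal stand-in for the full geoid/DEM projection described above; the 20° off-nadir ray is an illustrative example, not a system parameter.

```python
import numpy as np

def project_to_sea_level(cam_pos_enu, ray_enu, sea_level=0.0):
    """Intersect a camera viewing ray (local east-north-up frame) with a
    horizontal plane at mean sea level. The full system substitutes a
    geoid or DEM surface for the plane."""
    dz = ray_enu[2]
    if dz >= 0:
        raise ValueError("ray does not point downward")
    t = (sea_level - cam_pos_enu[2]) / dz  # ray parameter at the plane
    return cam_pos_enu + t * ray_enu

# Camera 820 m above sea level, ray tilted 20 deg off nadir toward east:
pos = np.array([0.0, 0.0, 820.0])
ray = np.array([np.sin(np.radians(20)), 0.0, -np.cos(np.radians(20))])
print(project_to_sea_level(pos, ray))  # ground point ~298 m east of nadir
```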

Automatic object detection
One goal of the experiment is the detection of small floating objects in water. For a generally applicable method, it is very important to develop universal algorithms which work in different environments, at different recording times and under changing weather and water conditions; this is a big challenge.
Besides the extraction algorithms, the input data has a big influence on the extraction results. For a functional algorithm, it is necessary to use the special characteristics of each sensor. One useful sensor for object detection in a maritime environment is a thermal IR imager. Because of the almost homogeneous water temperature and the absence of sun glint, a thermal image is suitable for detecting objects of a certain minimum size on water with very high accuracy. Due to the coarse ground sampling distance (GSD) of the thermal IR images (1.42 m at 2500 m altitude), small objects like sea marks or persons in water cannot be extracted reliably. Common RGB and NIR sensors can provide sufficient resolution. At an altitude of 2500 m for surveillance flights, the RGB sensor used has a GSD of 0.37 m and the NIR sensor one of 0.47 m (Table 1).
For the development of the algorithm, a flight altitude of 2500 m was assumed. Thus, the thermal IR images are used for the detection of objects with a size of more than 1.5 m × 2.5 m. Offshore, most objects are larger than 1.5 m × 2.5 m; therefore, the bulk of objects can be detected using thermal IR images. However, the existence of smaller objects cannot be excluded. For this reason, the RGB and NIR images are additionally necessary to improve the completeness of the object extraction. On the one hand, the main advantage of RGB and NIR images is the finer GSD in comparison to the thermal IR images. On the other hand, maritime RGB and NIR images are strongly affected by sun glint [22]. Sun glint is the specular reflection of sunlight from the water surface into the sensor [23]. This is an enormous source of irritation and leads to incorrect object detections.
For successful object detection, the effect of sun glint has to be reduced significantly.
To discover water pollution, NIR and hyperspectral sensors are helpful. An overview of airborne sensors for water quality assessment is given in a review [28]. The proposed method to detect water quality [29] was developed by the Optical Remote Sensing of Water department at the DLR.
Based on the specific characteristics of all but the hyperspectral sensors, an automatic object detection algorithm for maritime environments was developed. The algorithm is divided into five parts (Fig. 3). Test flights with the MACS aerial camera showed that sun glint has a negative influence on automatic object detection. Due to the reflection and refraction of sunlight on waves, many incorrect objects were detected. Because of this effect, a very fast preprocessing of the images became necessary. As the preprocessing runs in real-time on the camera system, the complex existing algorithms for sun glint reduction were not suitable. Therefore, a software-based morphological opening filter [30] was used to reduce the impact of sun glint (Fig. 4a). The opening filter was applied with a 3 × 3 kernel; all objects were then preserved and the sun glint was reduced partly, but not completely. A 5 × 5 kernel removed the sun glint, but very small objects as well. As Fig. 4b shows, the 3 × 3 opening filter used cannot remove all of the sun glint, so sun glint still had an influence on the object extraction results.
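The opening step can be sketched as a grey-value morphological opening (erosion followed by dilation): bright speckles narrower than the kernel are removed while larger bright objects survive. The synthetic image values below are illustrative, not calibrated sensor data.

```python
import numpy as np
from scipy import ndimage

def suppress_sun_glint(img, kernel=3):
    """Grey-value morphological opening. Bright structures smaller than
    the kernel (isolated glint speckles) are removed; larger bright
    objects are mostly preserved."""
    return ndimage.grey_opening(img, size=(kernel, kernel))

# Synthetic scene: dark water, a 1-pixel glint speckle, a 6x6 bright object
img = np.full((20, 20), 50, dtype=np.uint16)
img[3, 3] = 4000                 # isolated sun-glint pixel
img[10:16, 10:16] = 3000         # floating object
out = suppress_sun_glint(img, kernel=3)
print(out[3, 3], out[12, 12])    # glint removed, object interior preserved
```

As described above, a 3 × 3 kernel is the compromise: a 5 × 5 kernel would also erase legitimate objects only a few pixels wide.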
The thermal IR images exhibit slight noise; therefore, a median filter was used to reduce it.
The filtered aerial images were used for image segmentation (part II). For the high-resolution images, a quadtree segmentation (Fig. 5a) was implemented because of the almost homogeneous water surface and the short processing time. A chessboard segmentation was used for the thermal IR images, which are more homogeneous and have a lower resolution.
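A minimal recursive quadtree split on block homogeneity could look as follows; the standard deviation threshold and minimum tile size are illustrative stand-ins for the operational parameters.

```python
import numpy as np

def quadtree_segments(img, y=0, x=0, h=None, w=None, max_std=5.0, min_size=4):
    """Recursively split the image into quadrants until each block is
    homogeneous (std below max_std) or the minimum block size is
    reached. Returns a list of (y, x, h, w) segment tiles."""
    if h is None:
        h, w = img.shape
    block = img[y:y+h, x:x+w]
    if block.std() <= max_std or h <= min_size or w <= min_size:
        return [(y, x, h, w)]
    h2, w2 = h // 2, w // 2
    segs = []
    for dy, dx, bh, bw in ((0, 0, h2, w2), (0, w2, h2, w - w2),
                           (h2, 0, h - h2, w2), (h2, w2, h - h2, w - w2)):
        segs += quadtree_segments(img, y + dy, x + dx, bh, bw, max_std, min_size)
    return segs

water = np.full((16, 16), 100.0)
water[0:2, 0:2] = 255.0          # small bright object in one corner
segs = quadtree_segments(water)
print(len(segs))                 # the corner is subdivided, the rest stays coarse
```

Homogeneous water regions stay as large tiles, which is why this segmentation is fast over open sea.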
Based on the segments, a very simple and transferable local thresholding classification was executed to distinguish between water, sun glint and objects within the high-resolution images (part III). For every channel, the mean of the whole image was calculated (image mean) and a value of 8000, determined empirically, was added. This value depends on the light conditions and could be changed by the operator during the flight. The classification is based on comparing the image mean with the mean of each segment. For the object class, the blue and the red channels were used: if the mean of a segment in the blue or red channel was less than the image mean, the segment was classified as an object. Afterwards, the object segments were merged. For sun glint classification, it was assumed that a sun glint segment is brighter than a water segment and that sun glint affects only small areas. If the segment means of the red, green, blue and NIR channels exceeded the image mean and the segment was smaller than 2 m², the segment was classified as sun glint. All other segments were classified as water.
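The rule set above can be sketched as follows. The placement of the empirical +8000 offset (applied here to the sun glint test) is one plausible reading of the description, and the channel values are illustrative 16-bit numbers.

```python
OFFSET = 8000  # empirical brightness offset from the text (16-bit imagery assumed)

def classify_segment(seg_means, img_means, seg_area_m2):
    """Rule-based segment classification following the scheme above.
    seg_means / img_means: dicts of per-channel means ('r','g','b','nir')."""
    # darker than the image mean in blue or red -> floating object
    if seg_means['b'] < img_means['b'] or seg_means['r'] < img_means['r']:
        return 'object'
    # brighter than image mean + offset in all channels, and small -> sun glint
    if (all(seg_means[c] > img_means[c] + OFFSET for c in ('r', 'g', 'b', 'nir'))
            and seg_area_m2 < 2.0):
        return 'sun_glint'
    return 'water'

img_means = {'r': 20000, 'g': 21000, 'b': 22000, 'nir': 15000}
print(classify_segment({'r': 9000, 'g': 10000, 'b': 9500, 'nir': 8000}, img_means, 30.0))
print(classify_segment({'r': 31000, 'g': 32000, 'b': 33000, 'nir': 26000}, img_means, 0.5))
print(classify_segment({'r': 20500, 'g': 21200, 'b': 22100, 'nir': 15100}, img_means, 40.0))
```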
To distinguish between water and objects within the low-resolution thermal IR images in part III, a local standard deviation was calculated. For this, a 7 × 7 window of 49 pixels around each pixel (three rows and columns around the center) was considered to find pixels with high contrast. It was assumed that water has a homogeneous temperature and that objects on the water show a clear temperature difference. If the standard deviation was more than 0.5, the pixel was classified as an object pixel. All object pixels were merged into filled polygons, and objects smaller than 1.5 m × 2.5 m were removed.
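This contrast test can be computed efficiently with running-window filters. The sketch below uses the 0.5 threshold and 7 × 7 window stated above; the synthetic temperatures are illustrative.

```python
import numpy as np
from scipy import ndimage

def detect_tir_objects(tir, std_threshold=0.5, win=7):
    """Flag pixels whose local (win x win) temperature standard deviation
    exceeds the threshold: warm or cold objects stand out against the
    thermally homogeneous water surface."""
    mean = ndimage.uniform_filter(tir, size=win)
    mean_sq = ndimage.uniform_filter(tir * tir, size=win)
    std = np.sqrt(np.clip(mean_sq - mean * mean, 0, None))
    return std > std_threshold

water = np.full((30, 30), 15.0)          # homogeneous 15 degC sea surface
water[14:17, 14:17] = 25.0               # small warm object
mask = detect_tir_objects(water)
print(mask[15, 15], mask[2, 2])          # True over the object, False on open water
```

The flagged mask would then be polygonized and filtered by the minimum object size.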
In part IV, the borders of the detected objects were improved by applying a region- and pixel-based growing algorithm. This step was necessary for the subsequent object identification.
Object identification was implemented in the final step (part V) to distinguish between different ship types (red objects), sea marks (small green objects) and undefined objects (Fig. 5d). For this purpose, geometric (size and shape) and spectral properties as well as relations to neighbouring objects were used. For example, a ship is an elongated object which is longer than it is wide and surrounded by water. The type of ship was distinguished by size (Table 2).
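A toy rule set in the spirit of this geometric identification might look as follows. The elongation ratio and the size bounds are hypothetical stand-ins, since the actual values of Table 2 are not reproduced in the text.

```python
def identify_object(length_m, width_m, spectral_class):
    """Illustrative geometric identification rules. All numeric bounds
    are assumptions, not the values of Table 2."""
    elongated = width_m > 0 and length_m / width_m > 2.0  # longer than wide
    if elongated and spectral_class == 'object':
        if length_m > 50:
            return 'large ship'
        if length_m > 10:
            return 'small ship'
    if length_m < 5 and spectral_class == 'object':
        return 'sea mark'
    return 'undefined'

print(identify_object(66, 12, 'object'))  # e.g. a coast-guard-ship-sized object
```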

Object detection accuracy
An accuracy analysis was executed to evaluate the automatic object detection. Since the outlines of an object cannot be extracted exactly in many cases, evaluating the accuracy for every object is a challenge: the automatically extracted objects may be too small, too big or just a sub-part of another extracted object. Hence, for every object it is necessary to decide whether the extraction is correct or false. According to Egenhofer [31], eight theoretical relations between two objects are possible, divided into correct, false and unclear cases. In the unclear cases, it has to be decided between correctly and falsely extracted objects. This can be estimated with the overlapping factor OF [32], computed from the extents A° and B° of the two objects. In our case, an object counts as falsely extracted if the overlapping factor is equal to or smaller than 0.3, and as correctly extracted if it is greater than 0.3; this value was determined empirically during previous campaigns. The assignment as false, correct or missed object is thus carried out using the overlapping factor. During the EMSec test campaign, described in the Experiment section, all recorded objects were identified and the automatic object detection algorithm was applied to the aerial imagery. After the identification of correct, false and missed objects, the overall accuracy can be determined. For this, the completeness (producer's accuracy) and the correctness (user's accuracy) according to Straub [33] are calculated as

com (%) = ceo / (ceo + neo) × 100,
corr (%) = ceo / (ceo + weo) × 100,

with ceo = correctly extracted objects, neo = not extracted objects, weo = wrongly extracted objects.
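The two metrics are straightforward to compute. As a worked example, the counts below correspond to the campaign figures reported in the Results section (73 correctly extracted, 4 missed, 65 falsely extracted objects).

```python
def completeness(ceo, neo):
    """Producer's accuracy: share of real objects that were extracted."""
    return 100.0 * ceo / (ceo + neo)

def correctness(ceo, weo):
    """User's accuracy: share of extracted objects that are real."""
    return 100.0 * ceo / (ceo + weo)

ceo, neo, weo = 73, 4, 65
print(round(completeness(ceo, neo)))  # ~95 %
print(round(correctness(ceo, weo)))   # ~53 %
```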

Direct georeferencing accuracy
Every image pixel is georeferenced to be able to determine the geographic position of any object or area of interest. The quality and reliability of this process is investigated by a direct georeferencing quality analysis of the RGB images. For this, repeated fly-overs were performed during the test campaign to acquire images of a navigation beacon with a known reference position (Fig. 6).

Experiment
From 5th to 9th September 2016, the EMSec verification experiment was conducted over the North Sea off Cuxhaven. Different sub-experiments were performed, with more than 9 h of image acquisition in mostly sunny weather conditions, including dusk operation at solar altitudes down to 2.5°. The coast guard ship BP25 Bayreuth assisted as a target (Fig. 7) and brought out drifting popcorn as well as a 0.6 m by 0.8 m dummy (Fig. 8).
Carried by the autopilot-controlled DLR research aircraft Dornier 228 (D-CODE) [37], MACS-Mar operated largely automatically. Data products were delivered continuously via radio link. (Fig. 6: geographic reference navigation beacon "F" [36]; a terrestrial side-view photo, b top view as acquired by MACS-Mar.) The two-axis tracking ground station antenna was placed on a rooftop approximately 25 m above ground. All geo-referencing, mosaicking and image interpretation tasks were designed to operate on board automatically, so the derived information could be put directly into a deployment system and human-machine interface.
In the operation area, ships and spilled popcorn were recognized automatically in the images and assigned a suitable description of current time, position, category and signature. The ultra-high frequency (UHF) data link carrying approximately 9600 bit per second was integrated for downlink applications. Additionally, this bidirectional link can be used for telemetry and to remotely command the camera system during operation. Via the broadband data link, a successive real-time map was built up in the situation center (Fig. 2).

Results
During the five-day experiment with daily flights, approximately 12 GB of image data were transmitted reliably in full geometric and radiometric resolution at distances of up to 50 km. The narrowband remote control worked stably at distances of more than 80 km.
Visual identification of ships was investigated during low-light flights. The ship's name was not identifiable in the image due to the near-vertical perspective. Figure 9 shows the coast guard ship BP25 as a snapshot from the ground station real-time map. The image was taken from a flight altitude of 820 m at a solar angle of 5.5°. Position, heading, shape and extent are determined within a single image, while dynamic parameters like course and speed are measured by including adjacent images or images of a later fly-over. In the real-time map, the ship's length was repeatedly determined as between 65.7 and 66.1 m, while the ship's actual length is 65.9 m. This corresponds to a maximum deviation of 20 cm, or about 1.5 pixels.
The thermal IR camera failed for technical reasons; thus, unfortunately, these data could not be evaluated. All other sensors (RGB, NIR, hyperspectral) and assistive sensors like the INS and temperature sensors worked properly. Image-based detections were indicated in real-time on the maritime management system and the corresponding images were displayed.
Despite more than 9 h of data recording, no real water pollution could be observed. The popcorn was originally brought out to evaluate drift forecasts; moreover, its spectral signature is untypical of water pollution. Due to its high visibility in RGB and NIR imagery, the popcorn was automatically extracted as an object (Fig. 10).

Object detection analysis
During the campaign, 77 objects were observed. The recorded objects are ships and the popcorn film (Fig. 11). Due to the missing thermal IR images, the object extraction was performed on the high-resolution RGB and NIR images only. The completeness of the automatic object extraction algorithm was 95%. The four missing ships were not extracted because of the object extraction size threshold: the minimum size for object detection was 25 m². Because of the missing thermal IR images, a smaller size threshold was not applicable, as too much sun glint resulted in false positives for object sizes smaller than 25 m².
During the automatic object extraction, 65 objects were falsely extracted. Accordingly, the correctness was 53%. The low correctness is explained by the missing thermal IR information: by incorporating just RGB and NIR aerial images, the backwash of ships was very often identified as a separate object (Fig. 12). Using a thermal IR image, the correctness increases significantly, as shown in preliminary [27] and subsequent [38] work.

Direct georeferencing analysis
The navigation beacon was covered in every flight: 27 blocks of 4–5 consecutive images showing the beacon were identified. Overall, the beacon position was determined in 124 images and the offset to the known position was calculated. The results are listed in Table 3. 85 percent of the images were taken at an aircraft bank angle of approximately 20°, giving oblique perspectives (Fig. 6).
One block consisting of five images was considered an outlier; in particular, its mean offset exceeded 14 m. While investigating these unexpectedly high values, it was recognized that an inertial navigation system alignment procedure had been started shortly before the beacon was reached. Thus, the INS was not properly aligned during this single fly-over. This INS state is quite unusual in operation and therefore the particular image sequence was separated from the other measurements, see Table 3. These high values show the degrading impact of an unhealthy INS condition on direct georeferencing quality; including them would have raised the otherwise normal quality values significantly.
Tidal range compensation was not included at the current state of the analysis. The max. 3 m tidal range in the area of interest yields a horizontal projection error of approximately 1 m when picking a pixel coordinate. This value is valid for images taken at a 20° bank angle with the beacon in the image center.
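The quoted ~1 m figure follows from simple trigonometry: a height error of the projection surface displaces the intersected ground point horizontally by the height error times the tangent of the off-nadir viewing angle.

```python
import math

def horizontal_projection_error(height_error_m, off_nadir_deg):
    """Horizontal displacement caused by projecting an image ray onto a
    surface whose height is wrong by height_error_m."""
    return height_error_m * math.tan(math.radians(off_nadir_deg))

# Max. 3 m tidal range viewed at a 20 deg off-nadir angle (image center):
print(round(horizontal_projection_error(3.0, 20.0), 2))  # ~1.09 m
```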

Discussion
In general, the objectives of the presented sub-project, namely the situation map and automatic object extraction, have been successfully realized and demonstrated. As shown in the results for automatic object extraction, a combination of passive optical sensors is essential to achieve high rates of completeness and correctness. Because of the homogeneous water temperature and the temperature differences between floating objects and the surrounding water, high-resolution thermal infrared imagery is key information for this application. True-color RGB and NIR imagery is necessary to categorize objects and tag semantic information. Furthermore, high-resolution RGB image data is highly beneficial for manual interpretation by human operators: small objects like castaways can be discovered, and bigger objects like ships can be characterized by measuring their size and by visual interpretation. Identification of ships by name requires high-resolution oblique views.
While the potential of the hyperspectral sensor to detect water pollution was theoretically examined, the benefit of hyperspectral information could not be empirically evaluated during the campaign because of the lack of pollution in the examined region.
Accurate and reliable positioning of the data products is important for object identification and for the combination with other information sources. To achieve high accuracies, the overall sensor set-up and processing chain has to be controlled. Calibrated sensors and healthy INS operation are preconditions. The demonstrated absolute position accuracy of approximately 2 m in real-time is sufficient for the described tasks. The accurate determination of object size is also helpful for classification.

Future work
A second campaign was conducted in August 2017, adding thermal imagery; the corresponding paper has been accepted and will be released in the near future [38]. Next steps should be the acquisition of a larger database to make the algorithm more robust against image errors and thus avoid the detection of seemingly very small objects. Additionally, this database can be used to feed deep learning approaches. The influence of ground pixel resolution on detection accuracy has to be examined, because real-time processing on a satellite or a high-altitude pseudo-satellite (HAPS), which are appropriate platforms for maritime surveillance applications, cannot be fed with high-resolution imagery as given here.