Exploring Transformer and Multilabel Classification for Remote Sensing Image Captioning

Kandala, Hitesh and Saha, Sudipan and Banerjee, Biplab and Zhu, Xiao Xiang (2022) Exploring Transformer and Multilabel Classification for Remote Sensing Image Captioning. IEEE Geoscience and Remote Sensing Letters, 19, p. 6514905. IEEE - Institute of Electrical and Electronics Engineers. doi: 10.1109/LGRS.2022.3198234. ISSN 1545-598X.


Official URL: https://ieeexplore.ieee.org/document/9855519

Abstract

High-resolution remote sensing images are now available thanks to progress in remote sensing technology. Compared with popular remote sensing tasks such as scene classification, image captioning provides comprehensible information about such images by summarizing the image content in human-readable text. Most existing remote sensing image captioning methods are based on deep learning encoder–decoder frameworks that use a convolutional neural network or a recurrent neural network as the backbone. Such frameworks show limited capability both to analyze sequential data and to cope with the lack of captioned remote sensing training images. The recently introduced Transformer architecture exploits self-attention to obtain superior performance on sequence-analysis tasks. Inspired by this, in this work we employ a Transformer as an encoder–decoder for remote sensing image captioning. Moreover, to deal with the limited training data, an auxiliary decoder is used that further helps the encoder during training. The auxiliary decoder is trained for multilabel scene classification because of its conceptual similarity to image captioning and its capability of highlighting semantic classes. To the best of our knowledge, this is the first work exploiting multilabel classification to improve remote sensing image captioning. Experimental results on the University of California (UC)-Merced caption dataset show the efficacy of the proposed method. The implementation details can be found at https://gitlab.lrz.de/ai4eo/captioningMultilabel .
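
The multitask idea in the abstract (a captioning decoder plus an auxiliary multilabel classification decoder sharing one encoder) can be illustrated with a minimal sketch of a joint training objective. This is not the authors' implementation: the abstract does not state how the two losses are combined, so the weighted sum, the `aux_weight` parameter, and all function names below are assumptions made for illustration only. The sketch uses only the Python standard library and operates on plain lists of logits.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy for one token: -log softmax(logits)[target_index]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def caption_loss(token_logits, target_tokens):
    """Mean per-token cross-entropy over one caption."""
    losses = [softmax_cross_entropy(l, t)
              for l, t in zip(token_logits, target_tokens)]
    return sum(losses) / len(losses)


def multilabel_loss(label_logits, label_targets):
    """Mean binary cross-entropy with an independent sigmoid per class."""
    total = 0.0
    for z, y in zip(label_logits, label_targets):
        # Stable BCE-with-logits: max(z, 0) - z*y + log(1 + exp(-|z|))
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(label_logits)


def joint_loss(token_logits, target_tokens, label_logits, label_targets,
               aux_weight=0.5):
    """Assumed combination: caption loss + aux_weight * multilabel loss."""
    return (caption_loss(token_logits, target_tokens)
            + aux_weight * multilabel_loss(label_logits, label_targets))


# Toy example: a 2-token caption over a 3-word vocabulary,
# and 4 scene labels of which the first and third are present.
token_logits = [[2.0, 0.1, -1.0], [0.0, 1.5, 0.3]]
target_tokens = [0, 1]
label_logits = [1.2, -0.7, 0.4, -2.0]
label_targets = [1, 0, 1, 0]
loss = joint_loss(token_logits, target_tokens, label_logits, label_targets)
```

In such a setup, gradients from both terms would flow into the shared encoder, which is one plausible way an auxiliary classification task can help when captioned training images are scarce. Setting `aux_weight=0` reduces the objective to plain captioning.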

Item URL in elib: https://elib.dlr.de/192680/
Document Type: Article
Title: Exploring Transformer and Multilabel Classification for Remote Sensing Image Captioning
Authors:
Authors | Institution or Email of Authors | Author's ORCID iD | ORCID Put Code
Kandala, Hitesh | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Saha, Sudipan | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Banerjee, Biplab | Indian Institute of Technology Bombay | UNSPECIFIED | UNSPECIFIED
Zhu, Xiao Xiang | UNSPECIFIED | UNSPECIFIED | UNSPECIFIED
Date: August 2022
Journal or Publication Title: IEEE Geoscience and Remote Sensing Letters
Refereed publication: Yes
Open Access: Yes
Gold Open Access: No
In SCOPUS: Yes
In ISI Web of Science: Yes
Volume: 19
DOI: 10.1109/LGRS.2022.3198234
Page Range: p. 6514905
Publisher: IEEE - Institute of Electrical and Electronics Engineers
ISSN: 1545-598X
Status: Published
Keywords: Auxiliary task, image captioning, multitask learning, remote sensing, Transformer
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program Themes: Earth Observation
DLR - Research area: Space (Raumfahrt)
DLR - Program: R EO - Earth Observation
DLR - Research theme (Project): R - Artificial Intelligence
Location: Oberpfaffenhofen
Institutes and Institutions: Remote Sensing Technology Institute > EO Data Science
Deposited By: Haschberger, Dr.-Ing. Peter
Deposited On: 20 Dec 2022 10:07
Last Modified: 19 Oct 2023 12:38
