
Refining action segmentation with hierarchical video representations

Ahn, Hyemin and Lee, Dongheui (2021) Refining action segmentation with hierarchical video representations. In: 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021, pp. 16302-16310. IEEE. International Conference on Computer Vision, 2021-10-11 - 2021-10-17, Virtual. doi: 10.1109/ICCV48922.2021.01599. ISBN 978-166542812-5. ISSN 1550-5499.


Official URL: https://openaccess.thecvf.com/content/ICCV2021/html/Ahn_Refining_Action_Segmentation_With_Hierarchical_Video_Representations_ICCV_2021_paper.html

Abstract

In this paper, we propose the Hierarchical Action Segmentation Refiner (HASR), which refines temporal action segmentation results from various models by understanding the overall context of a given video in a hierarchical way. When a backbone action segmentation model estimates how the given video should be segmented, our model extracts segment-level representations from frame-level features, and then extracts a video-level representation from the segment-level representations. Based on these hierarchical representations, our model can refer to the overall context of the entire video and predict how out-of-context segment labels should be corrected. HASR can be plugged into various action segmentation models (MS-TCN, SSTDA, ASRF) and improves the performance of these state-of-the-art models on three challenging datasets (GTEA, 50Salads, and Breakfast). For example, on the 50Salads dataset, the segmental edit score improves from 67.9% to 77.4% (MS-TCN), from 75.8% to 77.3% (SSTDA), and from 79.3% to 81.0% (ASRF). In addition, our model can refine the segmentation results of an unseen backbone model that was not used when training HASR. This generalization ability makes HASR an effective tool for boosting existing approaches to temporal action segmentation. Our code is available at https://github.com/cotton-ahn/HASR_iccv2021.
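
The frame-to-segment-to-video hierarchy and the segmental edit score mentioned in the abstract can be illustrated with a minimal sketch. This is not the HASR implementation: mean pooling stands in for the learned encoders as an explicit assumption, and the function names (`to_segments`, `mean_pool`, `hierarchical_representations`, `edit_score`) are hypothetical. The edit score follows the common definition used in action segmentation, a Levenshtein distance over segment label sequences normalized by the longer sequence.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def to_segments(frame_labels: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Collapse frame-wise labels into (label, start, end) runs; end is exclusive."""
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((label, start, start + length))
        start += length
    return segments


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average a non-empty list of equal-length vectors, dimension by dimension."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def hierarchical_representations(frame_features, frame_labels):
    """Frame features -> one vector per segment -> one vector for the video.

    Assumption: mean pooling is a placeholder for learned encoders.
    """
    segments = to_segments(frame_labels)
    segment_vectors = [mean_pool(frame_features[s:e]) for _, s, e in segments]
    video_vector = mean_pool(segment_vectors)
    return segments, segment_vectors, video_vector


def edit_score(predicted_frames, reference_frames) -> float:
    """Segmental edit score in percent: 100 * (1 - Levenshtein / max length),
    computed over segment label sequences rather than individual frames."""
    p = [label for label, _, _ in to_segments(predicted_frames)]
    r = [label for label, _, _ in to_segments(reference_frames)]
    if not p and not r:
        return 100.0
    previous = list(range(len(r) + 1))  # distances from an empty prediction
    for i, p_label in enumerate(p, start=1):
        current = [i]
        for j, r_label in enumerate(r, start=1):
            cost = 0 if p_label == r_label else 1
            current.append(min(previous[j] + 1,          # drop a predicted segment
                               current[j - 1] + 1,       # add a missing segment
                               previous[j - 1] + cost))  # match or relabel
        previous = current
    return 100.0 * (1.0 - previous[-1] / max(len(p), len(r)))


if __name__ == "__main__":
    reference = ["cut"] * 3 + ["mix"] * 3
    backbone = ["cut", "cut", "mix", "cut", "mix", "mix"]  # out-of-context "mix" at frame 2
    print(edit_score(backbone, reference))   # over-segmented prediction
    print(edit_score(reference, reference))  # idealized refined prediction
```

In this toy example the backbone prediction splits into four segments where the reference has two, so the edit score penalizes the out-of-context segment even though only one frame is mislabeled. That is the kind of error a refiner operating on segment-level and video-level context is meant to correct.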

Item URL in elib: https://elib.dlr.de/147186/
Document Type: Conference or Workshop Item (Poster)
Additional Information: This work has been partially supported by the Helmholtz Association.
Title: Refining action segmentation with hierarchical video representations
Authors:
Author | Institution or Email | ORCID iD | ORCID Put Code
Ahn, Hyemin | UNSPECIFIED | https://orcid.org/0000-0001-8081-6023 | UNSPECIFIED
Lee, Dongheui | UNSPECIFIED | https://orcid.org/0000-0003-1897-7664 | UNSPECIFIED
Date: October 2021
Journal or Publication Title: 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
Refereed publication: Yes
Open Access: Yes
Gold Open Access: No
In SCOPUS: Yes
In ISI Web of Science: Yes
DOI: 10.1109/ICCV48922.2021.01599
Page Range: pp. 16302-16310
Publisher: IEEE
ISSN: 1550-5499
ISBN: 978-166542812-5
Status: Published
Keywords: Video Action Segmentation; Computer Vision; Deep Learning
Event Title: International Conference on Computer Vision
Event Location: Virtual
Event Type: International Conference
Event Start Date: 11 October 2021
Event End Date: 17 October 2021
Organizer: IEEE Computer Society
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program Themes: Robotics
DLR - Research area: Space
DLR - Program: R RO - Robotics
DLR - Research theme (Project): R - Autonomous learning robots [RO], R - Intuitive human-robot interface [RO]
Location: Oberpfaffenhofen
Institutes and Institutions: Institute of Robotics and Mechatronics (since 2013)
Deposited By: Ahn, Hyemin
Deposited On: 10 Dec 2021 09:22
Last Modified: 24 Apr 2024 20:45
