
Language Model CNN-driven similarity matching and classification for HTML-embedded Product Data

Borst, Janos and Koerner, Erik and Opasjumruskit, Kobkaew and Niekler, Andreas (2020) Language Model CNN-driven similarity matching and classification for HTML-embedded Product Data. In: 1st Semantic Web Challenge on Mining the Web of HTML-Embedded Product Data, MWPD 2020. ISWC 2020, 2020-11-02 - 2020-11-06, Online. ISSN 1613-0073.


Abstract

The Semantic Web Challenge "Mining the Web of HTML-Embedded Product Data" aims to benchmark current technologies on two data integration tasks: (1) product matching and (2) product classification. The challenge is motivated by the widespread use of semantic annotations in the e-commerce domain in recent years, annotations that are often inconsistent, incomplete, or contradictory. We introduce a transformer-based approach for textual product matching and extend it with a CNN for product classification. We compare how different combinations of input features affect prediction performance and introduce a technique that augments the classification task with additional information. Using text-only approaches, we outperform the baseline results.

Item URL in elib: https://elib.dlr.de/136247/
Document Type: Conference or Workshop Item (Speech)
Title: Language Model CNN-driven similarity matching and classification for HTML-embedded Product Data
Authors:
Author | Institution or Email | ORCID iD | ORCID Put Code
Borst, Janos | Leipzig University, Faculty of Mathematics and Computer Science, Institute of Computer Science | https://orcid.org/0000-0002-9166-4069 | UNSPECIFIED
Koerner, Erik | Leipzig University, Faculty of Mathematics and Computer Science, Institute of Computer Science | https://orcid.org/0000-0002-5639-6177 | UNSPECIFIED
Opasjumruskit, Kobkaew | UNSPECIFIED | https://orcid.org/0000-0002-9206-6896 | UNSPECIFIED
Niekler, Andreas | Leipzig University, Faculty of Mathematics and Computer Science, Institute of Computer Science | https://orcid.org/0000-0002-3036-3318 | UNSPECIFIED
Date: November 2020
Journal or Publication Title: 1st Semantic Web Challenge on Mining the Web of HTML-Embedded Product Data, MWPD 2020
Refereed publication: No
Open Access: Yes
Gold Open Access: No
In SCOPUS: Yes
In ISI Web of Science: No
ISSN: 1613-0073
Status: Accepted
Keywords: product matching · product category classification · language models · natural language processing · text mining · deep learning
Event Title: ISWC 2020
Event Location: Online
Event Type: International Conference
Event Start Date: 2 November 2020
Event End Date: 6 November 2020
HGF - Research field: other
HGF - Program: other
HGF - Program Themes: other
DLR - Research area: no assignment
DLR - Program: no assignment
DLR - Research theme (Project): no assignment
Location: Jena
Institutes and Institutions: Institute of Data Science > Secure Digital Systems
Institute of Data Science > Smart Systems for Digitalization
Deposited By: Opasjumruskit, Kobkaew
Deposited On: 25 Sep 2020 09:14
Last Modified: 13 Nov 2024 15:15
