
Language Model CNN-driven similarity matching and classification for HTML-embedded Product Data

Borst, Janos and Koerner, Erik and Opasjumruskit, Kobkaew and Niekler, Andreas (2020) Language Model CNN-driven similarity matching and classification for HTML-embedded Product Data. ISWC 2020, Online.


Abstract

The Semantic Web Challenge "Mining the Web of HTML-embedded Product Data" aims to benchmark current technologies on two data integration tasks: (1) product matching and (2) product classification. Recent years have seen significant use of semantic annotations in the e-commerce domain, but these annotations are often inconsistent, incomplete, or conflicting. We introduce a transformer-based approach for textual product matching and extend it with a CNN for product classification. We compare the influence of different input feature combinations on prediction performance and introduce a technique to augment the classification task with additional information. We are able to outperform baseline results using text-only approaches.

Item URL in elib:https://elib.dlr.de/136247/
Document Type:Conference or Workshop Item (Speech)
Title:Language Model CNN-driven similarity matching and classification for HTML-embedded Product Data
Authors:
Author | Institution or Email of Author | Author's ORCID iD
Borst, Janos | Leipzig University, Faculty of Mathematics and Computer Science, Institute of Computer Science | https://orcid.org/0000-0002-9166-4069
Koerner, Erik | Leipzig University, Faculty of Mathematics and Computer Science, Institute of Computer Science | https://orcid.org/0000-0002-5639-6177
Opasjumruskit, Kobkaew | Kobkaew.Opasjumruskit (at) dlr.de | https://orcid.org/0000-0002-9206-6896
Niekler, Andreas | Leipzig University, Faculty of Mathematics and Computer Science, Institute of Computer Science | https://orcid.org/0000-0002-3036-3318
Date:November 2020
Refereed publication:No
Open Access:Yes
Gold Open Access:No
In SCOPUS:No
In ISI Web of Science:No
Status:Accepted
Keywords:product matching · product category classification · language models · natural language processing · text mining · deep learning
Event Title:ISWC 2020
Event Location:Online
Event Type:International Conference
HGF - Research field:other
HGF - Program:other
HGF - Program Themes:other
DLR - Research area:no assignment
DLR - Program:no assignment
DLR - Research theme (Project):no assignment
Location: Jena
Institutes and Institutions:Institute of Data Science > Secure Digital Systems
Institute of Data Science > Smart Systems for Digitalization
Deposited By: Opasjumruskit, Kobkaew
Deposited On:25 Sep 2020 09:14
Last Modified:25 Sep 2020 09:14
