Borst, Janos and Koerner, Erik and Opasjumruskit, Kobkaew and Niekler, Andreas (2020) Language Model CNN-driven similarity matching and classification for HTML-embedded Product Data. In: 1st Semantic Web Challenge on Mining the Web of HTML-Embedded Product Data, MWPD 2020. ISWC 2020, 2020-11-02 - 2020-11-06, Online. ISSN 1613-0073.
Abstract
The Semantic Web Challenge on Mining the Web of HTML-embedded Product Data aims to benchmark current technologies on two data integration tasks: (1) product matching and (2) product classification. Recent years have seen significant use of semantic annotations in the e-commerce domain, but these annotations are often inconsistent, incomplete, or conflicting. We introduce a transformer-based approach for textual product matching and extend it with a CNN for product classification. We compare the influence of different input feature combinations on prediction performance and introduce a technique to augment the classification task with additional information. We are able to outperform baseline results using text-only approaches.
Item URL in elib: | https://elib.dlr.de/136247/ |
---|---|
Document Type: | Conference or Workshop Item (Speech) |
Title: | Language Model CNN-driven similarity matching and classification for HTML-embedded Product Data |
Authors: | Borst, Janos; Koerner, Erik; Opasjumruskit, Kobkaew; Niekler, Andreas |
Date: | November 2020 |
Journal or Publication Title: | 1st Semantic Web Challenge on Mining the Web of HTML-Embedded Product Data, MWPD 2020 |
Refereed publication: | No |
Open Access: | Yes |
Gold Open Access: | No |
In SCOPUS: | Yes |
In ISI Web of Science: | No |
ISSN: | 1613-0073 |
Status: | Accepted |
Keywords: | product matching · product category classification · language models · natural language processing · text mining · deep learning |
Event Title: | ISWC 2020 |
Event Location: | Online |
Event Type: | International Conference |
Event Start Date: | 2 November 2020 |
Event End Date: | 6 November 2020 |
HGF - Research field: | other |
HGF - Program: | other |
HGF - Program Themes: | other |
DLR - Research area: | no assignment |
DLR - Program: | no assignment |
DLR - Research theme (Project): | no assignment |
Location: | Jena |
Institutes and Institutions: | Institute of Data Science > Secure Digital Systems; Institute of Data Science > Smart Systems for Digitalization |
Deposited By: | Opasjumruskit, Kobkaew |
Deposited On: | 25 Sep 2020 09:14 |
Last Modified: | 13 Nov 2024 15:15 |