Borst, Janos and Koerner, Erik and Opasjumruskit, Kobkaew and Niekler, Andreas (2020) Language Model CNN-driven similarity matching and classification for HTML-embedded Product Data. ISWC 2020, 2020-11-02 - 2020-11-06, Online.
PDF (231 kB)
Abstract
The Semantic Web Challenge "Mining the Web of HTML-embedded Product Data" aims to benchmark current technologies on two data integration tasks: (1) product matching and (2) product classification. Recent years have seen significant use of semantic annotations in the e-commerce domain, but these annotations are often inconsistent, incomplete in coverage, or conflicting. We introduce a transformer-based approach for textual product matching and extend it with a CNN for product classification. We compare the influence of different input feature combinations on prediction performance and introduce a technique to augment the classification task with additional information. We are able to outperform baseline results using text-only approaches.
| elib URL of the record: | https://elib.dlr.de/136247/ |
|---|---|
| Document type: | Conference contribution (talk) |
| Title: | Language Model CNN-driven similarity matching and classification for HTML-embedded Product Data |
| Authors: | Borst, Janos; Koerner, Erik; Opasjumruskit, Kobkaew; Niekler, Andreas |
| Date: | November 2020 |
| Refereed publication: | No |
| Open Access: | Yes |
| Gold Open Access: | No |
| In SCOPUS: | No |
| In ISI Web of Science: | No |
| Status: | accepted contribution |
| Keywords: | product matching · product category classification · language models · natural language processing · text mining · deep learning |
| Event title: | ISWC 2020 |
| Event location: | Online |
| Event type: | international conference |
| Event start: | 2 November 2020 |
| Event end: | 6 November 2020 |
| HGF - Research field: | not assigned |
| HGF - Program: | not assigned |
| HGF - Program theme: | not assigned |
| DLR - Focus area: | not assigned |
| DLR - Research area: | not assigned |
| DLR - Sub-area (project): | not assigned |
| Location: | Jena |
| Institutes and institutions: | Institut für Datenwissenschaften > Sichere Digitale Systeme; Institut für Datenwissenschaften > Softwaresysteme für die Digitalisierung |
| Deposited by: | Opasjumruskit, Kobkaew |
| Deposited on: | 25 Sep 2020 09:14 |
| Last modified: | 10 Jul 2024 10:26 |