How can voting mechanisms improve the robustness and generalizability of toponym disambiguation?

Hu, Xuke and Sun, Yeran and Kersten, Jens and Zhou, Zhiyong and Klan, Friederike and Fan, Hongchao (2023) How can voting mechanisms improve the robustness and generalizability of toponym disambiguation? International Journal of Applied Earth Observation and Geoinformation. Elsevier. doi: 10.1016/j.jag.2023.103191. ISSN 1569-8432. (submitted contribution)

This is the latest version of this record.

PDF - Publisher's version (published version)
2MB

Abstract

A vast amount of geospatial information exists in natural language texts, such as tweets and news. Extracting geospatial information from texts is called geoparsing, which includes two subtasks: toponym recognition and toponym disambiguation, i.e., identifying the geospatial representations of toponyms. This paper focuses on toponym disambiguation, which is addressed through toponym resolution and entity linking. Recently, many novel approaches have been proposed, especially deep learning-based ones such as CamCoder, GENRE, and BLINK. In this paper, a spatial clustering-based voting approach that combines several individual approaches is proposed to improve state-of-the-art (SOTA) performance regarding robustness and generalizability. Experiments are conducted to compare a voting ensemble with 20 recent and commonly used approaches on 12 public datasets, including several highly challenging datasets (e.g., WikToR). The datasets are of six types: tweets, historical documents, news, web pages, scientific articles, and Wikipedia articles, and together contain 98,300 places across the world. Experimental results show that the voting ensemble performs best on all the datasets, achieving an average Accuracy@161km of 0.86, which demonstrates its generalizability and robustness. In addition, it drastically improves the performance of resolving fine-grained places, i.e., POIs, natural features, and traffic ways.
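To make the two central ideas of the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each individual approach outputs one (latitude, longitude) pair per toponym, replaces the paper's spatial clustering with a simple neighbour count within a hypothetical radius (`radius_km`, a value chosen here for illustration), and computes Accuracy@161km as the share of predictions lying within 161 km of the gold coordinates. All function names and sample coordinates are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def vote(predictions, radius_km=10.0):
    """Pick the prediction supported by the most predictions nearby.

    Assumption: a neighbour count within radius_km stands in for the
    spatial clustering step; ties go to the earliest prediction.
    """
    best, best_votes = None, -1
    for candidate in predictions:
        votes = sum(1 for other in predictions
                    if haversine_km(candidate, other) <= radius_km)
        if votes > best_votes:
            best, best_votes = candidate, votes
    return best


def accuracy_at_k(predicted, gold, k_km=161.0):
    """Share of predictions within k_km of the gold coordinates."""
    hits = sum(1 for p, g in zip(predicted, gold)
               if p is not None and haversine_km(p, g) <= k_km)
    return hits / len(gold)


# Hypothetical outputs of four approaches for the toponym "Paris":
# three agree on Paris, France; one picks Paris, Texas.
paris_fr = (48.8566, 2.3522)
paris_tx = (33.6609, -95.5555)
ensemble_choice = vote([paris_fr, (48.86, 2.35), paris_tx, (48.85, 2.34)])

# Score the ensemble choice and the outlier against the same gold location.
score = accuracy_at_k([ensemble_choice, paris_tx], [paris_fr, paris_fr])
```

In this toy case the three predictions near Paris, France outvote the single outlier, which is the intuition behind combining several approaches: an individual error is only adopted if other approaches place the toponym in the same area.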

elib URL of this record: