H-sim: a hybrid similarity function for product matching

Authors

  • Higor Moreira, UFRGS
  • Edimar Manica, Instituto Federal de Educação, Ciência e Tecnologia do Rio Grande do Sul - Campus Ibirubá

DOI:

https://doi.org/10.5335/rbca.v16i1.14955

Keywords:

electronic invoices, product matching, semantic similarity, similarity functions.

Abstract

When a company purchases products from its suppliers, it must import the electronic invoices for those products into its relational database to manage inventory, taxes, and resale. This task is not trivial, because the description of the same product can differ between the invoice and the database. This paper proposes H-sim, a similarity function that combines semantic similarity functions with similarity functions based on tokens or edit distance to identify matching products across different databases. In experiments with real product data, H-sim achieved an F1 score of 87.7%.
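
The general idea of a hybrid similarity function can be illustrated with a minimal Python sketch. It is not the H-sim formula from the paper: the abstract does not state how the component functions are combined, so the weighted average, the `weight` parameter, the choice of Jaccard and Levenshtein as the token-based and edit-distance components, and all function names are assumptions made for illustration. The semantic component is left as a caller-supplied function (for example, cosine similarity over sentence embeddings).

```python
from typing import Callable


def jaccard_similarity(a: str, b: str) -> float:
    """Token-based similarity: shared tokens divided by all distinct tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty descriptions are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1], where 1.0 means identical strings."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def hybrid_similarity(a: str, b: str,
                      semantic: Callable[[str, str], float],
                      weight: float = 0.5) -> float:
    """Hypothetical combination: weighted average of a semantic score and the
    better of the two lexical scores. The weighting scheme is an assumption,
    not the method described in the paper."""
    lexical = max(jaccard_similarity(a, b), edit_similarity(a, b))
    return weight * semantic(a, b) + (1.0 - weight) * lexical
```

A call such as `hybrid_similarity(invoice_description, database_description, semantic=my_embedding_similarity)` returns a score in [0, 1], provided the supplied semantic function also returns values in that range. A pair of descriptions would then be declared a match when the score exceeds a threshold tuned on labeled data.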

Published

2024-05-01

Issue

Vol. 16 No. 1 (2024)

Section

Original Paper

How to Cite

[1]
Moreira, H. and Manica, E. 2024. H-sim: a hybrid similarity function for product matching. Brazilian Journal of Applied Computing. 16, 1 (May 2024), 50–63. DOI: https://doi.org/10.5335/rbca.v16i1.14955.