Benchmarking Machine Learning Algorithms in Fake Reviews Detection in Brazilian Portuguese

Authors

  • Eduardo C. R. Borges Federal Institute of Santa Catarina
  • Cristiano Mesquita Garcia Federal Institute of Santa Catarina
  • Português Federal University of Southern Border

DOI:

https://doi.org/10.5335/rbca.v17i1.16183

Keywords:

Fake reviews, Machine learning, Classification, Natural Language Processing

Abstract

The proliferation of fake reviews is a growing concern on e-commerce platforms, as these reviews can mislead consumers and harm the reputation of the products and services offered. Automatically detecting fake reviews is challenging because it requires analyzing textual data and identifying subtle patterns that indicate whether a review is genuine. Since fake review datasets in Portuguese are scarce, in this work we generate and propose a dataset in Brazilian Portuguese for fake review detection. We then combine four machine learning algorithms with three text vectorization methods in a transfer learning scheme for fake review classification. A comparative analysis is carried out using performance metrics such as accuracy, F1-score, and false positives. The results show that, for the proposed dataset, the combination of Logistic Regression and BERTimbau, a BERT model pre-trained on Brazilian Portuguese, achieved the best results, reaching 96.61% accuracy.
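The metrics named in the abstract (accuracy, F1-score, and false positives) can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' evaluation code. It assumes that label 1 means "fake" and 0 means "genuine", and that false positives are reported as a raw count; the function name `binary_metrics` and the toy labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, F1-score and the false-positive count.

    Assumption (not from the paper): 1 = fake review, 0 = genuine review.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fake, flagged as fake
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # genuine, flagged as fake
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fake, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # genuine, kept as genuine

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    return {"accuracy": accuracy, "f1": f1, "false_positives": fp}


# Toy labels for illustration only; these are not results from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting false positives alongside accuracy and F1 is useful in this setting because a false positive corresponds to a genuine review being flagged as fake, which is an error that the aggregate scores alone can hide.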

Published

2025-05-23

Section

Original Paper

How to Cite

[1]
2025. Benchmarking Machine Learning Algorithms in Fake Reviews Detection in Brazilian Portuguese. Brazilian Journal of Applied Computing. 17, 1 (May 2025), 12–22. DOI:https://doi.org/10.5335/rbca.v17i1.16183.