Title
Review Helpfulness to Support Business: Identifying Fake Reviews from User-Generated Content Using Random Forest
Authors
Abstract
Purpose: Valid and helpful reviews on an e-commerce platform provide important information about customers’ perception of a product, which is crucial to the existence and growth of any business. Fake reviews, created fraudulently to tarnish a product’s image through spam, remain a significant challenge for all e-commerce platforms. A second challenge is identifying the helpful review content that can significantly alter a customer’s opinion of a product. The increasing prevalence of fake and unhelpful reviews compromises the credibility of online reviews, resulting in information overload and misleading consumers’ decision-making. Motivated by this challenge, this study aims to develop an automated system that retains only applicable and valid reviews in order to support the identification of customer needs.
Design/methodology/approach: This study involves three main tasks: helpfulness classification, fake review detection, and topic identification on various categories of the Amazon Dataset. The model leveraged a feature set that included the detailed sentiment polarity of the review, word count as a measure of feedback length, word diversity, part-of-speech analysis reflecting the review’s grammatical structure and complexity, and authenticity metrics. Moreover, for helpful review classification, the features included review and product metadata, a review content informativeness score encoded with Sentence Bidirectional Encoder Representations from Transformers (SBERT), and reviewer attributes. A topic extraction model that leverages Gemini was implemented to perform sentiment-based topic analysis of the reviews.
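As a rough illustration of two of the lexical features named above (word count and word diversity), the following minimal sketch computes both for a single review. The function name lexical_features, the regular-expression tokenizer, and the use of the type-token ratio as the diversity measure are assumptions made for illustration; the abstract does not specify how the study computes these features.

```python
import re


def lexical_features(text):
    """Return word count and type-token ratio for one review.

    Assumption: word diversity is approximated by the type-token ratio
    (unique words / total words); the study may define it differently.
    """
    # Lowercase and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    word_count = len(tokens)
    # Guard against empty reviews to avoid division by zero.
    ratio = len(set(tokens)) / word_count if word_count else 0.0
    return {"word_count": word_count, "type_token_ratio": ratio}


print(lexical_features("Great toy, great price"))
```

In this example the review has four tokens, three of them distinct, so the ratio is 0.75. Features of this kind would be combined with the sentiment, part-of-speech, and authenticity features before being passed to a classifier.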
Findings: For helpful review classification across six Amazon categories, the Random Forest classifier (RFC) achieved 94% accuracy, precision, and F1-Score, a recall of 93%, and an AUC Score of 98%. The Gradient Boosting classifier yielded comparable performance, with an AUC Score of 98% and 94% accuracy, precision, recall, and F1-Score. For fake review detection in the Toys and Games category, the RFC achieved 85% accuracy, 86% precision, 97% recall, a 91% F1-Score, and a 79% AUC Score. The findings indicate that combining textual, semantic, reviewer, and product-level features can improve the reliability of review quality assessment. Finally, to enhance decision-making for businesses, a topic extraction model built on Gemini was used to extract significant topics from valid and helpful reviews, grouped separately for negative and positive reviews, thereby providing nuanced insights into customer feedback.
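The F1-Score is the harmonic mean of precision and recall, so the fake review detection figures reported above can be checked against one another. The short sketch below does this arithmetic; the function name f1_from is hypothetical, and the inputs are the rounded percentages from the abstract rather than the study's unrounded values.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported fake review detection figures: 86% precision, 97% recall.
print(round(f1_from(0.86, 0.97), 2))  # prints 0.91
```

The result agrees with the reported 91% F1-Score for the Toys and Games category.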
Originality/value: Unlike prior studies that examine either review helpfulness or fake review detection in isolation, this study moves beyond single-task and small-sample approaches. The proposed framework offers a comprehensive analysis of patterns in reviews across e-commerce platforms, thereby enhancing brands’ ability to integrate customer needs and expectations into future marketing communications and advertising campaigns. This study contributes to Decision Sciences by proposing a data-driven two-stage framework that retains only helpful and valid reviews to enhance content quality, thereby supporting better decision-making in content moderation, reducing information overload, and improving consumer trust in reviews.
Keywords
Text Similarity, BERT Embeddings, Fake Reviews, Machine Learning, Review Helpfulness, Random Forest Classifier
Classification-JEL
L81, O33, L86, C38, C45
Special Issue On:
Contemporary Management Research
Pages
114-156
How to Cite
Qazmi, S. I. A., Chakkaravarthy, M., Raza, S. H., Aslam, F., Aslam, S., & Iftikhar, M. (2026). Review Helpfulness to Support Business: Identifying Fake Reviews from User-Generated Content Using Random Forest. Advances in Decision Sciences, 30(2), 114-156.
