SAR-HDP: Non-parametric Topic Model for Aspect categorisation based on online reviews
- By Omar Mustafa AL-Janabi, Nurul Hashimah Ahamed Hassain Malim, Cheah Yu-N, Osamah Mohammed Alyasiri, Aseel Musa Jasim - 19 Mar 2025
- Current Studies on Probability and Statistics, Volume: 1, Pages: 1 - 29
Abstract/Preface
The importance of aspect categorisation in Aspect-based Sentiment Analysis (ABSA) has encouraged researchers to improve the performance of topic models for grouping aspects into categories. Most current methods implement parametric models that require the number of topics to be fixed in advance. However, this is difficult to do well with unannotated text data, which lacks class labels. Therefore, the current work presented a novel non-parametric model that draws the number of topics based on the semantic association between opinion targets (i.e., aspects) and their respective expressed sentiments. The model, named ‘SAR-HDP’, incorporated Semantic Association Rules (SAR) into the Hierarchical Dirichlet Process (HDP). This phrase-based (or aspect-based) Bayesian model did not assume that the words of a sentence were drawn from a single topic, because a single review can contain multiple aspects, each belonging to a multiple-aspect topic (i.e., category). Beyond using semantic information for aspect identification, the proposed model also preserved the semantic information between the drawn topics and the identified aspects to maintain topic consistency. Empirical investigation showed that the proposed approach outperformed standard parametric and non-parametric models in aspect categorisation on restaurant and hotel reviews sourced from Amazon and Trip Advisor.