Improving the Accuracy of Social Media Sentiment Classification with the Combination of TF-IDF Method and Random Forest Algorithm

Authors

  • Siti Mutmainah, Universitas Muhammadiyah Bima
  • Fathir, Universitas Muhammadiyah Bima
  • Erin Eka Citra, Universitas Lampung

DOI:

https://doi.org/10.63866/journix.v1i1.2

Keywords:

TF-IDF, Random Forest, Text Classification, Sentiment Analysis, Social Media

Abstract

Sentiment classification of social media text is one of the main challenges in public opinion analysis. The large volume of data and the variety of informal language make the task difficult, especially for Indonesian. This research aims to improve the accuracy of social media sentiment classification by combining the Term Frequency-Inverse Document Frequency (TF-IDF) method as a text representation technique with the Random Forest algorithm as the classification model. The dataset consists of 20,000 Indonesian-language opinion posts collected from Twitter and Instagram and labeled into three sentiment categories: positive, negative, and neutral. The data went through a preprocessing stage comprising text cleaning, tokenization, stopword removal, stemming, and normalization. Experimental results show that the combination of TF-IDF and Random Forest achieves an accuracy of 91.2%, with average precision, recall, and F1-score values above 0.90. Confusion matrix analysis revealed that the model was highly effective at classifying positive and negative sentiment but had more difficulty identifying neutral sentiment. These findings indicate that the approach is reasonably reliable and can serve as a foundation for industrial-scale sentiment analysis systems as well as for further research.
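
The preprocessing stage listed in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the slang map, stopword set, and affix lists below are hypothetical toy resources, and `naive_stem` is a crude affix stripper standing in for a dedicated Indonesian stemmer such as Sastrawi. The sketch also assumes that slang normalization runs right after tokenization, so that normalized words can then be filtered and stemmed; the abstract lists the steps but does not fix their order.

```python
import re

# Hypothetical mini-resources for illustration only; a real system would use a
# full Indonesian stopword list, a slang lexicon, and a proper stemmer.
SLANG_MAP = {"gak": "tidak", "ga": "tidak", "bgt": "banget", "yg": "yang"}
STOPWORDS = {"yang", "dan", "di", "ini", "itu"}  # negations such as "tidak" are kept
PREFIXES = ("meng", "mem", "men", "me", "ber", "di", "ter")
SUFFIXES = ("nya", "kan", "an", "i")


def clean(text: str) -> str:
    """Lowercase, drop URLs and @mentions, keep only letters and spaces."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)  # digits, punctuation, '#', emoji
    return re.sub(r"\s+", " ", text).strip()


def naive_stem(word: str, min_root: int = 4) -> str:
    """Strip at most one suffix and one prefix (crude stand-in for a real stemmer)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_root:
            word = word[: -len(suffix)]
            break
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_root:
            word = word[len(prefix):]
            break
    return word


def preprocess(text: str) -> list[str]:
    tokens = clean(text).split()                        # tokenization
    tokens = [SLANG_MAP.get(t, t) for t in tokens]      # slang normalization
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [naive_stem(t) for t in tokens]              # stemming


print(preprocess("Paketnya yang dikirim terlambat, pelayanannya gak bagus!!! @tokoku https://t.co/abc"))
# ['paket', 'kirim', 'lambat', 'pelayanan', 'tidak', 'bagus']
```

Keeping negations such as "tidak" out of the stopword list matters for sentiment, since dropping them would turn "tidak bagus" into "bagus".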
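
The abstract does not state which TF-IDF variant was used. As an assumption, the sketch below implements the smoothed variant that scikit-learn uses by default: the weight of term t in document d is tf(t, d) · (ln((1 + n) / (1 + df(t))) + 1), where n is the number of documents and df(t) is the number of documents containing t, after which each document vector is L2-normalized. It shows how terms that appear in fewer documents receive larger weights.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Smoothed TF-IDF with L2 row normalization (scikit-learn's default variant)."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts
        weights = {t: c * idf[t] for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


docs = [["produk", "bagus"], ["produk", "jelek"], ["pengiriman", "cepat", "bagus"]]
vecs = tfidf(docs)
print({t: round(w, 3) for t, w in vecs[1].items()})
# {'produk': 0.605, 'jelek': 0.796}
```

Here "jelek" occurs in only one of the three hypothetical documents, so it outweighs "produk", which occurs in two.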
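
The combination of TF-IDF features with a Random Forest classifier, evaluated with accuracy, per-class precision, recall, and F1-score, and a confusion matrix, can be sketched with scikit-learn. This is an illustration under stated assumptions, not the authors' setup: the twelve-post corpus is hypothetical and already preprocessed, and the 50/50 stratified split, `n_estimators=200`, and `random_state=42` are arbitrary choices. Wrapping both steps in a pipeline fits the TF-IDF vocabulary and IDF weights on the training split only, which avoids leaking test-set statistics into the features.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

LABELS = ["positive", "negative", "neutral"]

# Hypothetical toy corpus of already-preprocessed posts (not the paper's dataset).
texts = [
    "produk bagus puas", "pelayanan ramah cepat", "kualitas mantap suka", "harga murah bagus",
    "barang rusak kecewa", "kirim lambat buruk", "pelayanan jelek kecewa", "produk palsu rugi",
    "paket sampai hari", "toko buka jam", "info harga produk", "barang kirim besok",
]
labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=42
)

# TF-IDF maps each post to a sparse weighted term vector; the forest classifies it.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=200, random_state=42),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=LABELS)  # rows = true, columns = predicted
print(f"accuracy: {acc:.3f}")
print(classification_report(y_test, y_pred, labels=LABELS, zero_division=0))
print(cm)
```

On a toy corpus this small the printed scores carry no meaning; the point is the shape of the evaluation. The off-diagonal cells in the "neutral" row and column of the confusion matrix are where the abstract reports the model's main difficulty.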

Published

2025-04-30

Section

Articles