V8 N2 Paper 7
Annals of the MS in Computer Science and Information Systems at UNC Wilmington
Fall 2014

An Application of Knowledge Discovery in Textual Databases to Identify Sentiments in Product Reviews  

Vincent Tran

Committee

Curry Guinn (chair)
Douglas Kline
Devon Simmonds

Abstract

An application of knowledge discovery in textual databases to identify sentiments in product reviews. Tran, Vincent, 2014. Master’s Thesis, University of North Carolina Wilmington. With the massive amount of textual data available on the web, the ability to automate the extraction of patterns and meaning is important to business decision makers. The Amazon ecommerce site is particularly interesting because of its massive amount of readily available textual data in the form of reviews. These reviews are often loaded with sentiments that the reviewers have already tagged via the 5-star rating system. This rating system defines 5-star reviews as having the most positive sentiment and 1-star reviews as having the most negative sentiment. However, the reviewers’ ratings of a product lack the granularity that manufacturers and business owners need to answer the question: what do customers like and dislike about their products? This study explores the feasibility of two knowledge discovery tasks, topic identification and sentiment analysis, in the domain of product reviews. In particular, the study leverages supervised (Naïve Bayes and Support Vector Machine) and unsupervised (Pointwise Mutual Information and Frequent Itemsets) machine learning techniques to detect the topics discussed in the reviews and the overall sentiment towards those topics. The resulting data from the study’s experiments suggest that we can answer the question “what do customers like and dislike about the products?” with reasonable accuracy using these particular supervised and unsupervised approaches.
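As a rough illustration of one of the unsupervised techniques named above, the sketch below computes Pointwise Mutual Information, PMI(x, y) = log2(P(x, y) / (P(x) P(y))), for word pairs that co-occur within a review. It is not taken from the thesis: the function name `pmi_table`, the whitespace tokenization, the use of a whole review as the co-occurrence window, and the four toy reviews are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Return PMI for every word pair that co-occurs in at least one document.

    Probabilities are estimated as document frequencies: P(x) is the share of
    documents containing x, and P(x, y) the share containing both x and y.
    (Hypothetical helper for illustration only.)
    """
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        # Assumption: lowercase whitespace tokens, each counted once per document.
        words = sorted(set(doc.lower().split()))
        word_counts.update(words)
        pair_counts.update(combinations(words, 2))

    table = {}
    for (x, y), n_xy in pair_counts.items():
        p_xy = n_xy / n_docs
        p_x = word_counts[x] / n_docs
        p_y = word_counts[y] / n_docs
        table[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return table


# Toy reviews invented for this example; they are not data from the study.
reviews = [
    "battery great",
    "battery great",
    "screen poor",
    "screen great",
]
scores = pmi_table(reviews)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

In this toy data "battery" appears with "great" more often than chance would predict, so that pair gets a positive score, while "screen" and "great" score below zero. A real pipeline would need proper tokenization, a much larger corpus, and some smoothing for rare words.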

Recommended Citation: Tran, V., Guinn, C., Kline, D., Simmonds, D. (2014). An Application of Knowledge Discovery in Textual Databases to Identify Sentiments in Product Reviews. Annals of the Master of Science in Computer Science and Information Systems at UNC Wilmington, 8(2), paper 7. http://csbapp.uncw.edu/data/mscsis/full.aspx.