SARE: A SENTIMENT ANALYSIS RESEARCH ENVIRONMENT
Mus’ab Habib Husaini
Computer Science and Engineering, MSc Thesis, 2013
Assoc. Prof. Dr. Yücel Saygın (Thesis Supervisor), Assoc. Prof. Dr. Berrin Yanıkoğlu (Thesis Co-Supervisor), Asst. Prof. Dr. Hakan Erdoğan, Asst. Prof. Dr. Hüsnü Yenigün, Asst. Prof. Dr. Cemal Yılmaz
Date & Time: July 18th, 2013 – 13:00
Place: FENS L062
Keywords: Sentiment analysis, opinion mining, aspect lexicon extraction, set cover approximation, integrated research environment
Sentiment analysis is an important learning problem with a broad scope of applications. The meteoric rise of online social media and the increasing significance of the public opinion expressed there have created many challenges as well as opportunities for this research. The challenges have been articulated in the literature through a growing list of sentiment analysis problems, while the opportunities are being taken up through a steady stream of new algorithms and techniques for solving them. However, these approaches remain isolated and often out of the direct reach of other researchers, who must either rely on benchmark datasets, which are not always available, or devise their own ways of making comparisons.
This thesis presents the Sentiment Analysis Research Environment (SARE), an extensible and publicly accessible system designed to integrate baseline and state-of-the-art approaches to solving sentiment analysis problems. Since covering the entire breadth of the field is beyond the scope of this work, the usefulness of the environment is demonstrated by integrating solutions for certain facets of the aspect-based sentiment analysis problem. Currently, the system provides a semi-automatic method to support building gold-standard lexica, an automatic baseline method for extracting aspect expressions, and a pre-existing baseline sentiment analysis engine. Our proposed set cover approximation algorithm assists users in creating gold-standard lexica by finding a significantly reduced set of documents needed to create a lexicon. We also suggest a baseline semi-supervised algorithm, based on a Support Vector Machine (SVM) classifier, for automatically extracting aspect expressions.
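To illustrate how a set cover approximation can shrink the set of documents an annotator has to read, the following is a minimal sketch of the standard greedy heuristic. It is not taken from the thesis and should not be read as the algorithm SARE uses: the function name `greedy_document_cover`, the representation of each document as a set of candidate terms, and the toy corpus are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set


def greedy_document_cover(doc_terms: Dict[Hashable, Set[str]]) -> List[Hashable]:
    """Greedily pick documents until every term in the corpus is covered.

    Illustrative only: this is the textbook greedy set cover heuristic,
    assumed here as a stand-in for the approach named in the abstract.
    """
    # The universe to cover is every term that appears in any document.
    uncovered: Set[str] = set().union(*doc_terms.values()) if doc_terms else set()
    chosen: List[Hashable] = []
    while uncovered:
        # Choose the document covering the most still-uncovered terms
        # (ties go to the document that comes first in the mapping).
        best = max(doc_terms, key=lambda d: len(doc_terms[d] & uncovered))
        gained = doc_terms[best] & uncovered
        if not gained:
            # Defensive guard; not reached when the universe is the union above.
            break
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical toy corpus: document id -> candidate aspect terms it contains.
corpus = {
    "d1": {"battery", "screen"},
    "d2": {"screen", "price", "camera"},
    "d3": {"battery"},
    "d4": {"price"},
}
selected = greedy_document_cover(corpus)
print(selected)
```

In this toy corpus two of the four documents already contain every candidate term, so an annotator would only need to read those two. The greedy heuristic does not guarantee the smallest possible cover, only an approximation of it.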