
Textual Classification for Sentiment Detection. Brand Reputation Analysis on the Web using Natural Language Processing and Machine Learning

Academic Paper, 2018, 54 Pages

Computer Science - Applied Computer Science


Cloud computing makes it possible to build scalable machine learning systems for processing massive amounts of complex data, whether structured or unstructured, real-time or historical: the so-called Big Data. Public cloud computing platforms are readily available, for instance Amazon EC2, EMR, and Google Compute Engine. More importantly, open source APIs and libraries have been developed to ease programming on the cloud, for instance Cascading, Storm, Scalding, Apache Spark and Trackur. Meanwhile, computational intelligence approaches, such as evolutionary computation, immune-inspired approaches, and swarm intelligence, are also employed to develop scalable machine learning and data analytics tools.

In this project, we present the sentiment-focused web crawling problem and design a sentiment-focused web crawler framework for faster discovery and retrieval of sentiment-bearing content on the Web. We have developed a computational framework that performs automated reputation analysis on the Web using Natural Language Processing and Machine Learning. This paper introduces that framework and tests its performance on automated sentiment analysis for brand reputation. In addition, we propose different strategies for predicting the polarity scores of web pages.
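To make the idea of sentiment-focused crawling concrete, the following is a minimal sketch of a crawl frontier that visits URLs in order of predicted polarity strength. It is an illustration only, not the framework described in the paper: the class name `SentimentFrontier`, the use of the absolute polarity score as the priority, and the example URLs are all assumptions, and the predicted scores are supplied by hand rather than by any of the paper's prediction strategies.

```python
import heapq
import itertools


class SentimentFrontier:
    """Hypothetical crawl frontier: strongest predicted sentiment first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        # Tie-breaker so equal priorities pop in insertion order.
        self._counter = itertools.count()

    def push(self, url, predicted_score):
        """Queue a URL once; return False if it was already queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so negate the priority.
        # Assumption: priority is the magnitude of the predicted polarity.
        priority = -abs(predicted_score)
        heapq.heappush(self._heap, (priority, next(self._counter), url))
        return True

    def pop(self):
        """Return (url, priority magnitude) for the next page to fetch."""
        neg_priority, _, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self):
        return len(self._heap)


frontier = SentimentFrontier()
frontier.push("https://example.org/review", -0.9)
frontier.push("https://example.org/about", 0.1)
frontier.push("https://example.org/praise", 0.6)
frontier.push("https://example.org/review", 0.2)  # duplicate, ignored

order = [frontier.pop()[0] for _ in range(len(frontier))]
print(order)
```

In this sketch a strongly negative page is fetched before a mildly positive one, since both polarities matter for reputation analysis. A real crawler would also need fetching, link extraction, and politeness rules, none of which are shown here.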

Experiments have shown that our proposed framework is more efficient than existing frameworks. Reputation analysis is a useful application for organizations that want to know people's opinions about their products and services.

Our approach consists of four parts. First, the framework performs Web crawling based on the query specified by the user. Second, it locates relevant information within the textual data using Entity Recognition. Third, the relevant information is recorded in the database for feature extraction/engineering and classification. Lastly, the framework displays the data for reputation analysis. In the training phase, we used data provided by the marketing team of the University of the Witwatersrand, emoticons, a subset of the SentiStrength lexicon and the ClueWeb09 dataset. Each domain was labelled accordingly (positive, negative or neutral), with equal numbers of examples per polarity, in plain text. In the test phase, the classifier predicted the polarity of real-time data. We used accuracy as the evaluation metric to measure how often the classifier predicted the correct polarity.
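The evaluation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the classifier used in the paper: `TOY_LEXICON` is a hypothetical hand-made word list (not the SentiStrength lexicon), `predict_polarity` is a simple score-summing rule standing in for the trained classifier, and the sample texts are invented. Only the definition of accuracy, the fraction of predictions that match the true labels, is standard.

```python
# Hypothetical toy lexicon for illustration; NOT the SentiStrength lexicon.
TOY_LEXICON = {
    "good": 1, "great": 2, "love": 2,
    "bad": -1, "poor": -1, "terrible": -2,
    ":)": 1, ":(": -1,
}


def predict_polarity(text, lexicon=TOY_LEXICON):
    """Sum word scores and map the total to a polarity label."""
    score = sum(lexicon.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def accuracy(y_true, y_pred):
    """Fraction of predictions equal to the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Invented examples; the last one contains a negation the toy rule misses.
samples = [
    ("great service and good staff", "positive"),
    ("terrible support :(", "negative"),
    ("the campus is in Johannesburg", "neutral"),
    ("not bad at all", "positive"),
]

y_true = [label for _, label in samples]
y_pred = [predict_polarity(text) for text, _ in samples]
print(accuracy(y_true, y_pred))
```

Accuracy is easiest to interpret when the classes are balanced, which is consistent with the equal numbers of examples per polarity described above. With skewed classes, per-class precision and recall would be more informative.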


Institution / University
University of the Witwatersrand
textual classification, sentiment detection, brand reputation analysis, natural language processing, machine learning


