V13 N2 Paper 2
Annals of the MS in Computer Science and Information Systems at UNC Wilmington
Fall 2019

Daily Text Analytics of News and Social Media with Power BI  

Jannatun Nahar

Committee

Douglas Kline (chair)
Lucas Layman
Minoo Modaresnezhad

Abstract

In the era of modern data science and big data, it is no longer surprising that machine learning can be used to interpret human language and gauge what people feel and think about the world around them. This task is called sentiment analysis or opinion mining; it combines natural language processing, text analysis and computational linguistics to classify subjective information or the emotional state of a writer, subject or topic. Beyond identifying positive, negative or neutral sentiment, we can also extract keywords that intensify particular emotions such as joy, excitement, frustration and fear from the content. In this project, Google Daily Search Trends are used to choose the topic to which various text analytics applications are applied. Google now processes over 40,000 search queries every second on average and over 3.5 billion searches per day [1]. To show what the world is searching for, Google Trends includes a Trending Searches page that publishes the most frequently searched terms of the past 24 hours, along with their search volume and related news stories, across various countries. The daily search trend list is available at https://trends.google.com/trends/trendingsearches/daily?geo=US. For this project, the first trending search topic is selected and analyzed to create a text analytics visualization report. To get the most recent data about the search topic, news articles related to the topic and Twitter are used as data sources. All of these data sources are publicly available. This project presents a system that assigns sentiment scores and extracts the key emotions associated with the opinions expressed in these news stories and Twitter posts on a given trending search topic.
Although full comprehension of natural language text remains well beyond the power of machines, statistical analysis of relatively simple sentiment indicators can provide a meaningful quantitative summary of this large amount of qualitative information. The project is implemented primarily with Microsoft Power BI, Python and R. Power BI is a data visualization tool that supports a very wide range of data sources and can load, transform and clean data into a data model. A useful feature of Power BI is that it can connect to a web page and import its data into a dataset, which Power BI usually refers to as a Query or Table. First, the raw text data about the most searched topic is captured in Power BI from various news articles and Twitter feeds. Text analytics is then applied to this data: identifying text polarity, tallying the positive and negative words used, and extracting the emotion category of each word. After the analysis, a text analytics summary dashboard is created in Power BI Desktop. To keep the report current, a scheduled refresh is configured on the dataset. Finally, the report is shared with Power BI users who have a UNCW or Office 365 email account.
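
As an illustration of the word-level analysis described above (polarity, positive and negative word tallies, and emotion categories), the following is a minimal Python sketch. It is not the project's implementation: the `LEXICON` table, the `analyze` function and the scoring formula are hypothetical and chosen only for illustration, whereas a real system would rely on an established sentiment and emotion lexicon.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> (polarity, emotion category).
# Illustrative only; a real analysis would load a published lexicon.
LEXICON = {
    "win": ("positive", "joy"),
    "great": ("positive", "joy"),
    "thrilled": ("positive", "excitement"),
    "delay": ("negative", "frustration"),
    "crash": ("negative", "fear"),
    "afraid": ("negative", "fear"),
}


def analyze(text):
    """Tally positive/negative words and emotion categories in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    polarity = Counter()
    emotions = Counter()
    for word in words:
        if word in LEXICON:
            pol, emo = LEXICON[word]
            polarity[pol] += 1
            emotions[emo] += 1
    pos, neg = polarity["positive"], polarity["negative"]
    matched = pos + neg
    # Assumed score: net share of matched words, ranging from -1 to +1.
    score = (pos - neg) / matched if matched else 0.0
    return {
        "positive": pos,
        "negative": neg,
        "score": score,
        "emotions": dict(emotions),
    }


if __name__ == "__main__":
    print(analyze("Fans thrilled by a great win despite the delay"))
```

Per-document results of this kind could then be aggregated across news stories and tweets to feed a summary dashboard.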


Recommended Citation: Nahar, J., Kline, D., Layman, L., Modaresnezhad, M. (2019) Daily Text Analytics of News and Social Media with Power BI. Annals of the Master of Science in Computer Science and Information Systems at UNC Wilmington, 13(2) paper 2. http://csbapp.uncw.edu/data/mscsis/full.aspx.
