How to Perform Sentiment Analysis on Web-Scraped Data

Sentiment analysis is a machine learning technique that extracts subjective information from text, usually whether the overall mood is positive, neutral or negative. As a simple use case, it could help you decide whether a film is worth a trip to the cinema.

To do this, you need a data set. One way to acquire it is to use a tool such as Kimono or ScraperWiki to retrieve every tweet that mentions the movie during its opening weekend. Once you have extracted the data, create a text classifier with MonkeyLearn and use sentiment analysis to extract the mood of the text. MonkeyLearn will return the overall mood of the tweets as judged by your text classifier, with a negative result meaning you may as well stay in tonight.
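As a rough illustration of turning per-tweet predictions into one overall mood, the sketch below takes the most common label and the share of tweets that carry it. The label names and the `overall_mood` helper are assumptions made for this example; the article does not show MonkeyLearn's actual response format.

```python
from collections import Counter


def overall_mood(labels):
    """Return the most common label and the fraction of texts carrying it."""
    if not labels:
        raise ValueError("need at least one label")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Hypothetical per-tweet predictions, standing in for classifier output.
predicted = ["positive", "negative", "positive", "neutral", "positive"]
mood, share = overall_mood(predicted)
print(f"Overall mood: {mood} ({share:.0%} of tweets)")
```

A majority vote is only one possible aggregation; weighting by classifier confidence would be a natural refinement if the service exposes such a score.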

In this tutorial by MonkeyLearn’s Raúl Garreta on the Kimono Blog, the two tools are combined to create a hotel sentiment meter. It will detect how guests feel about a particular hotel by measuring the sentiments expressed in the hotel’s reviews on TripAdvisor.

After installing the Kimono Chrome extension, readers are shown how to use it to select the data from the relevant Web page under three properties: Title, Content and Stars. Once the Kimono API is set up, readers can hit the “Start Crawl” button to begin scraping.

The author then uses Python and the Pandas library to preprocess the data in the KimonoData.csv file before handing it over to MonkeyLearn. Readers are then guided through setting up a custom text classifier with two categories: Good and Bad. Once the module is set up, importing the CSV file creates a corresponding category tree with three nodes: Root, Good and Bad. This tree holds the reviews gathered from TripAdvisor.
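The preprocessing step might look something like the following minimal sketch. It is not the tutorial's actual code. It assumes the CSV has the three columns named above (Title, Content, Stars), and the star thresholds for Good and Bad, the inline sample data and the `label_reviews` helper are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample standing in for KimonoData.csv; the column names follow
# the three Kimono properties named in the article.
SAMPLE_CSV = """Title,Content,Stars
Great stay,Friendly staff and a spotless room,5
Never again,Dirty bathroom and rude reception,1
It was fine,Average room for the price,3
Lovely view,,4
"""


def label_reviews(df, good_min=4, bad_max=2):
    """Label reviews Good or Bad from their star rating (thresholds are assumptions)."""
    df = df.copy()
    df["Stars"] = pd.to_numeric(df["Stars"], errors="coerce")
    # Drop rows with no review text or no usable rating.
    df = df.dropna(subset=["Content", "Stars"]).copy()
    # Combine title and body into one text field for the classifier.
    df["text"] = (df["Title"].fillna("") + ". " + df["Content"]).str.strip()
    df["category"] = None
    df.loc[df["Stars"] >= good_min, "category"] = "Good"
    df.loc[df["Stars"] <= bad_max, "category"] = "Bad"
    # Middling ratings get no label and are left out of the training data.
    return df.dropna(subset=["category"])[["text", "category"]].reset_index(drop=True)


reviews = pd.read_csv(io.StringIO(SAMPLE_CSV))
labelled = label_reviews(reviews)
print(labelled)
```

With a real export, `pd.read_csv("KimonoData.csv")` would replace the in-memory sample, and `labelled.to_csv(...)` would produce a file ready for upload.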

With some training of the machine learning algorithm, its category prediction abilities will improve, and a keyword cloud on the project’s dashboard will indicate the types of words that are being used to determine the good/bad sentiment.
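To get a feel for what such a keyword cloud summarizes, the sketch below counts the most frequent words per category in a handful of labelled reviews. This is a plain frequency count with a tiny hand-picked stopword list, not MonkeyLearn's keyword selection, which the article does not describe. The sample reviews and the `top_keywords` helper are made up for illustration.

```python
import re
from collections import Counter

# Small hand-picked stopword list, an assumption for this example.
STOPWORDS = {"the", "a", "and", "was", "is", "of", "to", "in", "it", "were"}


def top_keywords(reviews, n=3):
    """Return the n most frequent non-stopword tokens for each category."""
    counts = {}
    for text, category in reviews:
        words = re.findall(r"[a-z']+", text.lower())
        counts.setdefault(category, Counter()).update(
            w for w in words if w not in STOPWORDS
        )
    return {cat: [w for w, _ in c.most_common(n)] for cat, c in counts.items()}


# Hypothetical labelled reviews.
sample = [
    ("The room was clean and the staff were friendly", "Good"),
    ("Clean room, friendly staff, great breakfast", "Good"),
    ("The room was dirty and the staff were rude", "Bad"),
    ("Dirty sheets and a rude manager", "Bad"),
]
keywords = top_keywords(sample)
print(keywords)
```

Words such as "room" and "staff" appear in both categories, which is why a real classifier relies on how strongly a word is associated with one category and not on raw counts alone.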

Original Article

Sentiment analysis on web scraped data with kimono and MonkeyLearn




1. Kimono scrapes at most once an hour, so the API you use can serve stale data.

2. Kimono operates as a service, so if a site doesn’t want to be scraped by them anymore, it just has to ban Kimono’s IP address. Then your Kimono API would stop working.