If you ever need to analyze social media sites for your business analytics, you will notice two things: it takes some serious technical prowess to code everything yourself, and it takes some seriously deep pockets to get a consultant or a software company to do it for you. Fortunately, there are APIs that you can use to help with both of these problems.
I use ParseHub, a desktop web scraping application, to collect text from sites like twitter.com, and I process the data with text analysis APIs. In a recent research project, I used the sentiment analysis API from text-processing.com to analyze the tone of 27,000 tweets scraped with ParseHub. The results pointed to two trends that could have predicted the outcome of the presidential primaries: news networks mentioned Hillary Clinton and Donald Trump more often than their opponents, and the two of them sent more emotional tweets than their opponents did.
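Once every tweet has a sentiment label, both trends come down to simple tallies. The sketch below assumes each tweet is a dict with hypothetical keys account, text and label, where label is "pos", "neg" or "neutral" as returned by the sentiment API. It treats the share of non-neutral tweets per account as a stand-in for "more emotional" and uses a case-insensitive substring match to count candidate mentions. These measures are illustrative choices, not necessarily the ones used in the study.

```python
from collections import Counter

# Hypothetical record layout: {"account": ..., "text": ..., "label": ...},
# where label is the sentiment API's "pos", "neg" or "neutral".


def emotional_share(tweets):
    """Return, per account, the fraction of tweets labeled pos or neg."""
    totals = Counter()
    emotional = Counter()
    for tweet in tweets:
        account = tweet["account"]
        totals[account] += 1
        if tweet["label"] in ("pos", "neg"):
            emotional[account] += 1
    # Every account in totals has at least one tweet, so no division by zero.
    return {account: emotional[account] / totals[account] for account in totals}


def mention_counts(tweets, candidates):
    """Count tweets mentioning each candidate (case-insensitive substring match)."""
    counts = {name: 0 for name in candidates}
    for tweet in tweets:
        text = tweet["text"].lower()
        for name in candidates:
            if name.lower() in text:
                counts[name] += 1
    return counts
```

Substring matching is crude (it misses nicknames and handles), but it is enough to compare accounts against each other in a notebook.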
This tutorial shows you how to mine the sentiment of tweets: pull the scraped tweets into a Python project through one of ParseHub's API options, then send them to the text-processing API for sentiment analysis.
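The call to the sentiment API can be sketched with Python's standard library. This minimal sketch assumes text-processing.com's sentiment endpoint lives at http://text-processing.com/api/sentiment/, accepts a form-encoded text field by POST, and answers with JSON containing a label ("pos", "neg" or "neutral") and a probability object. Confirm those details in the site's API documentation before running it on a large batch; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for text-processing.com's sentiment API.
SENTIMENT_URL = "http://text-processing.com/api/sentiment/"


def build_sentiment_request(text):
    """Build a form-encoded POST request carrying one tweet's text."""
    body = urllib.parse.urlencode({"text": text}).encode("utf-8")
    return urllib.request.Request(SENTIMENT_URL, data=body, method="POST")


def parse_sentiment(raw_body):
    """Extract the label and class probabilities from the JSON response."""
    payload = json.loads(raw_body)
    return payload["label"], payload["probability"]


def classify(text, timeout=10):
    """Send one tweet to the API and return (label, probabilities)."""
    with urllib.request.urlopen(build_sentiment_request(text), timeout=timeout) as resp:
        return parse_sentiment(resp.read().decode("utf-8"))
```

For each scraped tweet you would call label, probabilities = classify(tweet_text). With tens of thousands of tweets, pausing between calls (for instance with time.sleep) is a sensible precaution, since free public APIs typically limit request rates.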
I used Jupyter (IPython) Notebook to analyze my data. ParseHub has a tutorial on data analysis with IPython notebooks, and ProgrammableWeb has an example of using ParseHub with Python. Feel free to follow along in whatever data analysis language you prefer, and let us know about your experience in the comments or by email!
Collecting Tweets From Snapbird.org With ParseHub
To tell ParseHub which tweets to extract, you need to create a ParseHub project in the desktop application. Once the project exists, the data it extracts can be pulled into your Python script through one of ParseHub's API options. You can run a project and download its data through the desktop app, have the results sent in a scheduled email, or control the project remotely with HTTP requests.
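The HTTP option can be sketched in Python. This minimal sketch assumes the layout of ParseHub's v2 REST API: a POST to /projects/{token}/run starts a run, a GET on /runs/{token} reports its status, and a GET on /runs/{token}/data returns the results. The endpoint paths, the run_token, status and data_ready fields, the status strings, and the gzip handling are assumptions to check against ParseHub's API documentation; the project token and API key come from your own account.

```python
import gzip
import json
import time
import urllib.parse
import urllib.request

# Assumed base URL of ParseHub's v2 REST API; verify against the API docs.
API_BASE = "https://www.parsehub.com/api/v2"


def run_project_request(project_token, api_key):
    """Build the POST request that starts a new run of a project."""
    body = urllib.parse.urlencode({"api_key": api_key}).encode("utf-8")
    url = f"{API_BASE}/projects/{project_token}/run"
    return urllib.request.Request(url, data=body, method="POST")


def run_status_url(run_token, api_key):
    """URL that reports a run's status (assumed endpoint)."""
    return f"{API_BASE}/runs/{run_token}?" + urllib.parse.urlencode({"api_key": api_key})


def run_data_url(run_token, api_key):
    """URL for downloading a finished run's results as JSON (assumed endpoint)."""
    query = urllib.parse.urlencode({"api_key": api_key, "format": "json"})
    return f"{API_BASE}/runs/{run_token}/data?{query}"


def read_json(request_or_url, timeout=60):
    """Fetch a URL or Request and decode its JSON body."""
    with urllib.request.urlopen(request_or_url, timeout=timeout) as resp:
        raw = resp.read()
        # urllib does not decompress gzip Content-Encoding, so handle it here.
        if resp.headers.get("Content-Encoding") == "gzip":
            raw = gzip.decompress(raw)
    return json.loads(raw)


def run_and_fetch(project_token, api_key, poll_seconds=30):
    """Start a run, poll until its data is ready, and return the scraped data."""
    run = read_json(run_project_request(project_token, api_key))
    run_token = run["run_token"]  # field name assumed from the v2 API
    while not run.get("data_ready"):
        # Status strings assumed; stop instead of polling a failed run forever.
        if run.get("status") in ("error", "cancelled"):
            raise RuntimeError(f"run {run_token} ended with status {run['status']}")
        time.sleep(poll_seconds)
        run = read_json(run_status_url(run_token, api_key))
    return read_json(run_data_url(run_token, api_key))
```

Polling keeps the script simple for a notebook; a scraping run over thousands of tweets can take a while, so a generous poll interval avoids hammering the API.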
I collect tweets from snapbird.org because it stores nearly 3,000 of the latest tweets from any Twitter account, whereas twitter.com only shows the last 800 or so. When the ParseHub project runs, it extracts all of those tweets from each Twitter account in a list you provide. Here is how you can create a similar project:
Open Snapbird.org and Navigate to the Log In Page
- Open snapbird.org in the ParseHub desktop app and click "Create New Project". Your project will be created, and you will start on the main_template.
- Select the button that says "LOG IN USING TWITTER" in the AJAX pop-up window, and rename the selection "login_button".
- Click on the "plus" button next to the "Select page" command to add a new command.
- Choose the Click tool from the pop-up menu. This menu has a variety of tools you will use to handle interactive elements and get data.
- In the pop-up window select "Go to a new template" and create a new template named "signin_page". ParseHub will now open the new page for you.
Use ParseHub to log in with email and password
- You are now in the new template called signin_page. Click on the "Username or email" text input box to select it, and enter your Twitter username into the text box.
- Click on the "plus" button on the "Select page" command, and choose the Select tool from the menu.
- Click on the "Password" text input box and enter your Twitter password.
- Click on the "plus" button on the "Select page" command, and choose the Select tool again from the menu.
- Select the button that says "Authorize app" and add a Click command to it, as you did for login_button.
- In the pop-up window, select "Go to a new template" and name the template "search_page". I told ParseHub that the click "Uses AJAX" rather than that it "Loads a new page", because some odd redirects happen before you reach the page you want.
Searching for multiple Twitter accounts
- You are now back on the snapbird.org home page, in the search_page template. Click the "plus" button and add a Loop command.
- Under "for each", enter name as the variable name, and under "in", enter an array of Twitter accounts. This tells ParseHub to loop through every Twitter account you want to scrape.
- With the Select tool, select the text box that contains your username by default.
- The variable name holds each Twitter account in your list in turn, so change the input type from "text" to "expression" and enter name, with no quotation marks.
- Select the search button labeled "Find It!" and tell ParseHub to click it. Since the click neither takes you to a new page nor opens an AJAX window, I told ParseHub to "Continue" with the current template. The click makes the first 100 tweets appear on the page.
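Typing the account array for the Loop command by hand gets error-prone as the list grows. As a minimal sketch, assuming the "in" field accepts a JavaScript-style array of double-quoted strings such as ["account_one", "account_two"], this hypothetical helper turns a Python list of handles into that text. It trims stray spaces, skips blanks, and drops duplicates so no account is scraped twice.

```python
import json


def loop_array_literal(handles):
    """Render a list of Twitter handles as array text for the Loop "in" field.

    Surrounding whitespace is trimmed, empty entries are skipped, and
    duplicates are dropped while keeping the first occurrence's position.
    """
    cleaned = []
    for handle in handles:
        name = handle.strip()
        if name and name not in cleaned:
            cleaned.append(name)
    # json.dumps emits double-quoted strings, which are also valid JavaScript.
    return json.dumps(cleaned)
```

For example, loop_array_literal(["HillaryClinton", " realDonaldTrump ", "HillaryClinton"]) returns '["HillaryClinton", "realDonaldTrump"]', which you could paste into the "in" field.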