How to Perform, Visualize Sentiment Analysis with the WebKnox Text Processing API

Mark Boyd
Jul. 21 2014, 06:47PM EDT

Sentiment analysis is a type of machine learning in which developers create a script that can read a block of text and analyze whether it is positive, neutral or negative. It is a process widely used in predictive analytics and business intelligence to assess reaction to a brand’s products, services or key industry issues. This tutorial looks at how to use the WebKnox Text Processing API to conduct a sentiment analysis, and to create a visualization that summarizes audience reactions.


WebKnox and the Growth of Machine Learning

Along with the growth in data has come the need to use APIs to help understand what that data means. Machine learning and predictive analytics have seen a growing level of interest this year as developers seek to use API tools to analyze data on-the-fly and create meaningful interpretations directly in their web apps and visualizations.

WebKnox offers a range of APIs built on machine learning algorithms developed originally as part of a PhD thesis at the Australian premier design university RMIT in Melbourne and completed at the German “ivy league” school University of Technology in Dresden. As part of a push for entrepreneurial growth and market innovation, the APIs — led by a research and design team under Dr. David Urbansky — are now being made available commercially. Already, the startup’s Recipe Search API has been voted as one of the world’s best based on the machine learning techniques that power its text processing and search functionality.

The WebKnox Text Processing API can scan Web pages and blocks of text, and analyze them in a variety of ways. One of the most common tasks to perform via text processing is sentiment analysis. This type of analysis reviews text phrases and assigns a value to each one--for example, whether a phrase is positive, neutral or negative. In this way, developers can automate the process of reading text--such as Twitter mentions, blog comments or forum conversations--about a product, service or issue, and make some general assumptions about the audience reaction.

In this tutorial we will examine how to use the WebKnox Text Processing API to conduct a sentiment analysis, using PHP.

For our tutorial example, we are going to look at developer feedback on the Google Maps API from the ProgrammableWeb API directory and create two visualizations:

  • A color-coded phrase-by-phrase breakdown of comment tone (positive, neutral or negative)
  • A summary pie graph that shows developer reactions to the API

ABOVE: The sentiment analysis and visualization that we will create in this tutorial using the WebKnox Text Processing API

Getting Started: Registration and Library Downloads

1. Register for an API key

You must have a Mashape account to use this API. If you don't have one already, go to the Mashape home page and create an account or log in. Complete the login form with either your email details or by linking to your GitHub account, and then activate your account.

While you can use the Mashape-provided API key in this code example, you will also need to register for the Freemium plan with WebKnox so that the key is connected to the Text Processing API usage.

Once logged in to your Mashape account, go to the Text Processing API page. Click on the Pricing tab and subscribe to the Basic (Freemium) edition.

2. Download the Unirest helper methods into your PHP library.

You can use the Unirest PHP library (unirest.io) provided by Mashape to work with this API. Download it into your PHP library folder; the response script in step 4 loads it from lib/Unirest.php.

Now create the following script:

<?php

use Unirest\HttpMethod;
use Unirest\HttpResponse;

class Unirest
{
	
	private static $verifyPeer = true;
	private static $socketTimeout = null;
	private static $defaultHeaders = array();
	
	/**
	 * Verify SSL peer
	 * @param bool $enabled enable SSL verification, by default is true
	 */
	public static function verifyPeer($enabled)
	{
		Unirest::$verifyPeer = $enabled;
	}
	
	/**
	 * Set a timeout
	 * @param integer $seconds timeout value in seconds
	 */
	public static function timeout($seconds)
	{
		Unirest::$socketTimeout = $seconds;
	}
	
	/**
	 * Set a new default header to send on every request
	 * @param string $name header name
	 * @param string $value header value
	 */
	public static function defaultHeader($name, $value)
	{
		Unirest::$defaultHeaders[$name] = $value;
	}
	
	/**
	 * Clear all the default headers
	 */
	public static function clearDefaultHeaders()
	{
		Unirest::$defaultHeaders = array();
	}
	
	/**
	 * Send a GET request to a URL
	 * @param string $url URL to send the GET request to
	 * @param array $headers additional headers to send
	 * @param mixed $parameters parameters to send in the querystring
	 * @param string $username Basic Authentication username
	 * @param string $password Basic Authentication password
	 * @return string|stdObj response string or stdObj if response is json-decodable
	 */
	public static function get($url, $headers = array(), $parameters = NULL, $username = NULL, $password = NULL)
	{
		return Unirest::request(HttpMethod::GET, $url, $parameters, $headers, $username, $password);
	}
	
	/**
	 * Send POST request to a URL
	 * @param string $url URL to send the POST request to
	 * @param array $headers additional headers to send
	 * @param mixed $body POST body data
	 * @param string $username Basic Authentication username
	 * @param string $password Basic Authentication password
	 * @return string|stdObj response string or stdObj if response is json-decodable
	 */
	public static function post($url, $headers = array(), $body = NULL, $username = NULL, $password = NULL)
	{
		return Unirest::request(HttpMethod::POST, $url, $body, $headers, $username, $password);
	}
	
	/**
	 * Send DELETE request to a URL
	 * @param string $url URL to send the DELETE request to
	 * @param array $headers additional headers to send
	 * @param mixed $body DELETE body data
	 * @param string $username Basic Authentication username
	 * @param string $password Basic Authentication password
	 * @return string|stdObj response string or stdObj if response is json-decodable
	 */
	public static function delete($url, $headers = array(), $body = NULL, $username = NULL, $password = NULL)
	{
		return Unirest::request(HttpMethod::DELETE, $url, $body, $headers, $username, $password);
	}
	
	/**
	 * Send PUT request to a URL
	 * @param string $url URL to send the PUT request to
	 * @param array $headers additional headers to send
	 * @param mixed $body PUT body data
	 * @param string $username Basic Authentication username
	 * @param string $password Basic Authentication password
	 * @return string|stdObj response string or stdObj if response is json-decodable
	 */
	public static function put($url, $headers = array(), $body = NULL, $username = NULL, $password = NULL)
	{
		return Unirest::request(HttpMethod::PUT, $url, $body, $headers, $username, $password);
	}
	
	/**
	 * Send PATCH request to a URL
	 * @param string $url URL to send the PATCH request to
	 * @param array $headers additional headers to send
	 * @param mixed $body PATCH body data
	 * @param string $username Basic Authentication username
	 * @param string $password Basic Authentication password
	 * @return string|stdObj response string or stdObj if response is json-decodable
	 */
	public static function patch($url, $headers = array(), $body = NULL, $username = NULL, $password = NULL)
	{
		return Unirest::request(HttpMethod::PATCH, $url, $body, $headers, $username, $password);
	}
	
	/**
	 * Prepares a file for upload. To be used inside the parameters declaration for a request.
	 * @param string $path The file path
	 */
	public static function file($path)
	{
		if (function_exists("curl_file_create")) {
			return curl_file_create($path);
		} else {
			return "@" . $path;
		}
	}
	
	/**
	 * This function is useful for serializing multidimensional arrays, and avoid getting
	 * the "Array to string conversion" notice
	 */
	public static function http_build_query_for_curl($arrays, &$new = array(), $prefix = null)
	{
		if (is_object($arrays)) {
			$arrays = get_object_vars($arrays);
		}
		
		foreach ($arrays AS $key => $value) {
			$k = isset($prefix) ? $prefix . '[' . $key . ']' : $key;
			if (!$value instanceof \CURLFile AND (is_array($value) OR is_object($value))) {
				Unirest::http_build_query_for_curl($value, $new, $k);
			} else {
				$new[$k] = $value;
			}
		}
	}
	
	/**
	 * Send a cURL request
	 * @param string $httpMethod HTTP method to use (based off \Unirest\HttpMethod constants)
	 * @param string $url URL to send the request to
	 * @param mixed $body request body
	 * @param array $headers additional headers to send
	 * @param string $username  Basic Authentication username
	 * @param string $password  Basic Authentication password
	 * @throws Exception if a cURL error occurs
	 * @return HttpResponse
	 */
	private static function request($httpMethod, $url, $body = NULL, $headers = array(), $username = NULL, $password = NULL)
	{
		if ($headers == NULL)
			$headers = array();

		$lowercaseHeaders = array();
		$finalHeaders = array_merge($headers, Unirest::$defaultHeaders);
		foreach ($finalHeaders as $key => $val) {
			$lowercaseHeaders[] = Unirest::getHeader($key, $val);
		}
		
		$lowerCaseFinalHeaders = array_change_key_case($finalHeaders);
		if (!array_key_exists("user-agent", $lowerCaseFinalHeaders)) {
			$lowercaseHeaders[] = "user-agent: unirest-php/1.1";
		}
		if (!array_key_exists("expect", $lowerCaseFinalHeaders)) {
			$lowercaseHeaders[] = "expect:";
		}
		
		$ch = curl_init();
		if ($httpMethod != HttpMethod::GET) {
			curl_setopt($ch, CURLOPT_CUSTOMREQUEST, $httpMethod);
			if (is_array($body) || $body instanceof Traversable) {
				Unirest::http_build_query_for_curl($body, $postBody);
				curl_setopt($ch, CURLOPT_POSTFIELDS, $postBody);
			} else {
				curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
			}
		} else if (is_array($body)) {
			if (strpos($url, '?') !== false) {
				$url .= "&";
			} else {
				$url .= "?";
			}
			Unirest::http_build_query_for_curl($body, $postBody);
			$url .= urldecode(http_build_query($postBody));
		}
		
		curl_setopt($ch, CURLOPT_URL, Unirest::encodeUrl($url));
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
		curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
		curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
		curl_setopt($ch, CURLOPT_HTTPHEADER, $lowercaseHeaders);
		curl_setopt($ch, CURLOPT_HEADER, true);
		curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, Unirest::$verifyPeer);
		curl_setopt($ch, CURLOPT_ENCODING, ""); // If an empty string, "", is set, a header containing all supported encoding types is sent.
		if (Unirest::$socketTimeout != null) {
			curl_setopt($ch, CURLOPT_TIMEOUT, Unirest::$socketTimeout);
		}
		if (!empty($username)) {
			curl_setopt($ch, CURLOPT_USERPWD, $username . ":" . ((empty($password)) ? "" : $password));
		}
		
		$response = curl_exec($ch);
		$error    = curl_error($ch);
		if ($error) {
			throw new Exception($error);
		}
		
		// Split the full response in its headers and body
		$curl_info   = curl_getinfo($ch);
		$header_size = $curl_info["header_size"];
		$header      = substr($response, 0, $header_size);
		$body        = substr($response, $header_size);
		$httpCode    = $curl_info["http_code"];
		
		return new HttpResponse($httpCode, $body, $header);
	}
	
	private static function getArrayFromQuerystring($querystring)
	{
		$pairs = explode("&", $querystring);
		$vars  = array();
		foreach ($pairs as $pair) {
			$nv          = explode("=", $pair, 2);
			$name        = $nv[0];
			$value       = $nv[1];
			$vars[$name] = $value;
		}
		return $vars;
	}
	
	/**
	 * Ensure that a URL is encoded and safe to use with cURL
	 * @param  string $url URL to encode
	 * @return string
	 */
	private static function encodeUrl($url)
	{
		$url_parsed = parse_url($url);
		
		$scheme = $url_parsed['scheme'] . '://';
		$host   = $url_parsed['host'];
		$port   = (isset($url_parsed['port']) ? $url_parsed['port'] : null);
		$path   = (isset($url_parsed['path']) ? $url_parsed['path'] : null);
		$query  = (isset($url_parsed['query']) ? $url_parsed['query'] : null);
		
		if ($query != null) {
			$query = '?' . http_build_query(Unirest::getArrayFromQuerystring($url_parsed['query']));
		}
		
		if ($port && $port[0] != ":")
			$port = ":" . $port;
		
		$result = $scheme . $host . $port . $path . $query;
		return $result;
	}
	
	private static function getHeader($key, $val)
	{
		$key = trim(strtolower($key));
		return $key . ": " . $val;
	}
	
}

if (!function_exists('http_chunked_decode')) {
	/**
	 * Dechunk an http 'transfer-encoding: chunked' message 
	 * @param string $chunk the encoded message 
	 * @return string the decoded message
	 */
	function http_chunked_decode($chunk)
	{
		$pos     = 0;
		$len     = strlen($chunk);
		$dechunk = null;
		
		while (($pos < $len) && ($chunkLenHex = substr($chunk, $pos, ($newlineAt = strpos($chunk, "\n", $pos + 1)) - $pos))) {
			
			if (!is_hex($chunkLenHex)) {
				trigger_error('Value is not properly chunk encoded', E_USER_WARNING);
				return $chunk;
			}
			
			$pos      = $newlineAt + 1;
			$chunkLen = hexdec(rtrim($chunkLenHex, "\r\n"));
			$dechunk .= substr($chunk, $pos, $chunkLen);
			$pos = strpos($chunk, "\n", $pos + $chunkLen) + 1;
		}
		
		return $dechunk;
	}
}

/**
 * determine if a string can represent a number in hexadecimal 
 * @link http://uk1.php.net/ctype_xdigit
 * @param string $hex 
 * @return boolean true if the string is a hex, otherwise false 
 */
function is_hex($hex)
{
	return ctype_xdigit($hex);
}
?>

Collate Your Materials and Code: From Comments to PHP Response

3. Collect the comments you want to analyze

Next we need to collect all of the comments that we want to analyze for sentiment.

In this instance, we are using the Chrome extension XPath Helper. After installing the extension, open the ProgrammableWeb page for the Google Maps API and select the Comments tab. Press Control+Shift+X to start XPath Helper, then enter the following XPath query in the left-hand box:

//div[@class='field-items']/div[@class='field-item even']

The full list of comments is now shown in the right-hand box.

Copy these comments to a notepad.
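
Scraped comments often carry HTML entities, leftover markup and duplicates; several entries in the sample array in step 4 contain &quot; and &#039;. The following is a minimal sketch of one way to tidy them before analysis. The cleanComments() helper is hypothetical and is not part of Unirest or the WebKnox API; it only uses standard PHP string functions.

```php
<?php
// Hypothetical helper (not part of Unirest or the WebKnox API):
// normalizes scraped comments before they are sent for analysis.
function cleanComments(array $rawComments)
{
	$cleaned = array();
	foreach ($rawComments as $raw) {
		// Turn entities such as &quot; and &#039; back into characters.
		$text = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');
		// Remove markup that was hidden behind entities, e.g. <a href="...">.
		$text = strip_tags($text);
		// Collapse runs of whitespace and trim both ends.
		$text = trim(preg_replace('/\s+/', ' ', $text));
		if ($text !== '') {
			$cleaned[] = $text;
		}
	}
	// Drop exact duplicates and reindex from 0.
	return array_values(array_unique($cleaned));
}

// Sample input adapted from the comments used in this tutorial.
$comments = cleanComments(array(
	'I&#039;m currently studying in the Philippines.',
	'Its &lt;a href=&quot;http://www.imobilien.com.br/&quot;&gt;Imobilien&lt;/a&gt;.',
	'This is a   wonderful service',
	'This is a   wonderful service',
));
print_r($comments);
```

Entities are decoded before strip_tags() runs because a tag written as &lt;a ...&gt; is plain text to strip_tags() until it has been decoded.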

4. Create the PHP HTTP Response script.

Now we can create the PHP response script that will analyze these comments. Use the following PHP script, making the following adjustments:

  • Set the $mashapeKey variable near the top of the script to your Mashape API key.
  • Add each comment you have scraped and pasted into a notepad to the $comments array using array_push. (We have done this for you, but you can see in the array below where this goes.)
<?php

// include the Unirest helper methods from mashape: https://github.com/Mashape/unirest-php

require_once 'lib/Unirest.php';

// get your authentication from mashape after signing in

$mashapeKey = '<PUT-YOUR-KEY-IN-HERE>';

// we only highlight sentiments above this confidence threshold

$confidenceThreshold = 0.05;

// we count the positive, negative, and neutral comments for a nice doughnut chart

$numPositive = 0;

$numNegative = 0;

$numNeutral = 0;

// an array of comments that we want to tag with sentiments

$comments = array();

array_push($comments, 'I think this is a great service. Every web site or service, especially those "location based", "get the cheapest prices" services should use this.');


array_push($comments, 'Just thought I would suggest an add to the mashups. www.navitraveler.com It uses mashups in a powerful way by allowing users to categorize and even modify additions "wiki style". You may want to check it out.');


array_push($comments, 'This is a wonderful service');


array_push($comments, 'The best map api around. Very fast and reliable. I love the hybrid map mode. Unline other apis, Google Maps Api GOverviewMap control is ver useful. It can also used outer side of the map. Thanks Google Maps Team.');


array_push($comments, 'This is a cool service! GMs Rocks! I built my site using this - www.eatables.in !!! Search Feature is too gud! Check my site for more!!!! Here is another eBay / Google Maps mashup that you might like to consider adding: http://www.auctionsearchkit.co.uk Cheers, Steve');


array_push($comments, 'My website use this, and where in Brazil is very popular. Its a real state website with all the offers on the map. Its Imobilien. Could you mind take a look and add on the list. Thx and sorry about my english');


array_push($comments, 'When will google fix maps for other countries.  I\'m currently studying in the Philippines and the map for Manila is way off. It doesn\'t even correspond to the satellite view!');


array_push($comments, 'Excellent service provided by GoogleMaps where ever you are you may locate the place you want to visit with few mouse click.');


array_push($comments, 'Some of rural addresses are not covered by Google Maps. In that case, MapQuest and Yahoo Maps are more thorough.  Yahoo Maps gives even coordinates of the location.');


array_push($comments, 'I use Google all the time. However could you change your information in La Quinta CA at 50th and Jefferson. On the East side of Jefferson is Mountain View Country Club. On the West side of Jefferson is the Cirtus Country Club. Your now have the Cirtus Country Club on the East side. Hopefully you can change it. Respectfully yours, Rick');


array_push($comments, 'I use Google apps as much as possible, and Maps really gets it better than other services when you need to place a building or get to an obscure side street or massive administrative center.');


array_push($comments, 'I find my house, in a two clicks. The best map i even seen.');


array_push($comments, 'Aeropark is also very happy with the mapping service in cebu.');


array_push($comments, 'I used Google map directions twice this week and both times the directions were WRONG! Reporting left turns when it should be right turns with incorrect mileage. Lost for 30 minutes - NOT HAPPY ABOUT THAT!');


array_push($comments, 'i found here a great information http://www.telebrand.com.pk');


array_push($comments, 'Google Map Pro - Does anyone know how the timeline works so that I can overlay data that is due in the future?');


array_push($comments, 'I have been using Google maps to find driving directions for some time. I tried to use it today and found it &quot;improved&quot;! So much &quot;improved&quot; that I will never use it again. What a complete waste.');


array_push($comments, 'This is a cool service! GMs Rocks! I built my site using this - www.eatables.in !!!');


array_push($comments, 'Search Feature is too gud! Check my site for more!!!!');


array_push($comments, 'Here is another eBay / Google Maps mashup that you might like to consider adding: www.auctionsearchkit.co.uk');


array_push($comments, 'My website use this, and where in Brazil is very popular. Its a real state website with all the offers on the map. Its &lt;a href=&quot;http://www.imobilien.com.br/&quot;&gt;Imobilien&lt;/a&gt;. Could you mind take a look and add on the list.Thx and sorry about my english');


array_push($comments, 'When will google fix maps for other countries. I&#039;m currently studying in the Philippines and the map for Manila is way off. It doesn&#039;t even correspond to the satellite view!');

// disables SSL cert validation

Unirest::verifyPeer(false);

// let's build a string with some markup to highlight positive and negative things

$htmlString = '';

foreach ($comments as $comment) {

	// first, ask the API to split the comment into sentences
	$splitResponse = Unirest::get(
		'https://webknox-text-processing.p.mashape.com/text/sentences?text='.urlencode(strip_tags($comment)).'.&language=en',
		array(
			"X-Mashape-Authorization" => $mashapeKey
		),
		null
	);

	foreach ($splitResponse->body as $sentence) {

		// then ask for the sentiment of each sentence
		$sentimentResponse = Unirest::get(
			'https://webknox-text-processing.p.mashape.com/text/sentiment?text='.$sentence.'&language=en',
			array(
				"X-Mashape-Authorization" => $mashapeKey
			),
			null
		);

		$style = '';

		// color the sentence and update the counters; the confidence doubles as the color's opacity
		if ($sentimentResponse->body->document->sentiment == 'positive' && $sentimentResponse->body->document->confidence > $confidenceThreshold) {
			$style = 'background-color: rgba(0,255,0,'.$sentimentResponse->body->document->confidence.')';
			$numPositive++;
		} else if ($sentimentResponse->body->document->sentiment == 'negative' && $sentimentResponse->body->document->confidence > $confidenceThreshold) {
			$style = 'background-color: rgba(255,0,0,'.$sentimentResponse->body->document->confidence.')';
			$numNegative++;
		} else {
			$numNeutral++;
		}

		$htmlString .= '<span style="'.$style.'">';
		$htmlString .= urldecode($sentence);
		$htmlString .= '</span> ';
	}

	// blank line between comments
	$htmlString .= '<br><br>';
}

echo $htmlString;

?>
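
The branching inside the sentence loop can also be pulled into a small function so it can be checked without calling the API. This is only a sketch: classifySentence() and the sample values are hypothetical, and the two arguments stand in for the sentiment and confidence fields the script reads from $sentimentResponse->body->document.

```php
<?php
// Hypothetical helper mirroring the if/else chain in the script above.
// No API call is made here; the caller supplies the values.
function classifySentence($sentiment, $confidence, $threshold = 0.05)
{
	if ($sentiment === 'positive' && $confidence > $threshold) {
		return array(
			'label' => 'positive',
			'style' => 'background-color: rgba(0,255,0,' . $confidence . ')',
		);
	}
	if ($sentiment === 'negative' && $confidence > $threshold) {
		return array(
			'label' => 'negative',
			'style' => 'background-color: rgba(255,0,0,' . $confidence . ')',
		);
	}
	// Anything else, including low-confidence results, counts as neutral.
	return array('label' => 'neutral', 'style' => '');
}

// Made-up sample values for illustration, not real API output.
$samples = array(
	array('positive', 0.8),
	array('negative', 0.6),
	array('positive', 0.01),
);

$counts = array('positive' => 0, 'negative' => 0, 'neutral' => 0);
foreach ($samples as $sample) {
	$result = classifySentence($sample[0], $sample[1]);
	$counts[$result['label']]++;
}
print_r($counts);
```

Note that the counters tally sentences rather than whole comments, so a long comment contributes more than once to the chart totals.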

5. Visualize the results

Now we want to prepare scripts so that after the sentiment analysis is conducted using the WebKnox Text Processing API, we can automatically show the results in visual form. For this exercise, we will use the CanvasJS charting library to render the charts.

Download the CanvasJS chart library (canvasjs.min.js) and place it in the same folder as your PHP script, since the page below loads it with a relative path.

The following code is the rendering part responsible for showing the output the PHP script created.

We will be generating an HTML document that draws on the canvas.js library to visualize the results.

Add the following to the above PHP script:

<!DOCTYPE HTML>

<html>

<head>

<script type="text/javascript">

window.onload = function () {

	CanvasJS.addColorSet("goodBad",
		[ //colorSet Array
		"#09b64b",
		"#c91e1e",
		"#888888"
		]);

	var chart = new CanvasJS.Chart("chartContainer",
	{
		colorSet: "goodBad",
		title: {
			text: "How do users feel about this API?",
			verticalAlign: 'top',
			horizontalAlign: 'center'
		},
		data: [
		{
			type: "doughnut",
			startAngle: 20,
			dataPoints: [
				{ y: <?php echo $numPositive;?>, label: "Positive" },
				{ y: <?php echo $numNegative;?>, label: "Negative" },
				{ y: <?php echo $numNeutral;?>, label: "Neutral" }
			]
		}
		]
	});

	chart.render();

}

</script>

<script type="text/javascript" src="canvasjs.min.js"></script>

</head>

<body>

<div id="chartContainer" style="height: 300px; width: 100%;">

</div>

</body>

</html>
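
Echoing each counter separately into the JavaScript works, but an unset counter would leave a gap in the dataPoints array and break the chart. A possible alternative, sketched here with a hypothetical buildDataPoints() helper, is to build the whole array in PHP and emit it once with json_encode().

```php
<?php
// Hypothetical helper: builds the CanvasJS dataPoints array as JSON so the
// three counters are emitted in one place.
function buildDataPoints($numPositive, $numNegative, $numNeutral)
{
	$points = array(
		// The (int) casts turn a missing (null) counter into 0.
		array('y' => (int) $numPositive, 'label' => 'Positive'),
		array('y' => (int) $numNegative, 'label' => 'Negative'),
		array('y' => (int) $numNeutral,  'label' => 'Neutral'),
	);
	return json_encode($points);
}

// Example counts, chosen only for illustration.
echo buildDataPoints(12, 4, 9);
// prints [{"y":12,"label":"Positive"},{"y":4,"label":"Negative"},{"y":9,"label":"Neutral"}]
```

In the page, the dataPoints entry would then read: dataPoints: <?php echo buildDataPoints($numPositive, $numNegative, $numNeutral); ?>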

When rendered in full, the HTML page shows the color-coded comments followed by the doughnut chart, as in the image near the start of this tutorial.

Putting it all together

If you are using the WebKnox Text Processing API to help clients analyze their comments, you might just want to show them the results of your analysis.

So, here’s a quick recap of what you would do:

  1. Register for the Mashape API key and register for Freemium access to the WebKnox Text Processing API.
  2. Download unirest and canvas.js libraries into your PHP library.
  3. Scrape the comments you want to analyze for sentiment.
  4. Adjust the PHP response script to conduct the analysis, adding the API key and scraped comments directly into the script.
  5. Add the HTML page code so that results can be rendered in canvas.js and displayed as both the collected comments, color-coded to show how they have been analyzed, and the chart showing overall sentiment.

In many cases, you may want to just analyze the comments and not visualize the results, but instead use them for the next part of your process. For simplicity, the PHP and HTML code is shown here. You may decide you do not need a front end for showing the results, but instead want to weave the analytics into your application by storing the data in your back end or adding it to a database. By following this tutorial, you can see how to get this API running rather quickly and can then decide how you want to integrate its different parts into your application.
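
If the results are headed for a back end rather than a web page, one option is to collect a record per sentence and reduce the records to a summary that can be saved as JSON. The sketch below assumes a hypothetical record layout (a 'sentence' and a 'label' key, with the label being one of positive, negative or neutral) and a hypothetical summarizeResults() helper; neither comes from the WebKnox API.

```php
<?php
// Hypothetical helper: turns per-sentence results into a summary that can
// be saved to a file or database column instead of being rendered as HTML.
// Assumes every record's label is 'positive', 'negative' or 'neutral'.
function summarizeResults(array $records)
{
	$counts = array('positive' => 0, 'negative' => 0, 'neutral' => 0);
	foreach ($records as $record) {
		$counts[$record['label']]++;
	}
	$total  = count($records);
	$shares = array();
	foreach ($counts as $label => $count) {
		// Guard against division by zero when there are no records.
		$shares[$label] = $total > 0 ? round($count / $total * 100, 1) : 0.0;
	}
	return array('total' => $total, 'counts' => $counts, 'percent' => $shares);
}

// Made-up records; in practice each one would be filled in inside the
// sentence loop of the analysis script.
$records = array(
	array('sentence' => 'This is a wonderful service', 'label' => 'positive'),
	array('sentence' => 'What a complete waste.', 'label' => 'negative'),
	array('sentence' => 'I use Google all the time.', 'label' => 'neutral'),
	array('sentence' => 'Very fast and reliable.', 'label' => 'positive'),
);

$summary = summarizeResults($records);
echo json_encode($summary);
```

Keeping the individual records alongside the summary makes it possible to re-run the tally later with a different confidence threshold without calling the API again.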

Using this tutorial in your own projects

Machine learning algorithms such as sentiment analysis are used for a variety of projects.

For API providers, this sentiment analysis can be used at various stages to manage developer engagement with your API (or for reviewing the sentiment of your competitors). Here are a few ideas:

Media campaigns: When managing hackathons or API product launches, you can analyze comments on Twitter or social media to uncover what messaging is most effective and to see what aspects of your API are of most interest to the early adopter developer community and those considering using your API.

Brand management and developer loyalty: Conducting regular sentiment analysis on your own developer forums, and in public forums where your API is discussed (like on ProgrammableWeb, as shown above, or on sites like StackOverflow), can help you better understand and reduce your churn rate (that is, the rate at which your current developer-consumers stop using your API). Sentiment analysis can help you identify the points at which your developers may be giving up on your API and looking for an alternative. Positive comments may be useful as testimonials (where you reach out to those making the positive comments and request to re-use quotes in promotional materials).

New product and service design: Sentiment analysis may reveal particular features or ideas that developers want from your API. In some cases, this could be represented as dissatisfaction, as there could be an assumption that you are already providing this feature as part of your API. Perhaps you receive negative comments because API developer-consumers are expecting a particular API feature that you do not offer. So, initially, you could improve your messaging to reduce this over-expectation, but you can also add this idea of a feature as a potential new product or service to develop in future.

Sentiment analysis may also reveal use cases that can be adapted for a wider audience. In the example above, auctionsearchkit and eatables use the Google Maps API to create particular mashups. As an API provider, you could build out use cases like these into a specific niche service or product.

Assess quality of your support services: You can use sentiment analysis to analyze the comments received following service requests, or you can analyze feedback by scraping any comments received on any support desk forums you operate (or, again, on particular parts of your developer forum site, such as when a comment is made after a staff member replies to a particular thread). This can help you understand how well your service response is being perceived.

Sentiment analysis is an important community engagement metric, and with APIs like the WebKnox Text Processing API, it is an analytical technique now open to any business.

Thanks to Dr. David Urbansky for help with this tutorial.

Mark Boyd is a ProgrammableWeb writer covering breaking news, API business strategies and models, open data, and smart cities. He can be contacted via email, on Twitter, or on Google+.
