Yuri Kitin recently published a post on LinkedIn Pulse in which he compares the performance of 10 natural language processing APIs. These APIs perform entity extraction to locate and classify named entities in text into predefined categories.
The test measured performance on three collections of text made up of various articles from the Web, with each collection containing 50 proper nouns (person classification), 50 geographical names (location), and 50 organisation/company names (organisation). The APIs were assessed for how many of each classification they extracted and identified, the total number of extractions, and the number of errors (where an entity was incorrectly classified, or the whole of the named entity was not identified for extraction).
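The scoring described above can be sketched roughly as follows. This is a minimal illustration, not the procedure the author actually used: the function name `score`, the data layout, and the sample entities are all hypothetical, and a partial extraction is approximated here by a simple substring match.

```python
from collections import Counter


def score(gold, predicted):
    """Count correct extractions per category and the number of errors.

    gold: dict mapping each expected entity text to its category.
    predicted: iterable of (text, category) pairs returned by an API.
    Both shapes are assumptions made for this sketch.
    """
    correct = Counter()
    errors = 0
    for text, category in set(predicted):  # ignore duplicate extractions
        if text in gold:
            if gold[text] == category:
                correct[category] += 1
            else:
                errors += 1  # entity found but put in the wrong category
        elif any(text in full for full in gold):
            errors += 1  # only part of a named entity was extracted
        # Extractions unrelated to any expected entity are ignored here.
    return correct, errors


# Hypothetical sample data, for illustration only.
gold = {
    "Ada Lovelace": "person",
    "London": "location",
    "Acme Corp": "organisation",
}
predicted = [
    ("Ada Lovelace", "person"),   # correct
    ("London", "organisation"),   # wrong category
    ("Acme", "organisation"),     # partial name
]

correct, errors = score(gold, predicted)
print(dict(correct), "total correct:", sum(correct.values()), "errors:", errors)
```

In this toy run one entity is counted as correct and two as errors, mirroring the two error types named in the test description.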
Below are the results in ascending order by number of errors:
1. Intellexer API: A recently launched cloud semantic server that recognises 12 entity types, including Person, Location, and Organisation. It correctly extracted and identified 139 entities out of a possible 150 with only 4 errors.
2. Semantria Lexalytics: The Boston-based company launched its cloud text-analysis API in 2012. While this entry raised only 5 errors, it correctly extracted and identified only 71 entities out of 150.
3. Alchemy IBM: The AlchemyAPI text analysis service has been available since 2009. It recognises and extracts over 40 primary types of entities, and picked up 103 out of 150 here with 9 errors.
4. Indico: Launched out of TechStars Boston in 2013. Offers three types of entity (People, Places, Organisations), and scored 128 with 11 errors.
5. Google Natural Language: Beta version released in July 2016. The list of extracted entities can be sorted by salience scores representing their relevance to the overall text. Scored 131 with 12 errors.
6. Aylien: Dublin-based company launched in 2011. Alongside the usual entities, it extracts values such as URLs, telephone numbers, or currency amounts. The downside is that it recognised only 68 entities, with 14 errors.
7. Meaning Cloud: Launched in 2015 by Spanish company Daedalus as a continuation of its previous product Textalytics. Offers subtypes of categories (such as Sea, Lake, or Continent under the Location category). Correctly classified 79 entities with 21 errors.
8. Text Razor: London-based company launched in 2011. Each identified entity is provided with a Wikipedia link. Text Razor did extract some entities that were not actually present in the text, but these errors did not recur when the same text was processed again. Scored 129 with 29 errors.
9. Haven on Demand (HPE): Developed by Hewlett Packard Enterprise in 2014. It classifies entities into numerous groups (such as people, person name component, person full-name, etc.), but raised many errors relating to partial name extraction. Made 94 correct classifications with 29 errors.
10. Microsoft Cognitive Services: Launched in March 2016 and based on Project Oxford and Bing. Unfortunately, it doesn’t classify the extracted named entities, so it offered only an incomplete comparison. It also returned slightly different results when processing the same text multiple times. Correctly extracted and recognised 127 entities without classification.
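Because the list is ordered by error count, it can hide how many of the 150 entities each API actually found. The short sketch below re-sorts the figures quoted above by entities found and expresses each as a share of 150. It assumes that every "scored" or "recognised" figure in the list is a count of correctly identified entities, and it uses `None` for Microsoft Cognitive Services, for which no error count is given; the variable and function names are illustrative only.

```python
TOTAL_ENTITIES = 150

# (name, entities correctly found, errors), copied from the list above.
# None marks the entry for which the list gives no error count.
results = [
    ("Intellexer API", 139, 4),
    ("Semantria Lexalytics", 71, 5),
    ("Alchemy IBM", 103, 9),
    ("Indico", 128, 11),
    ("Google Natural Language", 131, 12),
    ("Aylien", 68, 14),
    ("Meaning Cloud", 79, 21),
    ("Text Razor", 129, 29),
    ("Haven on Demand (HPE)", 94, 29),
    ("Microsoft Cognitive Services", 127, None),
]


def share_found(found, total=TOTAL_ENTITIES):
    """Return the percentage of the expected entities that were found."""
    return 100 * found / total


# Re-rank by entities found instead of by error count.
by_found = sorted(results, key=lambda row: row[1], reverse=True)

for name, found, errors in by_found:
    error_text = "n/a" if errors is None else str(errors)
    print(f"{name:30} {found:3d}/{TOTAL_ENTITIES} "
          f"{share_found(found):5.1f}%  errors: {error_text}")
```

Sorted this way, Intellexer API still leads, but Semantria Lexalytics, second by error count, drops to ninth of ten, which shows why a low error count alone says little about coverage.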