ActiveWizards, a data science and machine-learning company, has published an overview of text processing APIs. It is a good place to begin researching text processing as one possible asset management solution.
In the 90s, text processing products were used by airplane engine manufacturers, chemical engineering companies, and other very large organizations that needed to manage millions of pages of content (often by law and for decades). Internet search engines grew out of these legacy products, but as flexible as Google and others are, these businesses and organizations often need more powerful products with more complex features. Many of these entities still have extensive legacy systems that require flexible and customizable solutions, and as you might expect, starting from scratch is often an overwhelming, resource-intensive, and expensive prospect. Now, text processing APIs are coming to the rescue.
Text processing APIs not only free you from starting at the beginning; they also open up job positions to employees who know the basics but don't have extensive knowledge of natural language processing. ActiveWizards also points out that it is just as important to understand fully what these APIs can't do and when to turn to a solution built in-house.
To evaluate what your company requires, you need to understand what your employees (usually data analysts) need to do their jobs. Most often, they'll need such functionality as:
- Keyphrase extraction: the automatic collection of keywords and phrases; the quality of the extracted phrases also indicates how accurately the API understands the document's contents
- Sentiment analysis: the detection of the text's general mood using specified classifications (e.g., positive, negative, neutral, or much more complex)
- Entity recognition: the process of identifying and classifying the entities mentioned in the text: people, organizations, locations, dates, etc.
- Text analysis: the extent and type of these capabilities depend on the vendor. Functionality can include the creation of metadata, an analysis of how assets or entities within a document (or among many) are related, and more.
- Language detection: the analysis of which languages are used, which are dominant, etc.
- Translation: the translation of documents to preferred languages and other language manipulation
- Topic modeling: the creation of a matrix of topics covered in the documents
In addition, any text processing API should have two extremely important capabilities: (1) scoring its own work, meaning it rates the confidence it has in each result, and (2) recording its analysis speed.
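As a rough illustration of those two capabilities, the Python sketch below wraps an analysis call so that every result carries a confidence score and the call itself is timed. Everything in it is an assumption made for illustration: `ScoredResult`, `TimedAnalysis`, `timed_analysis`, `confident_only`, and the 0.8 threshold are hypothetical names and values, and `fake_sentiment_api` is a hard-coded stand-in, not any vendor's actual API.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScoredResult:
    label: str          # e.g. a sentiment class or an entity type
    confidence: float   # the API's own rating of this result, 0.0 to 1.0


@dataclass
class TimedAnalysis:
    results: List[ScoredResult]
    elapsed_seconds: float  # how long the analysis call took


def timed_analysis(analyze: Callable[[str], List[ScoredResult]],
                   text: str) -> TimedAnalysis:
    """Run an analysis function and record how long it took."""
    start = time.perf_counter()
    results = analyze(text)
    elapsed = time.perf_counter() - start
    return TimedAnalysis(results=results, elapsed_seconds=elapsed)


def confident_only(analysis: TimedAnalysis,
                   threshold: float = 0.8) -> List[ScoredResult]:
    """Keep only the results the API rated at or above the threshold."""
    return [r for r in analysis.results if r.confidence >= threshold]


def fake_sentiment_api(text: str) -> List[ScoredResult]:
    """Hypothetical stand-in for a vendor call; the scores are hard-coded."""
    return [
        ScoredResult("positive", 0.91),
        ScoredResult("neutral", 0.07),
        ScoredResult("negative", 0.02),
    ]


if __name__ == "__main__":
    analysis = timed_analysis(fake_sentiment_api, "The engine manual is clear.")
    for result in confident_only(analysis):
        print(result.label, result.confidence)
    print(f"analysis took {analysis.elapsed_seconds:.6f} s")
```

In practice, the stand-in function would be replaced by a call to whichever vendor you are evaluating, and the threshold would be tuned to how much uncertainty your analysts can tolerate.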
ActiveWizards evaluates the following text processing APIs:
- Amazon Comprehend & Amazon Translate
- Google Cloud Natural Language & Google Cloud Translation
- IBM Watson Natural Language Understanding & IBM Watson Translator
- Microsoft Azure (linguistic analysis API) – beta, Microsoft Azure (text analytics API), and Microsoft Azure Translator Text API
ActiveWizards' analysis shows that while these APIs have similar features, the products differ significantly in capabilities, so evaluate them carefully before making a decision.