Webhose.io, a fairly new data-as-a-service (DaaS) platform created by Buzzilla, has added named entity extraction capabilities to the Webhose.io API, which allow users to filter posts by specific entities such as a person, location, or organization. The new feature also reduces ambiguity in text: words that have multiple meanings are analyzed based on context. For example, a search query for the organization Apple will return posts that mention Apple the company, not apple the fruit.
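To illustrate why filtering on an entity type differs from plain keyword matching, here is a minimal Python sketch. It does not use the Webhose.io API; the `Post` class, the layout of the `entities` field, and the `filter_by_entity` helper are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    title: str
    # Hypothetical annotation layout: entity type -> set of lowercase entity names.
    entities: dict = field(default_factory=dict)


def filter_by_entity(posts, entity_type, name):
    """Return posts whose annotations list `name` under `entity_type`."""
    wanted = name.lower()
    return [p for p in posts if wanted in p.entities.get(entity_type, set())]


# Toy data: the annotations are written by hand, standing in for the output
# of a named entity extraction step.
posts = [
    Post("Apple unveils new laptop", {"organization": {"apple"}}),
    Post("Apple pie recipe for autumn", {}),
    Post("Tim Cook visits Berlin", {"person": {"tim cook"}, "location": {"berlin"}}),
]

# Keyword matching cannot tell the company from the fruit.
keyword_titles = [p.title for p in posts if "apple" in p.title.lower()]

# Entity filtering only returns posts annotated with the organization.
entity_titles = [p.title for p in filter_by_entity(posts, "organization", "Apple")]

print(keyword_titles)
print(entity_titles)
```

In this toy data set, the keyword search matches both the laptop story and the recipe, while the entity filter keeps only the post annotated with Apple as an organization.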
Webhose.io uses proprietary crawling technology, developed by the company over the past eight years, to gather live data from hundreds of thousands of news sites, forums, blogs, and other online data sources. The platform then cleans the content and organizes it into a single, structured format that can be consumed through the Webhose.io API or the Firehose (a stream of all the data that the Webhose.io platform crawls in real time).
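As a rough idea of what consuming such an API with an entity filter might look like, the following Python sketch builds a request URL without sending it. Everything specific here is an assumption: the base URL is a placeholder, and the parameter names (`token`, `format`, `q`) and the `type:"name"` filter syntax are hypothetical, not taken from the Webhose.io documentation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint (assumption): not the real Webhose.io URL.
BASE_URL = "https://api.example.com/search"


def build_entity_query(token, entity_type, name, fmt="json"):
    """Build a search URL that filters posts by a named entity.

    The parameter names and the filter syntax are hypothetical.
    """
    query = f'{entity_type}:"{name}"'
    params = {"token": token, "format": fmt, "q": query}
    # urlencode percent-escapes the colon and quotes in the filter expression.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_entity_query("YOUR_TOKEN", "organization", "Apple")

# Decode the query string again to confirm the filter survives the round trip.
decoded = parse_qs(urlsplit(url).query)

print(url)
print(decoded["q"][0])
```

Building the URL with `urlencode` rather than string concatenation keeps the quoted entity name intact when it contains spaces or punctuation.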
Webhose.io uses Stanford CoreNLP for the platform’s named entity extraction capabilities. Stanford CoreNLP is a suite of open source natural language processing (NLP) tools written in Java that can perform a variety of NLP functions, such as extracting named entities, indicating sentiment, and normalizing dates and times.
At the time of publication, the option to filter posts by specific entities is available only for news articles in English. However, Ran Geva, CTO and co-founder of Buzzilla and Webhose.io, told ProgrammableWeb that the company hopes to extend these filtering capabilities to more languages and verticals (forums, blogs, etc.) in the near future. Geva also said that the company has received a lot of feedback about the platform since it was released at the end of 2014, and that the company realized:
"the demand for online data is far more granular than data from forums, blogs and news. Of course there is the obvious demand for social/online data for brand monitoring companies, reputation management, security, BI, and finance platforms. But the demand for a much higher resolution is very high. For example, providing data about recipes, retail pricing, retail inventory numbers, a company's hiring requirements, and much more."
He went on to say that:
"We strive to be the translation layer between the 'visual' web and all the different types of content it presents, to software platforms that need to 'understand' the data structure, before they can do anything with it. That's what we excel in, and we offer an API that lets companies concentrate on their unique value-add rather than reinventing the wheel."
For more information about the Webhose.io platform, visit the official website.