Yahoo Term Extraction Is Reborn as Content Analysis

The new Yahoo Content Analysis API capitalizes on the success of its term extraction API while adding new functionality. The reborn implementation not only extracts terms and entities but also ranks them by their perceived importance within the article. Peter Levinson, product manager for Yahoo Content Analysis, says, “The new API gives developers actionable metadata about their content. By extracting and then ranking terms on the page, the API separates signal from noise in unstructured content.”

As an added bonus, the extracted entities are also matched against Wikipedia articles. Because each Wikipedia URL is unique, it can act as an identifier for the entity. Levinson notes that using Wikipedia IDs adds value: “by giving developers the Wikipedia IDs for most of these terms, their content becomes even more structured for those entities since now they can use those ids to build relationships among their documents.”
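To make Levinson's point concrete, here is a minimal sketch in Python. It assumes you already have, for each of your documents, a list of (surface text, Wikipedia URL) pairs from some entity-extraction step; the `docs` data and the `build_entity_index` and `related_pairs` helpers are hypothetical and are not part of Yahoo's API. The sketch only shows how a shared Wikipedia URL can link documents even when the entity is written differently in each one.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical extraction output: document ID -> list of (surface text, Wikipedia URL).
# The URL, not the surface text, serves as the entity's unique identifier.
docs = {
    "doc-a": [("Yahoo", "https://en.wikipedia.org/wiki/Yahoo!"),
              ("Wikipedia", "https://en.wikipedia.org/wiki/Wikipedia")],
    "doc-b": [("Yahoo! Inc.", "https://en.wikipedia.org/wiki/Yahoo!"),
              ("semantic analysis", "https://en.wikipedia.org/wiki/Semantic_analysis")],
    "doc-c": [("Wikipedia", "https://en.wikipedia.org/wiki/Wikipedia"),
              ("Yahoo", "https://en.wikipedia.org/wiki/Yahoo!")],
}


def build_entity_index(docs):
    """Map each Wikipedia URL to the set of document IDs that mention that entity."""
    index = defaultdict(set)
    for doc_id, entities in docs.items():
        for _surface_text, wiki_url in entities:
            index[wiki_url].add(doc_id)
    return index


def related_pairs(index):
    """Count how many entities each pair of documents has in common."""
    shared = defaultdict(int)
    for doc_ids in index.values():
        # Sort so each pair is always keyed in the same order, e.g. ("doc-a", "doc-b").
        for a, b in combinations(sorted(doc_ids), 2):
            shared[(a, b)] += 1
    return dict(shared)


if __name__ == "__main__":
    for pair, count in related_pairs(build_entity_index(docs)).items():
        print(pair, count)
```

Here `doc-a` and `doc-c` share two entities and come out as the most closely related pair, while `doc-b` still links to both through the Yahoo! entity even though it names the company "Yahoo! Inc."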

My big question is how they do it. Semantic analysis is a popular area right now, and services in that space are typically powered by artificial intelligence. Is there AI at work here, or is the Yahoo! approach somehow simpler? I’m hoping we can catch up with Yahoo! to discuss a few of the ingredients that make up the secret sauce behind entity recognition, ranking, and matching against Wikipedia.

This feature is part of the YQL (Yahoo Query Language) platform, which, I have to admit, I only recently discovered. Maybe I’m a bit late to that party, but I’m interested to see where they take it. It reminds me a lot of what our friends over at Datafiniti are doing.

Semantic analysis is difficult to perform on your own, so it makes perfect sense to build those smarts into your application through an API. This new offering provides valuable analysis and is a great way for Yahoo to endear itself to the developer community.
