Diffbot's Discussions API Provides Comment Section Searchability

Diffbot, a computer vision, machine learning and natural language processing startup, today unveiled a new Discussions API that gives developers access to the conversations taking place on the web every day in comments, forums and reviews.

According to the company, its new Discussions API "allows developers, branding executives, and media monitoring companies to monitor every conversation on the Web the same way they monitor Twitter." A host of social media monitoring platforms have firehose access to popular social networks like Twitter, and have built solutions that enable companies to track and analyze the chatter on those services.

But Diffbot, which says its mission is "to bring structure to the unstructured Web," believes that the millions of conversations taking place across the web in comments sections and forums every day are often just as valuable. So it built the Discussions API to let developers create applications that make it easier to search this content and extract insights from it, content they currently have little or no access to.

Diffbot indexes data from a variety of comment platforms, including Disqus, Livefyre, WordPress, Blogger, IntenseDebate and Kinja. That is no small feat, since some of these platforms are JavaScript-based, which makes them more difficult to crawl. Diffbot also picks up content from popular forums like Reddit, as well as reviews from Amazon and other retailers. Its technology automatically identifies the structure in the data it crawls, including author, date and discussion content.

Documentation for the Discussions API is now available in the Diffbot Developer Portal, and developers can use Diffbot's testdrive console to try the Discussions API against a URL of their choice.
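For developers who would rather call the API from code than from the testdrive console, building a request might look roughly like this. It is a minimal sketch, assuming an endpoint of the form `https://api.diffbot.com/v3/discussion` that takes `token` and `url` query parameters; the endpoint and parameter names are assumptions to be checked against the official documentation, and `build_discussion_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# ASSUMPTION: endpoint path and parameter names are illustrative and should be
# verified against Diffbot's own documentation before use.
DISCUSSION_ENDPOINT = "https://api.diffbot.com/v3/discussion"


def build_discussion_request(token: str, page_url: str) -> str:
    """Return a GET URL asking the (assumed) Discussions endpoint to
    extract the conversation found on page_url."""
    # urlencode percent-escapes the target URL (":", "/", "?", "=") so it
    # travels as a single query-string value.
    query = urlencode({"token": token, "url": page_url})
    return f"{DISCUSSION_ENDPOINT}?{query}"


if __name__ == "__main__":
    print(build_discussion_request("YOUR_TOKEN", "https://example.com/post?id=1"))
```

The target page URL has to be escaped because it contains its own `?` and `=` characters, which would otherwise be read as part of the outer query string.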

Picking up Google's crumbs?

Despite the undeniable success of popular social networks like Facebook and Twitter, some still question the value of user-generated content. The signal-to-noise ratio isn't very good, they argue. That might be one of the reasons why the world's largest search engine, Google, shuttered its own discussion search product.

At the time, a small but vocal group of users complained, but Diffbot CEO Mike Tung doesn't see his company as seizing an opportunity Google missed. Google's product "was a consumer functionality," Tung told me. Google was "never in the business of providing web data to developers/business users."

Google's offering was also far more limited than Diffbot's. According to Tung, Google didn't look at comment platforms at all, and its forum coverage was much narrower. It was also far less sophisticated when it came to structured data. With Diffbot's Discussions API, users can build smart queries, such as "give me images from forum posts made in January that mention apple watch," which makes it possible to surface interesting and potentially valuable content.
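To make the example query concrete, here is a minimal client-side sketch of the same filter applied to posts that have already been extracted into structured records. The `Post` fields (`kind`, `author`, `posted`, `text`, `images`) are a hypothetical shape chosen for illustration, not Diffbot's actual response schema, and the filtering runs locally rather than as a query on Diffbot's side.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Post:
    # Hypothetical record shape for one extracted post or comment.
    kind: str          # e.g. "forum", "comment", "review"
    author: str
    posted: date
    text: str
    images: list[str] = field(default_factory=list)


def images_from_january_forum_posts(posts: list[Post], phrase: str) -> list[str]:
    """Collect image URLs from forum posts made in January (any year)
    whose text mentions `phrase`, ignoring case."""
    needle = phrase.lower()
    result: list[str] = []
    for p in posts:
        if p.kind == "forum" and p.posted.month == 1 and needle in p.text.lower():
            result.extend(p.images)
    return result


if __name__ == "__main__":
    sample = [
        Post("forum", "ann", date(2015, 1, 12), "Hands-on with the Apple Watch", ["a.jpg"]),
        Post("forum", "bob", date(2015, 2, 3), "Apple Watch battery life", ["b.jpg"]),
        Post("comment", "cy", date(2015, 1, 20), "apple watch looks nice", ["c.jpg"]),
    ]
    print(images_from_january_forum_posts(sample, "apple watch"))
```

The point of the sketch is that a query like this is only possible once author, date, post type and media have been pulled out as separate fields; against raw HTML, the same question would require ad hoc scraping for every site.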

Diffbot's Discussions API also helps developers separate the wheat from the chaff. "Developers can point the Discussions API at whatever sites and use whatever search terms they think will provide them with the best results," Tung explained. "The Discussions API visually extracts the number of 'votes' of a comment (or likes/agrees). This provides a bit of social signal, similar to how a human would determine the best comments in a big list. We also extract the positive/negative sentiment of each comment, which can be used for sorting/filtering."
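The votes-plus-sentiment approach Tung describes can be sketched as a simple ranking step. This sketch assumes each extracted comment carries a vote count and a sentiment score between -1.0 (negative) and 1.0 (positive); the field names, the score range and the `best_comments` helper are illustrative assumptions, not Diffbot's documented output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    text: str
    votes: int        # ASSUMPTION: count of votes/likes/agrees
    sentiment: float  # ASSUMPTION: -1.0 (negative) .. 1.0 (positive)


def best_comments(comments: list[Comment],
                  min_sentiment: Optional[float] = None,
                  limit: int = 10) -> list[Comment]:
    """Optionally drop comments below a sentiment threshold, then return
    up to `limit` comments, most-voted first."""
    # Filter on sentiment first, if a threshold was given.
    pool = [c for c in comments
            if min_sentiment is None or c.sentiment >= min_sentiment]
    # Rank by votes (the "social signal"); break ties by higher sentiment.
    pool.sort(key=lambda c: (c.votes, c.sentiment), reverse=True)
    return pool[:limit]


if __name__ == "__main__":
    thread = [
        Comment("Great write-up", votes=42, sentiment=0.8),
        Comment("This is wrong", votes=57, sentiment=-0.6),
        Comment("Thanks!", votes=3, sentiment=0.9),
    ]
    for c in best_comments(thread, min_sentiment=0.0, limit=2):
        print(c.votes, c.text)
```

Filtering before sorting keeps the threshold independent of popularity, so a heavily voted but negative comment is excluded when a caller asks only for positive reactions.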

Diffbot already offers a number of APIs and Tung tells me he's very excited to see how its newest addition to the Diffbot API suite is used. "We've always been constantly surprised by what developers do with our tools – they are more creative than we are," he said. As 2016 nears, Tung is particularly interested in seeing how users employ his company's API for trend and sentiment analysis related to the upcoming presidential election cycle.
