New Diffbot API Client Libraries Released

Janet Wagner, Data Journalist / Full Stack Developer
Feb. 07 2014, 07:55AM EST

Diffbot has just announced the release of brand-new client libraries for 35+ different programming languages. The company now provides developers with client libraries for the Diffbot API in the most-used programming languages as well as in less common ones.

Diffbot uses computer vision, machine learning, natural language processing (NLP) and other technologies to create APIs that can understand web pages and extract data from them, such as text, images, links, HTML attributes, e-commerce product page information and other page elements. Diffbot currently has three core products: Automatic APIs, Custom API Toolkit and Crawlbot.


Last August, ProgrammableWeb reported that Diffbot had launched the Product API, which can be used by developers to extract product information (product title, description, sale price, regular price, UPC, etc.) from e-commerce web pages. The extracted product information can then be integrated into third-party applications. The Product API is included in the suite of Diffbot Automatic APIs.

Releasing officially supported and maintained client libraries solves several problems and provides key benefits:

  • It keeps developers from running into the buggy, unmaintained code found in some third-party contributed libraries.
  • It allows Diffbot to control the release of new updates and bug fixes.
  • It allows Diffbot to provide client libraries with clean code and adequate documentation.
  • Diffbot API client libraries are now available in nearly every programming language.
  • The API client libraries work with the programmatic Crawlbot and Bulk-submission interfaces (for premium users).

Diffbot CEO Mike Tung points to an additional benefit. "Diffbot is a REST API and that means it can be called from nearly any kind of software environment or programming language. This makes handling all the potential support questions tricky to say the least," he says. "Now with official supported clients in every programming language, we can point integration questions to example working code."

Although developers are not required to use a Diffbot API client library, using one can save them time when writing code, because each library contains an already-written Diffbot API call. Available programming languages include C#, CoffeeScript, Java, JavaScript, Objective-C, PHP, Python and Ruby.

"A developer generally writes an application in a required programming language / environment; if he is writing an Android app, then he's using Java. If he's writing an iOS app, he's using Objective-C, if he's writing a plugin for Excel, he's probably using VBA," Tung explains. "They don't have to use a library, but it makes it easier because the call is already done in their language. An analogy might be to think of these libraries like templates for Word."
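To illustrate what having "the call already done in their language" means, here is a minimal, hypothetical Python wrapper; it is not one of Diffbot's official libraries. The base URL `https://api.diffbot.com/v2`, the `token` and `url` query parameters, and endpoint names such as `article` and `product` are assumptions about Diffbot's REST conventions that should be checked against the official documentation. The `DiffbotClient` class and its methods are illustrative names only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: base URL and API version; verify against Diffbot's docs.
DIFFBOT_BASE = "https://api.diffbot.com/v2"


class DiffbotClient:
    """Hypothetical minimal wrapper around Diffbot's REST endpoints."""

    def __init__(self, token, base=DIFFBOT_BASE):
        self.token = token
        self.base = base.rstrip("/")

    def build_url(self, api, page_url, **params):
        """Build a GET URL such as .../article?token=...&url=...

        `api` is an assumed endpoint name (e.g. "article", "product");
        extra keyword arguments become additional query parameters.
        """
        query = {"token": self.token, "url": page_url}
        query.update(params)
        # urlencode percent-escapes the target page URL (":" -> %3A, "/" -> %2F).
        return f"{self.base}/{api}?{urlencode(query)}"

    def call(self, api, page_url, timeout=30, **params):
        """Perform the request and decode the JSON response body."""
        with urlopen(self.build_url(api, page_url, **params), timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    client = DiffbotClient("YOUR_TOKEN")
    # Only prints the request URL; client.call(...) would hit the network.
    print(client.build_url("article", "http://example.com/story"))
```

An official library per language mainly spares each developer from writing this URL-building and JSON-decoding boilerplate, which is the "template" role Tung describes.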

Diffbot completed this project quickly and cost-effectively by using the oDesk API to post a job for each individual programming language. Together, the jobs received thousands of responses from developers worldwide. Working with developers found on oDesk, Diffbot completed 35+ new client libraries totaling 56,042 new lines of code while spending only 18 hours of its own time on the project, and the effort grew the Diffbot developer network to approximately 10,000 developers.


Diffbot has come a long way since the company was founded in 2009. The news story was the first type of web page that Diffbot was able to parse. Today, in addition to the News Article API, there are the Frontpage API, Product API, Image API, Page Classifier API and other web page data extraction APIs and tools.

In a recent article published on Xconomy, writer Wade Roush asks "Could a Little Startup Called Diffbot Be the Next Google?"

It may not be long before that question is definitively answered.

By Janet Wagner. Janet is a data journalist and full stack developer based in Toledo, Ohio. Her focus revolves around APIs, open data, data visualization and data-driven journalism. Follow her on Twitter, Google+ and LinkedIn.

