Semantics3's Approach to Building the 'Best' Product Lookup API

A good API is well designed and well documented. But a great API requires more than solid implementation: it also needs data or functionality that solves challenging problems.

Acquiring that data and delivering that functionality often requires API providers to take creative approaches.

A good example is Semantics3, which provides a suite of product and pricing APIs. The company's flagship Products API lets customers retrieve information about millions of products. Customers can search for products by UPC or bar code, a capability that has itself created significant headaches for Semantics3.

Another source of headaches for Semantics3 and its customers: Although the company claims to have "created the world's largest database of distinctly classified products," there are times when it simply has no product data to return for a UPC/bar code query.

Instead of returning a response that indicates as much, however, Semantics3 decided to build a crawler that retrieves product information from the Web in real time if none exists in its database: "If an initial UPC query does not return results from [our] vast database, a secondary, web-based on-demand crawl is activated. If that UPC is available online, our scrapers go into action and mine data from any website that is likely to have that UPC."

According to Semantics3's CEO, Anand Ramachandran, "Since we power many applications that use our data in real time (bar code apps, product data for mobile applications, etc.), this was a critical need."
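
For illustration, the database-first lookup with an on-demand fallback that Semantics3 describes might be sketched as follows. This is a minimal sketch, not Semantics3's implementation: the function `lookup_upc`, the in-memory `database` mapping, the `crawl` callable and the `source` field are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Optional

Product = Dict[str, str]  # hypothetical record shape, for illustration only


def lookup_upc(
    upc: str,
    database: Dict[str, Product],
    crawl: Callable[[str], Optional[Product]],
) -> Optional[Product]:
    """Return the stored record if one exists, otherwise try an on-demand crawl."""
    record = database.get(upc)
    if record is not None:
        # Primary path: the UPC is already in the product database.
        return {**record, "source": "database"}

    # Secondary path: nothing stored, so fall back to a live crawl.
    crawled = crawl(upc)
    if crawled is None:
        return None  # the UPC could not be found online either
    return {**crawled, "source": "crawl"}
```

Tagging each result with where it came from would let a client application tell a stored record apart from one fetched on demand.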

Creativity Creates Technical Challenges

Semantics3's effort to return data whenever possible by adding a Web crawler to the mix is a no-brainer from a customer perspective, but it does create a number of technical challenges.

"When we return data for a UPC from our database, the product data is very rich, with results having up to 50 fields of information (as the product data is aggregated from across retailers). In the expanded UPC approach, the results are usually from a single or limited number of sites. Hence the product data is not as rich," Ramachandran explained. "To make up for this, we have taken a feedback loop approach — we evaluate all the queries that have required a search beyond our database. We then selectively add retail websites that are a source of a large number of relevant UPC results to the list of websites we track in our database."

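The feedback loop Ramachandran describes, in which sites that keep answering fallback queries get added to the tracked list, could look roughly like the sketch below. It assumes a log of URLs where on-demand crawls found a UPC; the function name `sites_to_track`, the `min_hits` threshold and the example domains are hypothetical, and the article does not say how Semantics3 actually selects sites.

```python
from collections import Counter
from typing import Iterable, List, Set
from urllib.parse import urlparse


def sites_to_track(
    fallback_hits: Iterable[str],
    tracked: Set[str],
    min_hits: int = 3,  # assumed threshold, not a figure from Semantics3
) -> List[str]:
    """Pick untracked domains that answered at least `min_hits` fallback queries."""
    # Count how many fallback results each domain supplied.
    counts = Counter(urlparse(url).netloc for url in fallback_hits)
    return sorted(
        domain
        for domain, hits in counts.items()
        if hits >= min_hits and domain not in tracked
    )


hits = [
    "https://shop.example/p/1",
    "https://shop.example/p/2",
    "https://shop.example/p/3",
    "https://rare.example/item/9",
]
new_sites = sites_to_track(hits, tracked={"big-retailer.example"})
```

Here only `shop.example` clears the threshold, so it alone would be queued for regular tracking.
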
Real-time crawling also creates performance challenges. "We use extensive heuristics to minimize query search space and reduce unnecessary searching/crawling. This helps us retrieve and package all the data in the quickest possible way," Ramachandran told ProgrammableWeb. "The average response time is 4 seconds for the expanded UPC search, which is reasonable for most applications."
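
Semantics3 does not detail its heuristics, but one simple way to keep an on-demand search bounded is to try a short, priority-ordered list of candidate sites under a time budget. The sketch below assumes exactly that; `crawl_with_budget`, the `fetch` callable and the `budget_seconds` parameter are hypothetical, and the 4.0 default merely echoes the average response time quoted above rather than any documented limit.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def crawl_with_budget(
    upc: str,
    candidate_sites: Iterable[str],
    fetch: Callable[[str, str], Optional[Dict[str, str]]],
    budget_seconds: float = 4.0,  # assumed budget, for illustration only
    clock: Callable[[], float] = time.monotonic,
) -> Optional[Dict[str, str]]:
    """Try sites in priority order and stop at the first hit or when time runs out."""
    deadline = clock() + budget_seconds
    for site in candidate_sites:
        # The budget is checked before each request, so a single slow
        # fetch can still overrun it; a real system would also set
        # per-request timeouts.
        if clock() >= deadline:
            break
        result = fetch(site, upc)
        if result is not None:
            return result
    return None
```

Ordering the candidates so that the most likely sources come first is what would shrink the search space in this sketch: the sooner a hit arrives, the fewer sites are ever requested.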

Semantics3 bills its tech as "epic stuff." Not every API provider needs to build a Web crawler capable of delivering on-demand data for API requests, but the company's approach highlights that API providers should always consider customer use cases and look for opportunities to improve the overall API experience, even if doing so requires addressing tough technical challenges.

Patricio Robles
