ProgrammableWeb recently spoke with Scott Ge, CEO and co-founder of Smartable AI, to discuss the company’s COVID-19 Stats & News API and the challenges of building such a resource. The API gives developers access to data that is validated regularly through data scraping and a custom algorithm.
Any API is only as valuable as the data it provides. During the current coronavirus outbreak, aggregating quality data is especially challenging. Ge highlighted an instance where changes to the data provided by a source amounted to a breaking change:
“Breaking changes or data issues are very common in many data sources. As an example: Johns Hopkins’ data uses country name, state name and county name to uniquely identify a location. They change country names very often. A few weeks ago, they completely stopped providing county-level data in the US. I’m one of their very early users. They were my main data source at the beginning, but I really suffered from those breaking changes. So I decided to stop relying on them, by taking a lot more data sources and using AI and other technologies to cross-check and validate.”
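To illustrate why name changes break a key built from country, state and county names, here is a minimal Python sketch that maps name variants to one canonical name before building the key. It is not Smartable AI's implementation: the alias table, `canonical_country` and `location_key` are hypothetical names, and the aliases are illustrative examples only.

```python
from typing import Tuple

# Hypothetical alias table (an assumption for illustration): every spelling a
# source has used for a country maps to one canonical name.
COUNTRY_ALIASES = {
    "korea, south": "South Korea",
    "republic of korea": "South Korea",
    "south korea": "South Korea",
    "us": "United States",
    "united states": "United States",
}


def canonical_country(name: str) -> str:
    """Return the canonical country name, or the cleaned input if unknown."""
    cleaned = " ".join(name.split())  # trim and collapse repeated whitespace
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)


def location_key(country: str, state: str = "", county: str = "") -> Tuple[str, str, str]:
    """Build a key that stays stable when a source renames a country."""
    return (canonical_country(country), state.strip(), county.strip())
```

With a table like this, `location_key("Korea, South")` and `location_key("Republic of Korea")` resolve to the same key, so a rename upstream does not split one location's history into two series. Names missing from the table pass through unchanged, which makes them easy to spot and add.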
Ge then explained that he created an algorithm to deal with geographical name changes in the source data and to flag abnormalities. The full process for data validation, as Ge explains, goes something like this:
- I have a long list of county-level government pages that publish their data. For example, https://www.doh.wa.gov/emergencies/coronavirus has county data for Washington State. My crawler regularly scrapes ‘data candidates’ from those pages. One issue, however, is that these government pages update only once a day, so the data is still often delayed.
- I crawl and analyze news headlines (from credible news sites only, of course). As you may have noticed, one of the APIs we provide is the news API. We crawl about 10x more news than what eventually passes our filters for the API. We use natural language processing to determine whether a story is announcing new cases in a certain location. If so, the number in the headline is extracted as a candidate.
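The two steps above, extracting a candidate number from a headline and then cross-checking candidates from several sources, can be sketched in Python as follows. This is only an illustration under stated assumptions: a regular expression stands in for the natural language processing Ge describes, and the median-based check, the 20% tolerance and the names `candidate_from_headline` and `reconcile` are hypothetical, not details confirmed by Smartable AI.

```python
import re
from statistics import median
from typing import List, Optional, Tuple

# Simple stand-in for NLP (an assumption): matches phrases such as
# "12 new cases" or "1,204 new confirmed COVID-19 cases".
# The lookbehind avoids reading the "19" in "COVID-19" as a count.
NEW_CASES = re.compile(
    r"(?<![\w-])(\d[\d,]*)\s+new\s+(?:confirmed\s+)?(?:coronavirus\s+|covid-19\s+)?cases",
    re.IGNORECASE,
)


def candidate_from_headline(headline: str) -> Optional[int]:
    """Return the new-case count announced in a headline, or None."""
    match = NEW_CASES.search(headline)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def reconcile(candidates: List[int], tolerance: float = 0.2) -> Tuple[float, List[int]]:
    """Return (consensus, outliers) for candidates reported for one location.

    The consensus is the median; any candidate further than `tolerance`
    (as a fraction of the consensus) from it is flagged as abnormal.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    consensus = median(candidates)
    limit = tolerance * max(consensus, 1)
    outliers = [c for c in candidates if abs(c - consensus) > limit]
    return consensus, outliers
```

For example, `candidate_from_headline("King County reports 1,204 new cases of coronavirus")` yields `1204`, while a headline with no case announcement yields `None`. A median is used here because a single wrong source cannot drag it far, which suits a pipeline where any one source may publish a stale or malformed number.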
Additionally, Smartable AI uses data from sources like Wikipedia to further validate accuracy. With Smartable AI providing access to this API for free, Ge hopes that developers will be able to create data visualizations and tools without needing to work through the myriad breaking changes that he faced. Make sure to check out ProgrammableWeb’s recent coverage of the API for more detail. Smartable AI is also in the R&D phase of future projects, including a fact-check API.