The U.S. Food and Drug Administration’s efforts to digitize its data reached a milestone in recent weeks with the launch of an API that provides product labeling information for all drugs listed and overseen by the regulatory body. The product labeling API joins two other data sets recently opened via API: adverse events and drug recall enforcement reporting. Together, these represent the triumvirate of key data sets managed by the FDA, and they are available for private and social good enterprises to test in new healthcare and scientific products.
The approach is being heralded as a model for how government departments should manage their API strategies. The release is still in beta, and API consumers are encouraged to be cautious in integrating the products into their business models. Even so, the potential impact of making this data available programmatically is profound, with at least one incubator encouraging startups to consider what products they could build on it.
“By lowering the barrier of entry to using some of the most valuable data sets held at FDA, our goal was to make it easier for startups and innovative thinkers to use FDA data to educate consumers, further FDA's regulatory and scientific missions and save lives,” says Sean Herron, one of the chief architects of the team that created openFDA. After concluding his role as a Presidential Fellow working with the FDA team, Herron moved on to 18F, the General Services Administration’s entrepreneurial lab.
He insists that not only is the work continuing at FDA, but that it is in the capable hands of the team he left behind: “In addition to myself, there's a great team of developers and designers we brought on board as well as a talented group of folks inside the FDA at each center that provided data. OpenFDA wouldn't have been possible without everyone putting in a ton of amazing work.”
Herron credits the executive order on open data issued by President Obama as being the policy springboard that has enabled action across government agencies, including at the FDA. This led to FDA’s initial work on opening up its data to external parties.
“In particular, openFDA does a lot of work in linking up data sets across the agency, such as connecting product identifiers in adverse event reports to the associated enforcement report and labeling information,” says Herron. “By doing this heavy lifting up front, it becomes much simpler to analyze the life cycle of a product in the marketplace and to derive insights that can be tailored to specific patient requirements.”
Government API Data Supply to Create New Startups
Herron points to the opportunities for external developers:
There are a lot of potential applications here, such as applications that providers (such as physicians and pharmacists) can use to get up-to-date labeling and adverse event information directly from an official FDA source or applications geared for consumers that enable them to know more detailed information about the medications, foods or medical devices they use or take.
OpenFDA creates a platform that makes it easy to integrate on top of. FDA can't create every possible application for everything, and by empowering individuals and businesses to use FDA data in their work, a lot of opportunities are created for both economic growth and further fulfillment of FDA's mission.
Ian Calvert, senior data scientist at startup incubator Digital Science, agrees:
It's great to see the FDA opening their data like this, rather than trying to build their own Web interface, as it mirrors the separation of concerns we've learned is so important when building software. The FDA know about their data; that's their domain. There are then experts in vast numbers of fields who can use their domain knowledge to build specific, useful tools to solve the problems they actually face. Instead of seeing one app or website the FDA produce, we'll see hundreds, many of which you or I would never have seen coming. Some of those will likely become so ingrained in what we do that we'll find it odd to think of a time when they didn't exist.
As each new open data source appears, the number of possible ways of combining them increases drastically, and that's where we see the most interesting developments. It's not about just throwing a fancy interface over the data, it's about linking it to other sources to give insights into the real world we couldn't get before. That core idea drives a lot of Digital Science portfolio companies, bringing multiple data sets together to make something truly new and hopefully improve how we do science.
Herron is excited about the potential new products that can be created by having the data available via API:
Analyzing data at scale, especially when trying to link various data sets together to get a better picture, has traditionally been an extremely labor- and time-intensive process. Many of the data sets openFDA focuses on were previously only available in XML files that needed to be reconstructed into a relational database, as HTML tables with no structured metadata or worse. The level of effort needed to just get actionable data out of those formats was so high that many opted to just not use them at all.
Queries that previously could literally take months of work can now be done in under a second with a single API request. When coupled with the extensive documentation and active community openFDA has created, we've empowered developers to focus on the important tasks, such as creating a statistical analysis methodology, and not on cleaning up or restructuring messy data.
Finally, as applications can now get data from FDA in real time rather than needing to bulk download and reupdate on a defined timeline, it becomes much easier to provide updates, as the application developer doesn't need to do anything special to get new data. They can simply be included in results as they are requested.
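To give a sense of what “a single API request” might look like in practice, here is a minimal Python sketch that assembles a request URL for the adverse events endpoint. It is an illustration under stated assumptions rather than openFDA's recommended client: the helper name `build_query_url` is hypothetical, and the field names in the example query are taken to follow openFDA's published adverse event schema, which may change while the API is in beta.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Public base URL for the openFDA API.
BASE_URL = "https://api.fda.gov"


def build_query_url(endpoint, search=None, count=None, limit=None):
    """Return a full request URL for an endpoint path such as 'drug/event'.

    Hypothetical helper for illustration; only parameters that are
    supplied are added to the query string.
    """
    params = {}
    if search is not None:
        params["search"] = search
    if count is not None:
        params["count"] = count
    if limit is not None:
        params["limit"] = limit
    query = urlencode(params)
    url = f"{BASE_URL}/{endpoint.strip('/')}.json"
    return f"{url}?{query}" if query else url


# Example: ask for a tally of reported reactions for one brand name.
# The field names here are assumptions based on the adverse event schema.
url = build_query_url(
    "drug/event",
    search='patient.drug.openfda.brand_name:"lipitor"',
    count="patient.reaction.reactionmeddrapt.exact",
)
print(url)
```

Fetching that URL with any HTTP client should return JSON, so an application can request fresh results each time it needs them instead of maintaining its own bulk copy of the data.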
A Model for Government API Implementation
OpenFDA's approach to establishing the API strategy has been heralded as an exemplar that other government departments can follow.
Above: OpenFDA’s documentation includes examples of how to show the data as interactive data sets as well as displaying the query methods that are used to produce such results. Developers can test the query parameters directly on the website.
18F’s “All the X API Resources” page gives seven reasons why the model should be replicated across other departments:
- “Interactive and useful documentation.” Each of the three openFDA APIs includes interactive documentation that allows developers to quickly gain valuable insights into how the API works and how to perform queries. Charts, mashups and other supporting documents are provided to demonstrate the API functionality and to help developers see how to bring the data results to life.
- “Great open source documentation.” 18F notes that the entire site, including the interactive documentation, is provided as an open source project available to fork from GitHub.
- “Front-facing feedback mechanisms.” Developers are encouraged to engage with and contact the openFDA team at any time via GitHub, StackExchange or Twitter.
- “Strong development process.” While the three APIs are still in beta mode, a prerelease beta program strategy worked with external consumers and early adopters to test initial elements of the API release.
- “Continued growth.” This is not a set-and-forget API release. The team continues to work on improving the API products and adding new endpoints.
- “Solid terms of service.” 18F proudly points to openFDA’s work on building a “much more usable, developer-friendly terms of service.”
- “Well-rounded developer experience.” Developers visiting the openFDA portal will find similarities with developer portals from some of the best private API providers. Code snippets, documentation, sandbox access, tutorials and other self-paced learning assets are available from openFDA.
Industry Ready? Two Remaining Concerns With the Beta Release
Despite these strengths, the initiative has its share of critics, and the concerns are strong enough that any private enterprise using the data will need to thoroughly assess the potential risks of relying on a beta API product.
Above: OpenFDA makes it clear to website visitors that the API is in beta release.
Brian Overstreet from healthcare informatics company AdverseEvents is cautious about whether businesses should be integrating the beta release into their products just yet. He points to disparities in the data harmonization process, which he says still misses a high number of relevant cases. He gives the example of one common cholesterol-lowering drug, Lipitor: The openFDA API for adverse drug events shows 75,325 events, while AdverseEvents’ own drug database, built from the same source data but with its own internal data cleaning, returns 103,110 cases. According to Overstreet, developers on GitHub are beginning to see this sort of discrepancy emerge in their own data queries.
“There are literally dozens of different identifiers and labeling conventions out there for products,” Herron explains. “As part of openFDA's development, the team created an extensive mapping scheme that enables openFDA to map any one identifier to a wealth of other identifiers and return them all, meaning that results can be queried by the identifier the user is most comfortable with. This can't be done with every record, as sometimes it's difficult to even pull one identifier out. One of the benefits of being open source is that the methodology behind this process is entirely available online. I know the team is working to provide as much identifier coverage as possible and through open sourcing have also enabled the community to help out and provide feedback, ideas and code as well.”
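The idea Herron describes, looking a product up by whichever identifier is at hand and getting the others back, can be sketched in a few lines of Python. This is not openFDA's actual mapping methodology, which lives in the project's open source code. It is a simplified illustration: the function names are hypothetical, the field names are only loosely modeled on openFDA's harmonized fields, and every value in the sample records is a made-up placeholder.

```python
from collections import defaultdict

# Hypothetical harmonized records. All values are made-up placeholders,
# not real product identifiers.
RECORDS = [
    {
        "brand_name": ["EXAMPLEDRUG"],
        "generic_name": ["EXAMPLE COMPOUND"],
        "product_ndc": ["0000-0001"],
        "spl_id": ["spl-aaa"],
    },
    {
        "brand_name": ["SAMPLEMED"],
        "generic_name": ["SAMPLE COMPOUND"],
        "product_ndc": ["0000-0002", "0000-0003"],
        "spl_id": ["spl-bbb"],
    },
]


def build_index(records):
    """Map every normalized identifier value to the records that contain it."""
    index = defaultdict(list)
    for record in records:
        for values in record.values():
            for value in values:
                index[value.strip().lower()].append(record)
    return index


def lookup(index, identifier):
    """Return the records matching any one identifier, or an empty list."""
    return index.get(identifier.strip().lower(), [])


index = build_index(RECORDS)

# Query by a product code and read back the other identifiers.
matches = lookup(index, "0000-0003")
for record in matches:
    print(record["brand_name"], record["generic_name"], record["spl_id"])
```

A record from which no identifier can be extracted never enters such an index, which is one way relevant cases could go missing from query results, consistent with Herron's caveat that the mapping cannot be done for every record.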
A second concern Overstreet raises is the delay in releasing current data. The raw data that powers openFDA’s adverse drug events database comes from the FDA’s Adverse Event Reporting System (FAERS) data set. Overstreet notes that there is a significant time lag between when data is released under FAERS and when it is added to openFDA.
“Posting times on data within openFDA [are] highly dependent on the individual data set at hand,” explains Herron. “As I'm no longer working on the team, I can't speak to specific release timelines. In general, however, data goes through an extensive process to be screened for personally identifiable information and other sensitive information. This is largely done by hand and can take time. Additionally, in the case of some data sets, openFDA receives access to the data at the same time as the raw source is provided by FDA, meaning that any changes or updates will inevitably have a delay. The platform is in beta, and hopefully these specific concerns can be addressed as the openFDA evolves.”
Startup Investment Opportunities Built on openFDA
Despite these concerns, the work of openFDA continues to forge ahead, and the available data sets, once robust enough for industry use, are expected to be integrated into products quickly.
Calvert, from Digital Science, is optimistic that startups will be able to work with openFDA to address the data concerns and use the API access to create new products. He told ProgrammableWeb: “I look forward to seeing surprising new startups using the openFDA data coming to work with us in the future, either through a Catalyst Grant or as a portfolio company.”