A book about open data, to be released in January, and the launch of a pilot website showcasing open data businesses both point to a big year for open data in 2014 and highlight the importance of APIs in powering this new wave of innovation. ProgrammableWeb spoke with Joel Gurin, author of Open Data Now, and reviewed some of the businesses that make up the preliminary Open Data 500 list.
"It's a very exciting time for open data," says Gurin. "One of the premises of open data is that you can build a viable business on free, open data. Entrepreneurs are finding new uses for data that has been around for years. And there are a number of companies already using open data. In five to 10 years, companies using open data will be the rule and not the exception."
Open data business models
In conjunction with the publication of his book, Gurin has been spearheading a project that draws on his research into open data companies. The OpenData500, managed by the GovLab at NYU, aims to demonstrate the viability of open-data-powered businesses. An initial list of businesses has been published on the website, with more details to be added once a final list has emerged.
"There are a number of revenue models. OpenData500 is a study asking these businesses about their revenue models. So far, the most prevalent is a paid service model like Climate Corporation. They are an open data success story, with over 200 employess in several cities and have built a billion dollar company that started by using free US government data sources.
"There are other models. Enigma.io recently won an achievement competition and are figuring out how to take open data and make it truly useful. They are selling information to hedge funds.
"There are probably some who will make a business based on a subscription revenue model.
"The most robust revenue models seem to be analysis services to various sectors. Those businesses seem to have the data that is most valuable: open data mixed with proprietary data."
APIs to leverage viability
As businesses begin building new models based on using open data as a raw material, Gurin expects APIs to be central to the scalability of this wave of innovation:
"API development is critical to the usefulness of open data. We see this more at the Federal level at the moment, but we will definitely see a movement [in 2014] around using APIs and open data. Federal policy is that open data now be machine-readable. What we are hoping to do with the OpenData500 is to help prioritize what datasets are most useful to make machine-readable." [Writer's note: When governments start publishing data openly, they tend to release the data in whatever format it came in. This has been the case in the United States and across Europe, giving rise to civic hackathons which aim to help unlock open data stored in PDFs, for example.]
"It's also about figuring out which of the non-government sources are most significant. We're seeing a lot of use of social media open data, for example."
Key sectors for open data in 2014
Gurin points to a number of key sectors where he expects to see the greatest advances using open data in 2014.
Health: "We're seeing tremendous activity around health right now. Competitions like the Health Datapalooza events are bringing in thousands of people to make use of health data sources. But there's a lot of sorting out to do about what is open and what is private and protected data. We are seeing datasets open up around cost of care, quality of care, accessibility of providers. The opportunities also [exist for] people to collect data about themselves [i.e., Quantified Self] ... there's going to be explosive potential there over the next couple of years."
Energy: "Energy is one of the most interesting sectors [making use of open data]. Services like Opower are finding that open data is more valuable when it is coupled with an end user's own data. They find that being able to compare your energy usage against your neighbors' is a more powerful motivator for change than the environmental benefits or cost savings."
Precision agriculture: "The Climate Corporation is quite a remarkable example—they began as a company selling weather insurance but in order to do that well, they had to go so far into the data they now understand agriculture and can help improve the viability of farmed crops."
Financial sector: "This is an increasingly interesting sector for open data. Data around the sustainability and environmental footprint of companies is beginning to play into investment decisions. With the use of XBRL [eXtensible Business Reporting Language] and data about publicly funded companies, sites like DueDil in the UK and OpenCorporates are looking at corporate transparency. There is a lot of movement towards opening up disclosures of social and environmental impacts. It's really bringing about major changes. There are a lot of opportunities for analysis and insight."
Data journalism: "Newspapers and print media have been battered so extensively by the internet that they have not been able to invest in open data as much as we expected, but data journalism is poised to take off. People in their 20s are interested in the new tools available, as young people have less trust in established media. The Guardian has been a major example, not just as a data driver but [also] in crowd-sourcing data; it's an interesting journalistic model. In the UK, I gather, it's now a national pastime to look into your local MP's expenses, for example."
Gurin believes the big potential for innovation in 2014 comes from the intrinsic nature of open data:
"Barriers to market entry are low. By its very definition, open data is free and accessible. But there are a couple ... I wouldn't call them barriers, they are more like challenges. You need to be creative with open data. To be competitive, you need to be able to uncover new uses for open data and mix sources together in new ways. Very advanced analytics skills are at a premium, that's how you add value to the data. It's a ripe field for innovation. It's an area [that] will test the creativity of anyone in the field, as you are using the same tools that anyone else has access to..."
OpenData500 examples and API usage
The following businesses, sourced from the OpenData500 website, are built on the use of open data and, in turn, make their datasets available for remix and reuse via APIs:
Archimedes (Sector: Health)
This company has created a decision-support tool using healthcare data and clinical research data. In turn, it has built an API to help developers integrate best practice guidelines on diabetes management into customer-focused health, fitness, clinical support and lifestyle apps.
Captricity (Sector: Open data tools/research)
This company uses a variety of OCR and parsing tools to allow customers to make better use of private and open source documents. Customers scan or send in documents in any form—handwritten, PDF, typewritten or online—and Captricity applies its tools to turn source documents into machine-readable datasets for access via an API.
Civinomics (Sector: City management)
Civinomics allows local governments, utilities, companies and other authorities to share source documents such as local government policy papers and to create local campaigns aimed at encouraging residents to vote or to crowdsource opinions on major civic works or local policy decisions. While the terms of service for the online site recognize that customers' data can be channelled via API, developer documentation could not be located on the website at the time this article was written.
Enigma.io (Sector: Open data tools/research)
Enigma.io aims to provide an open data platform that allows end users to augment publicly sourced data with privately held sources to create unique, commercially advantageous analysis and business intelligence. Enigma.io also offers services to help customers locate potential data sources and provides an API to allow direct access to a customer's datasets/data account.
FactSet (Sector: Financial)
FactSet collates open and privately held financial and business market data to create commercial insights and business intelligence analytics. Data is available to customers via an API as a real-time data feed, as a market "snapshot" at a point in time, or for historical analysis.
A pilot version of the OpenData500 website is currently online. Open Data Now will be published on January 10.
By Mark Boyd. Mark is a freelance writer focusing on how we use technology to connect and interact. He writes regularly about API business models, open data, smart cities, Quantified Self and e-commerce.