Open data startup Enigma has released its platform to the public and now provides the open Enigma API so that developers can begin drawing on open data sources and feeding them into their applications. Co-founder Marc DaCosta spoke with ProgrammableWeb about the importance of encouraging developer involvement, and why Enigma can succeed when other open data platforms have had difficulties building a sustainable model.
Enigma is an open data platform with current access to almost 70,000 datasets, from U.S. and international sources. New datasets are added almost daily, and users can easily request datasets to be added. As a startup that's already won several industry awards and recently secured $4.5 million in funding, Enigma has focused on supporting its corporate clients in key industries in order to build the platform. Now it's ready to make the platform available to the public, and anyone can register for a free account or enter a paid plan which provides access to the Enigma API.
“We have relaunched the Enigma web app, from an enterprise-scoped product into a public platform that anyone can go to interact with and build apps off of,” DaCosta told ProgrammableWeb.
Open data for the public
DaCosta explains the motivations behind opening up the platform to wider usage:
We started thinking more deeply about what position we want in the open data world. It has been fascinating to watch open data as a concept take shape over the past five years, when it has grown with a lot of different stakeholders involved. But [it's] still a very uncharted, fertile, and unfolding territory to be involved in.
There has been great progress, as was recently demonstrated by the OpenData 500 launch, which showed how companies are using open data sources to feed their product development and business processes across industries. However, there are also many more opportunities that have not even started to be realized.
“It is exciting because we really believe that in five to 10 years down the road, we will be in a world where you will be able to look at a building, and see the data signature (who is it owned by and what else it is involved in), as you would use Wikipedia now to look up Crimea to learn more about its natural resources, for example,” DaCosta says.
API access to open data
Part of opening up the platform has included making an API available. While free account holders can test the API in a sandbox environment, paid subscribers can make up to 50,000 API calls a month.
“The way that the API works is that anyone gets basic sandbox API access to prototype a use for it,” DaCosta explains. “Once it gets to a commercial application stage, we have a basic developer pricing level. Users have these sandbox API keys so developers can use these directly in their app prototypes. Our opening gambit is in terms of building a large community around open data. So the sandbox API is very important to that.”
Several REST APIs are provided, enabling access to dataset metadata, the data itself, a stats API that allows developers to perform analysis such as calculating averages based on variables in the data, and an export API that allows developers to download datasets in CSV format.
Documentation provided includes brief descriptions of each API and the parameters and attributes of each, as well as an interactive screen that shows an example data request or response, and a sandbox tab where users can test API requests themselves.
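As a rough illustration of how the four REST APIs described above might be addressed from application code, here is a minimal Python sketch that only builds request URLs and does not contact any server. The host name, the path layout (endpoint, then API key, then dataset path), the dataset path, and the query parameters are all assumptions made for illustration; the article does not specify them, so the Enigma documentation should be treated as the authority.

```python
from urllib.parse import quote, urlencode

# Assumption: placeholder host and path layout, not taken from the article.
BASE_URL = "https://api.example.com/v2"

# The four APIs named in the article: metadata, data, stats and export.
ENDPOINTS = ("meta", "data", "stats", "export")


def build_url(endpoint, api_key, datapath, **params):
    """Build a request URL for one of the four endpoint types.

    `datapath` is a hypothetical dotted dataset identifier, and `params`
    are hypothetical query parameters (for example a column to average).
    """
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    url = f"{BASE_URL}/{endpoint}/{quote(api_key, safe='')}/{quote(datapath, safe='')}"
    if params:
        # Sort parameters so the generated URL is deterministic.
        url += "?" + urlencode(sorted(params.items()))
    return url


if __name__ == "__main__":
    # Hypothetical sandbox key and dataset path.
    print(build_url("meta", "SANDBOX_KEY", "us.example.permits"))
    print(build_url("stats", "SANDBOX_KEY", "us.example.permits",
                    select="job_cost", operation="average"))
```

Keeping URL construction in one small function makes it easy to swap a sandbox key for a paid key once a prototype moves to the commercial stage.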
Opportunities for developers
DaCosta is committed to focusing now on fostering developer communities and encouraging developers to “harness this data to do interesting things.” To that end, Enigma has focused on enabling app developers to immediately begin making use of any data they need from the stores of datasets on the platform:
If you are an app developer and you want to help consumers make more informed choices, you don’t necessarily need to understand all the datasets that might help you do that. Instead, you can start with the manufacturer of a product and find out all the datasets related to that manufacturer and work from there.
You don’t need to worry about data-as-a-service type layer, you don’t have to manage all the datasets that power your apps.
At times the interface can still be confusing for newbies, but Enigma is structured in a way that means you do not need to know what datasets exist that may help you. Users can search via location, keywords or company names to surface all datasets where those attributes are mentioned, then work backwards to drill down into what datasets may be most useful for a query. Once a dataset is filtered to show just the items wanted, it can be saved or exported and then a similar query can be performed on another dataset.
Managing data quality
One of the strengths of the Enigma platform is its focus on cleaning and organizing data before it is imported. Unlike platforms like CKAN, which is used by governments and cities around the world to publish their data, Enigma polishes its data first so all datasets are available in a clear table format, and have been formatted in JSON to be available for request calls via API. CKAN, on the other hand, lets the data publishers decide, which means data is uploaded in multiple formats: from CSV, to PDF, to spreadsheets, to API. It is up to the developer-consumers to wrangle the data into the format they want to use.
Enigma solves this problem by having a highly curated element to its business model. “There is a lot of data quality issues when you are dealing with data.gov for example,” DaCosta says. (Data.gov is built using CKAN.) “It is not always done in a way where the data is in a usable format. To have a platform where the data is easy to surface and to use the data in the first place will help with the velocity of open data, it is much more within people’s reach and ability to discover.”
Enigma also provides a number of features to encourage data accessibility: a roadmap feature lets users identify datasets they want to have added, and a progress status indicates the likely timeframe for those datasets to come online.
Four use cases for open data using Enigma
1. Investigative journalism
“We have worked with some not-for-profit and investigative journalists to further civic media and public good projects,” says DaCosta, when asked about how developers can gain free access to the API. DaCosta cites examples such as the connection of shell companies, and the “ability to follow supply chains and see where U.S. military uniforms are being manufactured, for example.”
2. Real estate investment
“We are working with a lab at MIT that is using the public Enigma API to power visualizations to create portraits of cities: with real estate assessment values, hospitalization intake causes, things that assess the safety of local communities, etc., all data sources that together create powerful tools for civic engagement,” says DaCosta.
Some developers are using the data to identify quality building contractors, while others are using building permit data to identify energy-efficient building designs, and therefore property investment opportunities that will have greater value in the longer term: “You can take building permit information filed by building contractors, and then look at who are the good contractors, for example,” suggests DaCosta. “There is a group up in Boston that is using the import data we have to look at the carbon scores of the properties they are buying.”
3. Assessing business risks
In an interview with Riskpulse earlier this year, CEO Matthew Wensing mentioned the need for accurate data sources that may help identify specific business risks such as delays to manufacturing caused by port labor disputes. DaCosta sees a day when Enigma may have those datasets available in the Enigma platform:
Right now, you could look at the raw volume of containers that are being held up. There are also datasets from Federal Highway data that will tell you how many trucks are distributing goods, there is also a dataset called the WARN Act where any employer with more than 200 employees has to report any employee layoffs. Also from local manufacturers, you might be able to look at occupational safety data to see if they have had any labor violations, and you can [look] into government contracts and see if a disruption could impact on a manufacturer’s ability to deliver on government contracts.
4. Identifying food sustainability opportunities
Image: Detail from infographic on food waste by A to Z Solutions
In the US, it is estimated that 40% of all food ends up being thrown out due to inefficient distribution across the country. In the future, analyzing datasets on Enigma could help find new opportunities to manage food distribution more effectively. “Corporate registrations identify delis and supermarkets,” DaCosta points out. “You could then mash up that data with city health inspection data” as one avenue. There are still a lot of opportunities to mine public data to identify where food gets wasted or where in the chain it goes off, and how to create alternative distribution pathways to make better use of this oversupply to areas with greater demand for more accessible food options.
Barriers to entry
Although Enigma holds a lot of promise, its current subscription price of $395 per month may still be beyond the reach of many early adopters wanting to draw on the wealth of open data for their business cases. The cost savings from easier data discovery (developers no longer need to research potentially available datasets themselves) may not be large enough to justify using the Enigma platform unless developers plan to produce many open data-enabled products or tools, or work in a highly competitive enterprise market where companies with deep pockets can gain a strategic advantage from adopting open data early.
Developers and businesses will need to have clearly thought through what data they will need to access, analyze and integrate in their end use cases before they commit to a subscription.
Beyond the savings in data discovery that the platform provides, 50,000 calls a month is a fairly small allowance when the datasets being accessed also need further analysis before results can be displayed in the end user interface. For example, in the real estate investment use case above, there are 92,900 rows of data for building permit applications in New York in 2013. So if a user is looking at data from a number of years, across a number of locations, it will be fairly easy to reach the 50,000-call monthly limit quickly. The developer-consumer pays for access to the data and for mining it from Enigma via the subscription price, but then also needs to invoke a number of other tools, such as data analytics, to ensure that the data is comparable across jurisdictions with different populations (by normalizing to per capita rates, for example). The developer-consumer will then need to use visualization and reporting tools to create the end result that is shown to customers or used internally to help make better business decisions.
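To make the quota arithmetic concrete, the sketch below estimates how many API calls it would take to read the 92,900 New York permit rows under different page sizes. The 50,000-call limit and the row count come from the article; the number of rows returned per call is not stated there, so the page sizes used here are purely hypothetical.

```python
MONTHLY_CALL_LIMIT = 50_000    # paid-plan quota cited in the article
NYC_PERMIT_ROWS_2013 = 92_900  # row count cited in the article


def calls_needed(total_rows, rows_per_call):
    """Number of calls required to read every row once (integer ceiling)."""
    if rows_per_call <= 0:
        raise ValueError("rows_per_call must be positive")
    return (total_rows + rows_per_call - 1) // rows_per_call


def quota_share(total_rows, rows_per_call, limit=MONTHLY_CALL_LIMIT):
    """Fraction of the monthly quota consumed by one full read."""
    return calls_needed(total_rows, rows_per_call) / limit


if __name__ == "__main__":
    # Hypothetical page sizes; the real value depends on the API.
    for rows_per_call in (1, 100, 500):
        calls = calls_needed(NYC_PERMIT_ROWS_2013, rows_per_call)
        share = quota_share(NYC_PERMIT_ROWS_2013, rows_per_call)
        print(f"{rows_per_call:>3} rows/call: {calls:>6} calls, "
              f"{share:.1%} of the monthly quota")
```

Under these assumptions, a single full read of one year of one city's data ranges from a small fraction of the quota to nearly double it, so the page size and the number of years and locations queried are the figures to pin down before committing to a subscription.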
Enigma’s greatest use for developers may still be amongst those who are thinking several steps ahead and want to play with new ways to discover data sets and practice filtering data into usable chunks. From there, the API may help identify how a developer could automate such data queries. But to move to a commercial application, developer-consumers may need to have a clear business case in mind with good projections as to the value of drawing on open data sources via Enigma before they are willing to pay the subscription fee involved.
By Mark Boyd. Mark is a freelance writer focusing on how we use technology to connect and interact. He writes regularly about API business models, open data, smart cities, Quantified Self and e-commerce. He can be contacted via email, on Twitter or on Google+.