The smart cities agenda is transforming how cities manage their infrastructure and how they communicate with citizens and local businesses. Open data and the use of sensor technologies to provide real-time feedback on the use of urban infrastructure are two components at the center of what is considered a “smart city.” How are developers using public transport APIs to empower a smart cities agenda and what is the progress of city authorities looking to make their transportation systems smart? ProgrammableWeb surveys some of the latest advances and reviews city progress in opening transport data via API.
How to Read This Article
This in-depth analysis of public transport APIs is the first in a series looking at the opportunities for developers targeting the smart city market. Developers can jump to specific sections to review:
- The role of public transport in city life
- Types of public transport APIs and what data is being managed
- Data aggregation business models
- Transport-related products developed with public transport APIs
- Hyperlocal products and services developed with public transport APIs
- Challenges and future opportunities
Public transport makes a city smarter
Cities around the world are opening up their public transport data to enable third-party developers to create new commercial and social good products. Public transport is often a good starting point for cities looking to open up useful data sources as part of a smart cities agenda.
How public transport data is opened — and the role of APIs in this process — demonstrates the potential that comes from cities opening up their data:
- It is an immediately useful data source that enhances participation in city life.
- It can add value to contextual and personal data.
- It has a real-time component.
- There are commercial revenue opportunities across a range of industry uses, with innovative business models that can be applied.
- It often involves multiple stakeholders and, therefore, requires aggregating of data from multiple sources.
What do we need to know?
For governments, there are resource implications to providing public transport data via APIs. For developers, there are commercial opportunities from building a successful data-driven business that leverages public transport APIs. And for the community, there are both enhancements to city life and frustrations to social participation that emerge from public transport API supply.
ProgrammableWeb lists 293 transportation APIs, of which 83 are related to public transport (either APIs from specific cities releasing their public transport data or aggregate service providers offering APIs for route planning or mapping).
The U.S. City Open Data Census conducted by Code for America, the Sunlight Foundation and the Open Knowledge Foundation identifies only six of the 43 U.S. cities surveyed as making open transit data available.
[Figure: Scores of the six cities sharing open data around city transit services, according to the U.S. City Open Data Census]
Pieter Colpaert, coordinator of the Open Transport Working Group of the Open Knowledge Foundation, explains the international perspective on how cities are opening public transport data:
“It's very early days. It's very difficult to get all noses pointed in the same direction, as in most cases it's not only one organization that has to decide (for example, there is the government, the railway operator, the railway maintainer, etc.).”
The importance of public transport in city life
Public transport is an essential service in any city, enabling citizens and visitors to move around for education, to attend health and service appointments, to go to work, to shop, to visit friends and family, and to fully participate in the activities and opportunities of city life.
Public transport accessibility is at the center of a livable city and is a linchpin of local engagement and social inclusion. Many cities offer a variety of transportation options, including buses, rail, light rail and trams, and ferries. Increasingly, car and bike rental schemes are also being introduced as part of a diverse public transport network by city authorities (although bike rental service APIs will be covered in a separate article, while car rental schemes will be explored in a future article on traffic congestion APIs).
Along with offering a range of transport options, various funding models are used to resource public transport infrastructure. Current public transport delivery models include contracting of private transport operators, government-delivered services, a mix of private and public service delivery, and even agreements for limited private service delivery using public infrastructure. (For example, in San Francisco, the city authority has come to a financial agreement to let the so-called “Google buses” use existing bus stop infrastructure for collecting and disembarking passengers traveling to work at tech companies.)
Navigating a complex web of trains, trams, buses and ferries managed by multiple operators can quickly become confusing for residents needing to get to work, to health and service appointments, and to recreation and cultural activities. For visitors wanting to move around a city’s key districts or to enjoy landmarks and tourist destinations, local travel can increase their safety risks (lost tourists are an easy mark) and add to their travel expenses (there’s only so much travel budget that can be set aside for Uber and taxi fares).
APIs lend themselves well to managing the challenges of mapping transit routes and factoring in real-time data on desired destination; estimated travel times; pricing; and details of last-minute route alterations, traffic congestion and service cancellations.
Types of data included in public transport APIs
Data available in public transport APIs can include the following:
Transport route details: Transport service numbers and destinations, transport operator, travel route details, start times, journey durations, estimated arrival times at each stop, and more
Infrastructure geolocations: Bus stop and train station locations for each direction of travel
Real-time data: Last-minute cancellations, route alterations and maintenance issues, up-to-date estimates of arrival times
Ticketing procedures: Fare estimates, confirmation of ticket prices, and the ability to book tickets via API
These may be provided as separate APIs (sometimes data such as the infrastructure geolocations is provided in other formats like CSV files) or as a single API with multiple resources and services.
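To make the four categories concrete, here is a minimal sketch of how a client application might model them in Python. The class and field names are illustrative assumptions, not taken from any particular city's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Stop:
    """Infrastructure geolocation: one stop or station, per direction of travel."""
    stop_id: str
    name: str
    lat: float
    lon: float
    direction: str  # hypothetical label, e.g. "northbound"


@dataclass
class StopTime:
    """Route detail: when a service is scheduled to reach a stop."""
    stop_id: str
    scheduled_arrival: str  # "HH:MM" local time
    # Real-time data: delay in minutes, or None when no live feed covers this stop.
    delay_minutes: Optional[int] = None
    cancelled: bool = False


@dataclass
class Service:
    """Route detail: a single service run by one operator."""
    service_number: str
    operator: str
    destination: str
    stop_times: List[StopTime] = field(default_factory=list)
    # Ticketing: a fare estimate in local currency, if the API offers one.
    fare_estimate: Optional[float] = None
```

Keeping the real-time and ticketing fields optional reflects the point above: not every provider exposes every category, and some publish them through separate feeds.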
Three types of public transport APIs
Colpaert has come up with three categories of public transport APIs: “Today, I believe there are three types of public transport APIs, based on who created the API,” he explains.
1. The first is created by volunteers who just scrape the data. As the API creators are also the reusers, the format and vocabulary for their responses and resources are often custom made for their specific use case.
2. A second type of API is created by data owners or the transport companies themselves. They are set up in order to stimulate reuse for use cases they have in mind. The problem with these APIs is that often there are rate limits; it is hard to get through the user agreements; they have awkward SOAP/XML constructions; and they don't follow existing specifications such as SIRI or GTFS-realtime.
3. A third type of API is created by a consortium of reusers and the data owner. The API comes into existence after people with different use cases put forward the resources they need, specify how the responses should look, and perhaps help build the API on top of open data.
Should cities provide public transport APIs?
Given the complexity of aggregating open data from multiple transit service providers and adding real-time data on top of that, public transport open data advocates like Colpaert argue that city authorities should avoid providing service-oriented APIs and should focus instead on making public transport data available in open, accessible formats.
“APIs are, in most cases, very bad at publishing data and for the matter of publishing the data itself,” Colpaert says. “They are not worth the cost. APIs are, however, great at delivering extra services on top of the data, such as providing route planning for a certain region or system. If you want a route-planning API, I strongly suggest reusing an open source project [see the examples listed under the API type 2 listed above] or hiring the people behind these projects for the most low-cost option.”
Stefan de Konink from Stichting OpenGeo in The Netherlands agrees:
“First of all, governments should not provide service APIs, they should provide data,” he says. “In the Netherlands, we have finally solved that using a national data for public transport portal. Hence, you get the raw data from the operator via an independent third party, not an HTML page from an API.”
Emer Coleman, CEO at DSRPTN and director of business development at the U.K.’s Transport API, and her colleague, David Mountain, director of products and services, are emphatic that governments should let private providers manage the service elements of a public transport data feed.
“Best practice is for cities to release as much transport-related data as possible and allow third parties to add value,” says Mountain. “This includes the bulk downloads (for example, timetables) and also live data. For us, it would also be great to get lower-level access to tube and bus data in the U.K., such as the real-time position of vehicles from GPS.”
Data duplication issues for public transport API providers
For cities wanting to open up their public transport data, it can be confusing to map what is most useful for end users. Some cities release flat files, such as a CSV format of timetables or bus stop locations.
Others, like Seattle, which has a Socrata-powered open data portal, are focusing on providing GTFS feeds rather than transport APIs. GTFS, or General Transit Feed Specification, is a data format that enables public transport data to be added to Google Maps so that directions can be provided in Google Maps products and with services that use the Google Maps API.
In an email reply to ProgrammableWeb from the Google Transit Feed Service team, Rahul wrote:
Transit on Google Maps is a public transportation planning tool that combines the latest agency data with the power of Google Maps. For agencies around the world, Google Maps is a cost-effective solution, and they use GTFS specification to provide schedules and geographic information on Maps and other Google applications. GTFS can be used to power trip planners, timetable publishers, and a variety of applications that use public transit information in some way. The Google Maps APIs give developers several ways of embedding Google Maps into web pages and allows for either simple use or extensive customization.
Colpaert suggests that for cities just getting into the open data game, providing GTFS is an easy start. “For publishing data as open data, good practice is to publish a GTFS dump first, which should be a directly downloadable ZIP file. Plenty of examples can be found at the GTFS Data Exchange and at the Datahub,” he says.
“They are often extended with a service for real-time delay data such as a GTFS-realtime interface or a very lightweight API without restrictions.”
De Konink agrees: “For a service provider, it typically takes you a day or three to set up and, depending on the data quality available, about a day per week to manage. To use GTFS in a planner, you have to convert it to a binary BLOB that is suitable for planning. That takes less than an hour.”
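As a rough illustration of how little code a GTFS dump needs on the consuming side, the following sketch reads stop locations out of the ZIP file using only the Python standard library. The function name `read_stops` is our own. The sketch assumes every row in `stops.txt` carries coordinates, which the specification does not guarantee for all location types.

```python
import csv
import io
import zipfile


def read_stops(gtfs_zip):
    """Return a list of stop dicts from stops.txt inside a GTFS ZIP.

    gtfs_zip may be a file path or a binary file-like object.
    Assumes stop_lat and stop_lon are present on every row.
    """
    with zipfile.ZipFile(gtfs_zip) as archive:
        with archive.open("stops.txt") as raw:
            # utf-8-sig strips the byte-order mark that some feeds include.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return [
                {
                    "stop_id": row["stop_id"],
                    "name": row["stop_name"],
                    "lat": float(row["stop_lat"]),
                    "lon": float(row["stop_lon"]),
                }
                for row in csv.DictReader(text)
            ]
```

Converting the feed into a structure suitable for route planning, the step De Konink mentions, is a separate and larger job that this sketch does not attempt.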
Resource implications for cities providing public transport APIs
The resource implications of managing API feeds can become overwhelming for larger cities. The Metropolitan Transportation Authority of New York (MTA) provides the MTA API for access to New York’s subway and bus GTFS data feeds. A new policy will require developers to register for an API key before being granted access to the API data.
“Since the MTA began publishing its open data in 2008, over 200 mobile apps using MTA data have been launched by the developer community,” Aaron Donovan wrote on the MTA API Developers Google Group. “We’re glad to be able to support this development, and we continue to seek new, valuable data releases that will benefit our customers.
“That said, far too many third-party apps have neglected to download and host the data on their own servers. This not only violates the long-published terms and conditions for use of MTA data, but places unnecessary strain on MTA servers at additional cost to the agency,” he wrote.
Other government transport API providers around the world are introducing similar measures. The state of Queensland, in Australia, has responsibility for managing public transport services for its cities, including Brisbane. Its Department of Transport and Main Roads’ TransLink division has released the OPIA API and strongly recommends that developers build their own server-side service that wraps the TransLink API and cache all data except the data needed for immediate journey planning.
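The server-side wrapping that agencies ask for can be sketched as a small time-to-live cache sitting between end-user apps and the agency API. This is a minimal illustration under our own assumptions: `CachingWrapper` and its parameters are hypothetical names, and `fetch` stands in for whatever call retrieves data from the upstream service.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CachingWrapper:
    """Serve repeated requests from memory instead of re-calling the upstream API."""

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch        # hypothetical upstream call, e.g. an HTTP GET
        self._ttl = ttl_seconds    # how long a cached value stays fresh
        self._clock = clock        # injectable so the cache can be tested
        self._cache: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]       # still fresh: no upstream traffic
        value = self._fetch(key)   # stale or missing: one upstream call
        self._cache[key] = (now, value)
        return value
```

With a wrapper like this, a thousand phones asking for the same timetable within the TTL window produce one request to the agency rather than a thousand. Requests that must be live, such as immediate journey planning, would bypass the cache.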
Collating public transport from multiple sources into one API: The TransportAPI example
The resource constraints for a city authority managing a public transport API are not limited to the server costs associated with third-party apps constantly polling an API directly from the end user’s device, explains Coleman of data aggregator and API service provider Transport API.
Public transport data can be used by businesses dominated by a sense of place: venues, local news media and service networks. But to use APIs as part of their hyperlocal services, they need the reliability of entering into service-level agreements (SLAs) so they are confident they will have access to the timely public transport data they need, around the clock.
“In government, the service element is completely forgotten, and that’s where data aggregators come in,” says Coleman. “Who is going to be there at 11 p.m. if the API falls over? It is a risk to the business model if you can’t provide SLAs. Even the city of London doesn't provide SLAs for use of their data and for anyone developing on these APIs.
“That’s why it is better for everybody if it comes via a data aggregator that can provide those SLAs,” she says.
Echoing the position of the Open Knowledge Foundation’s Colpaert, Coleman — who led London’s open data strategy and architected the London Datastore — believes city governments should stick to publishing data and not managing the service API aspects:
Governments are good at collecting data and have not been historically good at using that data or visualizing it to give it meaning. So the user interface has not been factored into a government’s resourcing of data collection and repurposing.
Transport for London are very clear that they didn't foresee the added value that the Transport API is bringing to them now. That has included how public transport is sustaining small and medium-sized enterprises and adding to their customer base. One of the early fears for them was that a bad app would reflect poorly on them. But people associate the good apps with them, and just dump the bad ones.
Transport API was founded four years ago by Jonathan Raper, Mountain and Coleman. Raper and Mountain had been working in a city university and left to work in private business, while Coleman moved across from her work on digital projects at the city of London.
“Traditionally, the transport sector is not very open and, initially, there was a lot of resistance,” Coleman says. “Our argument was that opening this data is making it easy for people to get around. We were bootstrapped in our early years and also offered consultancy services. We received our first round of funding about a year ago, and now we are going into our second round.”
The Transport API has about 600 developers, and while initial growth focused on smartphone apps, it is seeing greater uptake among hyperlocal applications offered by larger enterprises.
Becoming a data aggregator is one approach to commercialization that may suit some developers. For Transport API, the opportunities to service new markets are continually growing. Already, its commercial API license model has drawn in customers like ScreachTV and Toothpick (see the hyperlocal examples below). It is also servicing large government authorities and global franchises operating in the U.K.
“Heathrow airport are one of our clients, and our API data will be populating their kiosk screens and websites. Ikea have also signed up to use our API, so they can provide store information,” Coleman says.
Resources involved in being a public transport API data aggregator
Mountain of Transport API describes the data architecture required to become an aggregate service provider:
Mostly we aggregate on our own servers for bus timetables and train timetables, plus live movements. For a minority of features, it isn't possible to aggregate the data locally. For example, there isn't a source of data we can use to track all buses, so we can't update these movements in our own databases.
Downloading data from the complex web of providers and services across London “varies massively for different modes and features,” says Mountain. “For train timetables, we update the timetables daily from a bulk download, but also have a permanent connection to a live feed to receive last-minute timetabling changes. We also use a live feed to receive and store live train movements that come in at subsecond frequency. For buses, a semiautomated weekly update is sufficient.”
Mountain describes the data management infrastructure Transport API uses to manage its service:
We use a cloud service provider and load balance over four servers. This avoids any outage if a single server or database instance fails, and [offers] the benefit of being able to take a server out of the loop and prod it if it appears to be misbehaving. We mostly use Postgres as the database, but have started using MongoDB for the new services we are developing.
Other data aggregator models: Electric Labs’ private-partnership approach
Also mirroring type three in Colpaert’s categorization of public transport APIs is Electric Labs. It has worked with Transport for London (TfL), specifically on turning the bus transport data into open data and API feeds.
“We began with a more fully encompassing project for TfL in 2010, developing their bus information system in London (this was for a former company that we worked for),” Will Ryan, founder at Electric Labs, told ProgrammableWeb.
This meant we were doing both the data management and consumer-facing proposition, and the APIs were designed and developed alongside one another. This was quite important, since being a consumer of our own API allowed us to really think about how developers might like to use it.
Other companies, such as Siemens AG, developed the upstream raw data such as GPS positioning of the buses, but it was our job to implement consumer-based products for this information. Initially, this was a website, mobile website and SMS [text messaging] service. The desktop site used AJAX for much of its functionality, so when the user wanted to update a bus departure board, for example, there was a RESTful JSON endpoint for this data. It didn't take long before indie developers were scraping these feeds to develop their own products, even before the public API was announced.
Working on the data management side certainly helped us get to grips with presenting public transport information as it can be quite complicated, so understanding the underlying formats can be very useful. It also gave us confidence to enter new markets with data formats that we were very familiar with.
A data-scraper aggregation model: TransiCast
Ryan is reluctant to encourage other developers to also pursue a data aggregator model, at least of the type three model, where the API is created in partnership with a city authority: “Forming a relationship with a big transport organization like the MTA or TfL can be tricky and time-consuming,” he cautions. “That being said, writing an API on top of existing APIs, perhaps by augmenting additional information, is definitely a viable business model.”
This is the approach TransiCast has taken, reflecting more of the type one categorization as outlined by Colpaert above. “We have been working on curating public transportation data access on GoogleTransitDataFeed since 2006, with no association with Google,” says Joachim Pfeiffer, founder and principal at TransiCast.
We specialize in aggregating transit data across North America. In a nutshell, developers get a single TransiCast API format to code against, a dedicated database instance and developer API key management. The latter two have been a point of concern that MTA is trying to address right now, and which we readily solve for our subscribers. Out of the 300 feeds we currently cover, about 20 require API keys, and we keep those as issued by the agency to developers, to place calls to the agency web services. This already includes MTA Bus and MTA Subway.
Pfeiffer goes into considerable detail to share how he has built a data aggregator model using only open data and city-run APIs:
TransiCast consumes static data (in GTFS), and real-time data. This builds on the GTFS feeds made available by transit agencies and the various real-time APIs that transit agencies have implemented.
Static data in GTFS is always offered in bulk, providing service network and schedule data. Some agencies have set up mailing lists to let developers know when updates to their GTFS data are available for download. Such updates are rolled out in the TransiCast API straight away. Otherwise, we go around and download GTFS feeds on a regular basis, analyze the data for any changes that may have occurred and update accordingly.
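Pfeiffer does not say how TransiCast analyzes a downloaded feed for changes. One simple way to approximate that step is to hash each file inside the ZIP and compare the results with the previous download, as in this sketch. The function names are our own.

```python
import hashlib
import io
import zipfile


def feed_fingerprint(zip_bytes: bytes) -> dict:
    """Map each file name in a GTFS ZIP to the SHA-256 of its contents."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            name: hashlib.sha256(archive.read(name)).hexdigest()
            for name in archive.namelist()
        }


def changed_files(old: dict, new: dict) -> list:
    """Return sorted names of files added, removed or modified between two fingerprints."""
    return sorted(
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    )
```

Hashing the individual files rather than the whole archive avoids false alarms when an agency re-zips identical data with new timestamps, and it tells the aggregator which tables (stops, routes, trips and so on) need reprocessing.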
For real-time and dynamic data, transit agencies tend to roll out two basic API models that I characterize as:
Pull-deliver model: The pull-deliver model is usually a traditional REST API that offers queries for next bus and train times at selected stops (and variations thereof). This is often provided in JSON or schema-less XML. The TransiCast API directly exposes this model to the callers but flattens out the various formats into a single one. Where needed, real-time data is complemented with static GTFS. As an example, some agencies only offer real-time data for a subset of their routes, so TransiCast augments next bus and train times with scheduled times derived from the static data. This is transparent to our subscribers; all they have to look for is a flag that indicates whether the next bus and train times are real time or schedule based.
Bulk/pull-cache-deliver model: In this case, a snapshot of the full real-time service database is provided to the caller. This requires a back-end server on the caller's end that caches the data and polls the agency's server every so often, for example, every 30 seconds. TransiCast supports this model as well. In this scenario, a subscriber call to the TransiCast API first goes to the TransiCast cache, and only if an update is needed, a call goes back out to the agency servers.
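The fallback behavior described under the pull-deliver model, where live predictions are used when available and scheduled times otherwise, with a flag telling the subscriber which is which, might look something like the following. This is a sketch under our own assumptions and not TransiCast's implementation; the `Arrival` type and the dictionary inputs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Arrival:
    route: str
    minutes: int
    is_realtime: bool  # the flag subscribers check


def next_arrivals(scheduled: Dict[str, int], live: Dict[str, int]) -> List[Arrival]:
    """Prefer a live prediction per route; fall back to the schedule otherwise.

    Both inputs map a route name to minutes until the next arrival at one stop.
    """
    arrivals = []
    for route, scheduled_minutes in scheduled.items():
        if route in live:
            arrivals.append(Arrival(route, live[route], True))
        else:
            arrivals.append(Arrival(route, scheduled_minutes, False))
    # Soonest arrival first, regardless of where the figure came from.
    return sorted(arrivals, key=lambda a: a.minutes)
```

For example, `next_arrivals({"10": 12, "22": 5}, {"10": 9})` yields route 22 at 5 minutes (schedule based) followed by route 10 at 9 minutes (real time).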
Pointing to the way this model avoids direct polling of a city authority’s API, Pfeiffer notes, “This has been a matter of contention for MTA (and other agencies before).”
Pfeiffer shares some insights into the data management infrastructure he uses:
TransiCast uses Google App Engine and the GAE datastore to keep the static data. The caching of real-time data (where needed) is not persistent and held in an active servlet.
We also use a proprietary API management system. Most significantly, each subscriber has an exclusive and dedicated instance of TransiCast. This way, performance (and associated cost) can be tuned to subscriber targets, and it is easy to keep agency API keys separate, as these API keys are obtained and owned by the subscribers, not TransiCast.
TransiCast also does not aim or claim to own the transit data. From an agency's perspective, TransiCast instances only keep optimized copies of the data and provide back-end server caching to meet the requirements of agencies, such as the MTA.
GAE scales pretty transparently, and I cannot speak much to agencies architecting their back-end infrastructure. But let me point to TriMet in Portland and Transport for London. The first have been at the forefront of implementing GTFS and real-time feeds and have a host of experience, and Transport for London has pioneered the bulk delivery of real-time data as well as targeted real-time APIs at large scale. They host on Microsoft Azure.
How developers are creating products using public transport APIs: Transport and mobility-related products
1. Electric Labs: Bus NYC
“The initial release of Bus NYC was developed over a three-month period while I was spending time in New York,” Ryan says. “I had just started Electric Labs, and I was looking for a flagship product to showcase the work that we could do and it coincided with the release of public transport data by the MTA. I worked really hard over that period as we didn't really have any other means of income at that point and wanted to get it out the door as soon as possible. There was very little competition in the market during those early days. At present, we might spend a few days a month adding new features and maintaining the app. It all depends on how busy we are with other client work, as well as how profitable the app has been over that month period. The more downloads that we get, the more we invest into the app.”
2. Traverse
Swedish startup Traverse has taken a private-public partnership model approach to its business. It funnels public transport data from private operators into its API model and adds a ticketing service. It is then able to provide a white-label API back to each operator to enable its proprietary websites and applications to offer both transport route planning and ticketing in the one interface. Traverse developer Torsten Freyhall recently attended a Nordic APIs event, in part to start speaking with other industry sectors about opportunities to embed the ticketing and travel planner into their hyperlocal products.
3. OMG Transit
OMG Transit started out as a competing team in last year’s National Day of Civic Hacking. It has been able to draw in open data on public transport and other transit information to create a route planning app. In the past year, it has won several awards, including a national Innovation Award; was accepted into the Intel Innovation Pipeline program; and received a commendation at the White House Champions of Change event.
4. Navitia
Navitia is an emerging open source API platform for public transport data. Developers can’t yet register for an account but can access the API. Contributors are being invited to upload transport data to the platform for their cities, and six cities are already available. The API provides a common transit glossary so that developers can scale applications for cities using the same resource calls.
How developers are creating products using public transport APIs: Hyperlocal products
One of the most exciting frontiers of using public transport data via API moves beyond route planning for its own sake and instead focuses on its contextual relationship in a hyperlocal environment. Services are adding real-time and public transport data to their interfaces in order to extend the value chain being offered to end users.
“The venue market is exploding for us,” says Paul Rawlings, CEO and founder of ScreachTV. Screach offers a content-channel service to any business with access to a screen: venues, information kiosks or app developers. It finds that venues are a key target market for public transport data feeds.
“The reason we target pubs in the U.K. is that when there is no football or soccer, we provide the ability for a venue to create their own TV channel. And we use data like public transport data to drive the end customer’s behavior,” says Rawlings.
He explains the value of public transport data in the hyperlocal environment:
Most venues know that, come five minutes to closing time, everyone is out the door. So with the Transport API, rather than the passenger waiting at the bus stop, we can effectively keep them in the venue. It is quite a big selling point for us. We can flip to it at any point, for example, we can say after 10 p.m., we can increase the frequency. Or because we know it is raining outside, we will increase the frequency of the live transport timetables, so people can wait inside the venue rather than in the wet if a bus or train is delayed.
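The scheduling rule Rawlings describes is simple enough to sketch in a few lines. The figures here are invented for illustration: the base interval, the halving factor and the definition of "late" (10 p.m. to 4 a.m.) are all assumptions, not Screach settings.

```python
from datetime import time


def timetable_interval_seconds(now: time, raining: bool, base: int = 600) -> int:
    """Seconds between timetable slots on the venue screen (lower means more frequent)."""
    interval = base
    # Assumed definition of "late": from 22:00 through to 04:00 the next morning.
    is_late = now >= time(22, 0) or now < time(4, 0)
    if is_late:
        interval //= 2  # show the timetable twice as often near closing time
    if raining:
        interval //= 2  # and twice as often again when it is wet outside
    return interval
```

The weather flag would come from a separate feed, which is the kind of contextual data combination the hyperlocal market depends on.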
While Screach offers a number of content streams — similar to the way cable customers can select which channels they subscribe to — public transport data is becoming one of the essential offerings, he says:
Public transport data is one of the feeds we offer, and it is a big part of the natural life cycle of a customer, so it is an easy sell to a venue. In venues, we can even show an advert around the feed, so, for example, we can show the transport timetables alongside the kitchen’s menu and say, ‘We can serve your food within 10 minutes.’
Using a public transport API for its hyperlocal service has been a fairly straightforward process, mostly because Screach uses the Transport API’s data aggregator service rather than trying to manage the data direct from city government and transport provider services. “Using their API, we configured it within about 20 seconds,” Rawlings says. “It’s very easy to do. Once we have integrated with it, we don’t touch the code base.”
A nominee in the Europa startup awards, Toothpick is a service directory and appointment booking marketplace for dentists across the U.K. It partnered with one of the largest dentist practice management software companies in order to integrate the service directly into the dentists’ online booking calendars. Lotta Holmberg, in charge of Toothpick’s marketing communications, told ProgrammableWeb that initial feasibility research proved the value of providing hyperlocal, contextual information to potential end users:
The founders first met over an emergency dentist appointment booking proposition for London. Research followed, including a survey in which 450 out of 1,000 dental practices failed to pick up the phone during standard practice opening hours and only 40 percent of them were found to have a website. Toothpick CEO and co-founder Sandeep Senghera (then a practicing dentist) knew there was a serious problem in that consumers couldn’t shop around and access up-to-date information. Today, over 50 percent of appointments on Toothpick are booked outside of practice opening hours.
We know that dentistry in the mainstay is hyperlocal, we also know that many patients don’t visit the dentist as often as they should. At Toothpick, mapping location and transport information are key to being able to make a clear and informed choice. Combined with live appointment availability, rich bios, treatment information and reviews, … Toothpick patients find the treatment and appointment that’s best suited for them.
Toothpick’s business model combines subscription fees with commissions on appointments. The fees approach is in part driven by the complexity of the dental health system’s arrangements in the U.K. But it is proving a viable model and targets the dental service providers rather than the end users looking for the local information.
The importance of policy and culture in opening up and using data
For those working in or with government providers of data, it is often remarked that the technology issues are among the easier challenges to solve. Much more difficult are the political and cultural factors that may cause city authorities to be reluctant about opening data, or when they do so, to poorly manage the release so that the data is next to useless.
Given the nascency of the sector, using city authority public transport APIs is still fraught with complexities. Ryan from Electric Labs points to some of the problems it has in using the MTA API as an example:
The main problem with the current real-time API is that there are no arrival predictions, just distances. This is not particularly useful for the traveling public, as users are unable to make sensible decisions based on distances; they'd prefer to be told how many minutes away the next bus is. We use the scheduled data mixed with the real-time distance data to come up with a decent time estimate for our users, but this is definitely an area for improvement. In London, the system is very advanced, and we use traffic data, historical data, weather conditions, etc., to produce an accurate time-based estimate for the consumers of the API, and this is definitely an area for improvement in New York City.
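The approach Ryan describes, combining scheduled data with a real-time distance reading, can be illustrated with a minimal sketch. This is not Electric Labs' actual method: the function names, the `min_speed_mps` floor and the sample figures are all assumptions chosen for illustration. The idea is to derive an average speed from the timetable for a stretch of route, then divide the reported distance by that speed.

```python
def scheduled_speed(segment_length_m, scheduled_seconds):
    """Average speed (m/s) the timetable implies for a stretch of route."""
    if scheduled_seconds <= 0:
        raise ValueError("scheduled_seconds must be positive")
    return segment_length_m / scheduled_seconds


def estimate_minutes_away(distance_m, speed_mps, min_speed_mps=1.0):
    """Convert a real-time distance reading into a rough arrival estimate.

    min_speed_mps is a hypothetical floor that stops a very slow
    timetable segment from producing an absurdly long estimate.
    """
    if distance_m < 0:
        raise ValueError("distance_m must be non-negative")
    speed = max(speed_mps, min_speed_mps)
    return distance_m / speed / 60.0


# Hypothetical example: the timetable allows 300 seconds for a 1,500 m
# stretch, and the real-time feed says the bus is 900 m from the stop.
speed = scheduled_speed(1500, 300)           # 5.0 m/s
minutes = estimate_minutes_away(900, speed)  # 3.0 minutes
```

A production system would refine this with traffic, historical and weather data, as Ryan notes is done in London; the sketch only shows why distance alone needs a speed assumption before it means anything to a passenger.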
Ryan also points to the challenges facing developers who use more than one data source and must then manage the different ways public transport APIs are coded. This is an issue that CitySDK is trying to help cities solve. Its Mobility API is a developer-facing API design model that it hopes will become a standard across European cities.
“I’m happy with any attempt to standardize data interchange formats as this will result in better, more far-reaching tech solutions for the consumer in the long run,” says Ryan. “In the past, we worked at a company that developed the SIRI standard, which was an attempt to standardize how different transport operators shared their real-time information. It's a format used by both the MTA and TfL for their bus systems and has been adopted throughout Europe.”
In an illustration of how little it understands developer engagement, the NRE held a developer consultation workshop in May 2013 to respond to developers’ needs. In November, it summed up the outcomes of the consultation, in effect saying no to nearly every developer request: It would not open up the APIs used internally; it could not offer JSON and XML feed formats, as requested; and it would hold off on push notifications because the functionality would cost £20,000 to implement. NRE did respond positively to promoting third-party apps on its website, stating, “We already show this information.”
Where to next: Leveraging the opportunities of public transport APIs
A number of new initiatives aim to help both cities and developers supply and consume public transport data via API. Initiatives like Navitia and CitySDK are hoping to create a standardized approach to writing transport API code that makes it easy for developers to scale public transport apps in cities across Europe. The hope is that if cities all conform to the same naming conventions, developers can quickly develop multicity applications such as travel guides and transport planners.
Cities, data aggregators and third-party developers are also encouraged to share their API code under a Creative Commons license on API Commons. The platform aims to speed up scaling by letting API providers share their API code so that other cities and transport providers can replicate the formats. The hope is that this will aid standardization efforts and reduce duplication and costs for cities commencing their own API and open data strategy around public transport.
At Transport API, Coleman and her team are also experimenting with a new wave of contextual products, with the hope of being able to commercialize public transport data even further. Some of the products may be sold back to transport operators and city authorities keen to ensure citizen satisfaction with their services.
TransportBuzz overlays tweets about public transport with their location in order to create a real-time sentiment analysis of a city’s transit. “Public transport is of interest to hyperlocal media,” says Coleman. “So geolocated tweets can look at specific areas. It’s of interest to journalists: Transport is quite a key issue for everybody.
“We see a lot of complaints about drivers or behaviors that people don’t like on public transport (for example, eating food on buses or trains generates a lot of comments). This data could also let public transport companies look at customer satisfaction from a broader base and really listen to concerns of their commuters and let them address communications. TransportBuzz picks up that softer intelligence,” Coleman says, reflecting on potential commercial customers.
TransportBuzz needs to use sophisticated algorithms to sort the sentiment analysis, in part because some keywords have multiple meanings. The algorithms have to filter out all tweets that talk about people attending training, rather than trains, for example. “Also, people don’t always mention the specific operator, so unless you are specifically mining for that geolocation, you can miss the hyperlocal data that is useful for transport operators to know about.”
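The "training versus trains" problem can be shown with a small sketch. It is not TransportBuzz's algorithm, which the article does not describe; the keyword list and function name are assumptions for illustration. Matching on whole words with regular-expression word boundaries is one simple first pass that keeps "train" and "trains" while rejecting "training".

```python
import re

# Hypothetical keyword list; \b word boundaries make the pattern match
# whole words only, so "training" and "business" are not picked up.
TRANSIT_PATTERN = re.compile(
    r"\b(trains?|bus(?:es)?|tube|tram)\b", re.IGNORECASE
)


def mentions_transit(text):
    """Return True if the text contains a whole-word transit keyword."""
    return TRANSIT_PATTERN.search(text) is not None


tweets = [
    "Stuck on the train again",
    "Attending training all day",
    "Two buses went past without stopping",
    "Business trip tomorrow",
]
transit_tweets = [t for t in tweets if mentions_transit(t)]
# Keeps the first and third tweets only.
```

Whole-word matching does not resolve every ambiguity. A tweet such as "I train at the gym" would still pass, which is one reason a real service needs more sophisticated filtering than a keyword list.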
Transport API is also working on Tube Radar, a mapping service built on Transport API that shows overall reliability of the London tube network, with a traffic light coding system to show if trains are on time (green), have a little waiting time (amber) or have stopped (red). (While scoping feasible commercial opportunities, the site includes an Internet banner ad to generate some revenue.)
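A traffic light coding of this kind comes down to a small classification rule. The sketch below is illustrative only: the article does not say how Tube Radar decides between green and amber, so the `amber_threshold` value, the parameter names and the function itself are assumptions.

```python
def tube_status(running, wait_minutes, amber_threshold=2.0):
    """Map a line's state to a traffic light color.

    running: False if trains on the line have stopped.
    wait_minutes: extra waiting time beyond the normal interval.
    amber_threshold: hypothetical cutoff (minutes) between green and amber.
    """
    if not running:
        return "red"
    if wait_minutes > amber_threshold:
        return "amber"
    return "green"


# Hypothetical readings for three lines.
statuses = {
    "Line A": tube_status(True, 0.5),   # on time
    "Line B": tube_status(True, 4.0),   # a little waiting time
    "Line C": tube_status(False, 0.0),  # stopped
}
```

Keeping the rule in one small function makes the threshold easy to tune once real reliability data is available.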
The message is clear: For developers entering the market, once they have gotten used to managing public transport data via API, there are countless opportunities to create products that may have commercial application.
Cities looking to open up their public transport data would do well to study Transport API; the Open Knowledge working group; Navitia; TransiCast; TriMet in Portland, Ore.; and MTA New York City as successful, sustainable models of managing public transport data.
For developers looking to leverage public transport data for their commercial products, it may be best to start developing apps, products and services in cities where there is an overarching policy in place that commits the local government to opening up its data sources. Developers should look for signs that this policy is more than just rhetoric — for example, recent replies in developer forums and signs that new data sets are continually being added.
Cities that rank well on open data indexes may also be more fertile grounds for building prototype apps.
Alternatively, data can be scraped from websites, and mixed with data from GTFS (there may also be opportunities to replicate TransportBuzz and analyze tweet sentiments for mentions of delayed trips). As TransiCast has proved, this could be a potential business model for population-dense cities where businesses and citizens want this data but there is no official government access.
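For developers new to GTFS, the sketch below shows what the scheduled side of such a mix might look like. It reads rows in the layout of a GTFS `stop_times.txt` file and lists the next scheduled departures at a stop. The sample rows, stop IDs and function names are invented for illustration, and the sketch ignores the service calendar (which days a trip runs), so it is a starting point rather than a complete timetable lookup.

```python
import csv
import io

# Invented sample rows in the layout of a GTFS stop_times.txt file.
SAMPLE_STOP_TIMES = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:30,S1,1
T1,08:10:00,08:10:30,S2,2
T2,08:15:00,08:15:30,S1,1
T3,25:05:00,25:05:30,S1,1
"""


def gtfs_seconds(hms):
    """Convert a GTFS HH:MM:SS time to seconds since the service day began.

    GTFS allows hours of 24 or more for trips that run past midnight.
    """
    hours, minutes, seconds = (int(part) for part in hms.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def next_departures(stop_times_csv, stop_id, after_seconds, limit=3):
    """Return up to `limit` scheduled departure times at a stop, in seconds."""
    reader = csv.DictReader(io.StringIO(stop_times_csv))
    times = sorted(
        gtfs_seconds(row["departure_time"])
        for row in reader
        if row["stop_id"] == stop_id and row["departure_time"]
    )
    return [t for t in times if t >= after_seconds][:limit]


upcoming = next_departures(SAMPLE_STOP_TIMES, "S1", gtfs_seconds("08:05:00"))
```

Delay information scraped from a website or mined from tweets could then be applied on top of these scheduled times, subject to the terms of use of whatever source is being scraped.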
For developers getting started, being clear about the business model and identifying the data supply issues are crucial planning activities. Can developers re-create the data aggregation business model, or is there a consistent supply in the local area that would allow consumer-facing apps and visualizations to be produced from available API supply? If the business model requires a certain level of scalability, how easy would it be to replicate the service in another city?
Developers also have a number of upcoming opportunities to test the use and availability of public transport APIs and to meet with city authorities to gauge the level of support for open data policies. Many cities are actively participating in the National Day of Civic Hacking at the end of the month, and many hold their own hackathons throughout the year.
Startups and other event organizers also hold related events. UP Singapore is hosting a geospatial hackathon in conjunction with the Singapore Land Authority and OneMap Singapore from June 6-8. June 13-15, StartupBootcamp Berlin will host a hackathon focused on smart transportation and energy.
Cities and data aggregators with a public transport API are encouraged to submit details to the ProgrammableWeb API directory.
Are you working with city APIs? Contact Mark Boyd to discuss how you provide or consume city-centric APIs for future articles in our Smart Cities and APIs series.