From the DC-Area API Meetup: How an Alternative Data API Can Be Used To Improve Predictive Analysis

As a part of ProgrammableWeb's ongoing series of on-demand re-broadcasts of presentations that were given at the monthly Washington, DC-Area API meetup (anyone can attend), this article offers a recording and full transcript of the discussion given by Accrue Ltd. founder and CEO Benoît Brookens who is based in Hong Kong. Originally, Brookens was a securities trader who started to wonder whether seemingly unrelated events could be correlated to the change in stock market prices. He then began to plug the details of those events into a calendar in a way that he could look at the sudden rise of a stock and correlate that rise to the other events that happened on the same day (or the days just preceding).

The result of that exploration is his company Accrue and the API it offers to anyone wanting to do the same types of correlations; for example investors or analysts.

The DC-Area API Meetup almost always takes place on the first Tuesday of every month. The attendees consist of API enthusiasts and practitioners from all around the federal government as well as businesses and organizations that are local to the DC Metro area. There is no charge to attend and attendees get free pizza and beer, compliments of the sponsors. The meetup is always looking for great speakers and sustaining sponsors. If you're interested in either opportunity, please contact David Berlind. If you're interested in attending, just visit the meetup page and RSVP to one of the upcoming meetups. It's that simple.

Here's the video of Brookens' talk and the full transcript:

Developers Rock Podcast (special edition): How an Alternative Data API Can Be Used To Improve Predictive Analysis

Editor's Note: This and other original video content (interviews, demos, etc.) from ProgrammableWeb can also be found on ProgrammableWeb's YouTube Channel.

Audio-Only Version

Editor's note: ProgrammableWeb has started a podcast called ProgrammableWeb's Developers Rock Podcast. To subscribe to the podcast with an iPhone, go to ProgrammableWeb's iTunes channel. To subscribe via Google Play Music, go to our Google Play Music channel. Or point your podcatcher to our SoundCloud RSS feed or tune into our station on SoundCloud.

Full Transcript of: How Alternative Data Can Be Used To Improve Predictive Analysis

The following transcript is from Benoît Brookens III's presentation, transcribed as best as possible from the video above. As with many transcriptions of this nature, some sentences may run on, or may appear fractured. Our goal is for the transcript to be as true to the presentation as possible.

Benoît Brookens III: Just a little quick intro. My name is Benoît Brookens. I'm the founder of a big analytics company based mostly in Hong Kong. I'm here visiting, I'm a Washingtonian natively.

We are essentially an alternative data company. Alternative data is essentially this intersection of liberal arts and technology, like Steve Jobs talked about. It's basically the acceptance that every industry could benefit from some other contexts that their industry might not be currently appreciating. For instance, it's like a farmer using social media data to understand the impact of avocado prices, or looking at shipping information in order to anticipate demand or supplies of competing products in the market from a foreign place. I know that's a little high level, but in a nutshell, we're providing event-based intelligence for decision makers to make better decisions.

In 2017, we were selected as the most promising Fintech company in the world by the London Stock Exchange and the UK government. Here's us opening the UK Stock Exchange and I'm in the middle.

Anyway, the focus of this topic is demonstrating one use case that we focus [on in] what we're building. I'm not going to spoil it by telling you actually what we're building yet, but financial investors have a problem because they are now inundated by alternative events that previously never drove impacts to the level that they do now. For example, a Trump tweet is sending markets up and down, disrupting Boeing, disrupting all types of things, trade war developments happening. As they happen, they force people to put on many different new hats to assess what the impact is on their businesses, and maybe potentially their investments.

In this case, talking about APIs, I'm looking at the financial market as a simple time series dataset. It's something measured over time, like end of day sales, intraday prices, or any variety of volumes. Essentially one use case is basically in Hong Kong. Last September I encountered my very first typhoon. It was a typhoon 10, in fact. This is a category five, like the one that had recently devastated the Bahamas. Buildings are swaying three feet back and forth, and I was in a panic wondering, "What should I do? Should I have hopped on a flight and gone to Thailand or somewhere where it was calmer and nicer in order to spend my day?" But I hung back. It was my first typhoon, and I rushed to the grocery store like everyone else. Rushing to the grocery store, the grocery store shelves were empty. I thought that was interesting because it happens every time.

What I did was sit at home and, having technology like the way I have, I was wondering how I can turn the observations that I was making into some insight that might or might not be reflected as a hypothesis in the financial markets? Essentially what we did, is I back tested it. I took every single typhoon 8, or more, over the past five years and I started correlating it to particular stocks. I was looking for good risk adjusted returns, meaning that these stocks typically exhibited good risk reward ratios in the financial markets.
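
Editor's note: The back test described here can be sketched in a few lines. The example below is a minimal illustration, not Accrue's implementation. The calendar entries, the return figures, and the helper names (`event_returns`, `hit_rate`, `reward_to_risk`) are all made up for the example.

```python
from datetime import date
from statistics import mean, stdev

# Made-up event calendar: (date, typhoon signal number). Not real records.
typhoon_calendar = [
    (date(2020, 1, 6), 10),
    (date(2020, 2, 3), 8),
    (date(2020, 3, 2), 9),
    (date(2020, 4, 6), 8),
    (date(2020, 5, 4), 3),  # below the threshold, so it is skipped
]

# Made-up returns of one stock over each event window, keyed by event date.
window_return = {
    date(2020, 1, 6): 0.02,
    date(2020, 2, 3): 0.01,
    date(2020, 3, 2): 0.03,
    date(2020, 4, 6): 0.02,
    date(2020, 5, 4): -0.04,
}


def event_returns(calendar, returns_by_date, min_signal=8):
    """Collect the stock's return for every event at or above min_signal."""
    return [
        returns_by_date[day]
        for day, signal in calendar
        if signal >= min_signal and day in returns_by_date
    ]


def hit_rate(returns):
    """Share of events on which the stock moved up."""
    return sum(1 for r in returns if r > 0) / len(returns)


def reward_to_risk(returns):
    """Mean return divided by its sample standard deviation (a crude ratio)."""
    return mean(returns) / stdev(returns)


returns = event_returns(typhoon_calendar, window_return)
print(
    f"events: {len(returns)}, hit rate: {hit_rate(returns):.0%}, "
    f"reward/risk: {reward_to_risk(returns):.2f}"
)
```

A hit rate of 100% on a handful of events, as in this toy data, says little on its own; the ratio is included only to show where a risk-adjusted measure would plug in.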

What I uncovered was two big brands. One was a non-alcoholic beverage company and two was a brewery company. These are the second biggest in their class, meaning this is the second biggest, non-alcoholic beverage brand. They sell juice, tea, water, coffee, et cetera, and this is the second biggest beer company in China. I'm in Hong Kong by the way, so 100% of the time over the past five years, these two stocks are reacting. I found that really interesting.

This is not investment advice, I'm not telling you to go buy any stocks. This is not an investment show, but essentially what I started doing in my thinking process was that I was for the first time taking something called unstructured data. Unstructured data is growing at about 12.8 terabytes every minute and these things are coming out of a variety of things, from cameras to sensors to government websites, and they're all actually not in a condition or format to do any type of analysis. Meaning, if you wanted to understand the impact of... Say you own a sandwich shop and you also have a gelato in the back, and you want to know if you sell more sandwiches or gelatos on a rainy day.

How many businesses could actually do that? Not many. In fact there are POS's that have the data for their sales, but there's no API that they can plug into and say, "Hey, it's a rainy day in D.C., what should I do? Should I go make more sandwiches or should I just keep the gelato cold?"
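
Editor's note: To make the sandwich-shop question concrete, here is a small sketch of joining point-of-sale totals to a weather calendar. The sales figures, the set of rainy days, and the function name `average_by_condition` are invented for illustration; a real shop would pull both from its own POS export and a weather source.

```python
from statistics import mean

# Invented daily unit sales, as a POS export might provide them.
daily_sales = {
    "2019-10-01": {"sandwiches": 100, "gelato": 60},
    "2019-10-02": {"sandwiches": 140, "gelato": 20},
    "2019-10-03": {"sandwiches": 110, "gelato": 50},
    "2019-10-04": {"sandwiches": 130, "gelato": 30},
}

# Invented weather calendar: the days on which it rained.
rainy_days = {"2019-10-02", "2019-10-04"}


def average_by_condition(sales, event_days, product):
    """Return (average on event days, average on all other days) for a product."""
    on_event = [row[product] for day, row in sales.items() if day in event_days]
    off_event = [row[product] for day, row in sales.items() if day not in event_days]
    return mean(on_event), mean(off_event)


for product in ("sandwiches", "gelato"):
    rainy, dry = average_by_condition(daily_sales, rainy_days, product)
    print(f"{product}: rainy-day average {rainy}, dry-day average {dry}")
```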

We are building one of the first APIs in the world that is commercially available for the private sector to begin to do these things far more casually. This is unstructured data, it's growing at a rapid pace. What we're doing essentially is turning the world into many different types of calendars. These are religious calendars, Blockchain industry calendars, seasonal calendars, sports calendars, weather calendars, natural disasters, political, products, et cetera. A product calendar would be like an iPhone release. At a corporate calendar level, it would be a CEO speech, WWDC conference, et cetera.

You can analyze the impact of this on other things that you might take for granted. It's not just about Apple stock, it can be about transportation, it can be about a smart city demanding, "How much of a traffic jam do we actually have?" If you want to see the model, that concept, before you got to a quantitative metric, you have to start with something, start with an event, start with something that you can test, a hypothesis. We're building a way for someone to take that unstructured data, turn it into something simple that's never been innovated on, which is a calendar. We all have a calendar on our phone. It's pretty much the most neglected app in our phones. It hasn't really been innovated on since the first iPhone. It's schedules, meetings, et cetera, but it's very, very powerful.

We are taking this attributed chronology, meaning when we structure the data from the internet, say religious calendars, we tell you where it comes from. We got it from this website or we got it from this place, or we got this article from this governmental source. We're basically classifying this as a knowledge graph. You can say this is an iPhone product release. We can use a graph database to link that to its competitors. iPhone is a product line competitor of the Galaxy, and that's how it's related to Samsung. This iPhone is also a handheld device, so it's related to other handheld devices, but it's mobile, so it's these kinds of things. We're classifying all of these objects and events and activities into a variety of different things.
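
Editor's note: The idea of an attributed knowledge graph can be illustrated with a tiny in-memory structure in which every relationship carries the source it came from. This is only a sketch: the `KnowledgeGraph` class, the relation names, and the source labels are hypothetical and do not describe Accrue's graph database.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Stores (subject, relation, object) edges, each tagged with its source."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add(self, subject, relation, obj, source):
        self._edges[subject].append((relation, obj, source))

    def related(self, subject, relation):
        """Objects linked to subject by the given relation."""
        return [o for rel, o, _ in self._edges.get(subject, []) if rel == relation]

    def sources(self, subject):
        """Distinct sources behind everything recorded about subject."""
        return sorted({src for _, _, src in self._edges.get(subject, [])})


graph = KnowledgeGraph()
# Source labels below are placeholders, not real citations.
graph.add("iPhone", "competitor_of", "Galaxy", source="placeholder-source-a")
graph.add("iPhone", "is_a", "handheld device", source="placeholder-source-b")
graph.add("Galaxy", "made_by", "Samsung", source="placeholder-source-a")

# Walk from the iPhone to its competitor, then to that competitor's maker.
for rival in graph.related("iPhone", "competitor_of"):
    print(rival, "->", graph.related(rival, "made_by"))
print(graph.sources("iPhone"))
```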

Blockchain is not a buzzword here. We are using primarily immutable databases. An immutable database is simply a mechanism for recording something as a version of itself over time. We don't delete anything actually, we don't delete any data. When you have different types of editing that happens at Wikipedia, things change and you might have a deletion, you don't want to start from deletionism. If there's an update of an economic record, we will say "revised" rather than "deleted" and "replaced."
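
Editor's note: The "revised" rather than "deleted and replaced" behavior can be sketched as an append-only log. The `RevisionLog` class, the record key, and the figures are assumptions made for the example, and a plain Python dictionary is of course not an immutable database; the point is only that every earlier version stays readable.

```python
class RevisionLog:
    """Append-only store: new values are added as revisions, never overwritten."""

    def __init__(self):
        self._versions = {}

    def record(self, key, value, note="initial"):
        self._versions.setdefault(key, []).append((value, note))

    def latest(self, key):
        """Most recent value recorded for key."""
        return self._versions[key][-1][0]

    def history(self, key):
        """Every version of key, oldest first, as (value, note) pairs."""
        return list(self._versions[key])


log = RevisionLog()
# Invented figures for an economic record that is later revised.
log.record("unemployment_rate/example-month", 3.6)
log.record("unemployment_rate/example-month", 3.5, note="revised")

print(log.latest("unemployment_rate/example-month"))
print(log.history("unemployment_rate/example-month"))
```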

Anyway, we are building an integrated global calendar, in a simple sense, of sports, industry, beliefs, economics, weather, and there are hundreds of thousands of calendars that we build.

To give you an example of why this is important, again, these are all financial examples, not investment advice, we were looking at Apple stock. Say you wanted to go back to 2010. 2010? What was going on in 2010? Cities don't have memories and neither do markets, nor are people really that good anymore. You want to highlight this little area. What was going on in the spike? You can't go to Google and say what was going on here. You really can't, you can't just type that. There's no database search for that, there's no research engine. People pay a lot of money to find that out.

We built the database as a point in time. You could go 8/31 through 9/8. What was taking place between those ranges? You can see the Samsung Epic 4G, which might've been an epic failure because Epic doesn't exist. Then you had iOS 4.1 announced on 9/1 and you had iOS released on 9/8. These two things were really interesting. Funny enough, there was a really interesting correlation between iOS announcements and moon phases during Steve Jobs' lifetime, that many people did not actually extract. But we were able to find the serendipity of fact that uncovered a really interesting correlation that might go beyond spuriousness into something really interesting.
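
Editor's note: A date-range lookup of this kind can be sketched as a filter over a list of dated events. The two iOS entries use the dates mentioned in the talk; the third entry is a placeholder added to show that out-of-range events are excluded, and the function name `events_between` is hypothetical.

```python
from datetime import date

# (date, label) pairs. The December entry is a placeholder, not a real event.
calendar = [
    (date(2010, 9, 8), "iOS 4.1 released"),
    (date(2010, 9, 1), "iOS 4.1 announced"),
    (date(2010, 12, 1), "placeholder event outside the window"),
]


def events_between(events, start, end):
    """Labels of all events dated within [start, end], in chronological order."""
    return [label for day, label in sorted(events) if start <= day <= end]


window = events_between(calendar, date(2010, 8, 31), date(2010, 9, 8))
print(window)
```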

Again, we're building this as an API. This doesn't have to be stock data, this could be virtually anything. You have a sushi shop and there was a sushi expo in town, and maybe your sushi sales go down. It can be anything really.

Going further, we compete with some existing players like Kensho, Bloomberg, Thomson Reuters, in the financial sector. We cover others, but what makes us special is that you can bring your own data. We're not locking you into financial data sets or telling you that you can't add in Willy Wonka chocolate factory's event or things that are taking place in your small town. This is a very open-ended API database to allow people to do event-based intelligence. I don't want to bore you with this, but this is just a sample of a dashboard. To do this would require sourcing, structuring, cleaning, executing, all this information, but a decision maker can come here and kind of get these insights. This is just one example of what we've done with this. It's a what-you-see-is-what-you-get, an if-then drag and drop algorithm builder.

If there was a tropical cyclone above 8 or more, this actually references an API dataset, that historical dataset. You could say buy 100 shares of the market. This could be used in someone's home, or it can be used in a rather large variety of places, if there's a weather event, if it's above 70 degrees, turn on the air conditioners in the home. We're experimenting with different ways to play with this type of information, being API first. We're having a really creative exercise and open to feedback and ideas as to how people will think about this.
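
Editor's note: The if-then rules described here map naturally onto a condition paired with an action. The sketch below is one possible shape for such a rule, with invented field names (`type`, `signal`, `fahrenheit`) and a hypothetical `Rule` class; it prints what would be done rather than placing trades or switching anything on.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # returns True when the rule should fire
    action: str                        # description of what to do


rules = [
    Rule(
        name="typhoon trade",
        condition=lambda e: e.get("type") == "tropical_cyclone" and e.get("signal", 0) >= 8,
        action="buy 100 shares",
    ),
    Rule(
        name="warm day",
        condition=lambda e: e.get("type") == "temperature" and e.get("fahrenheit", 0) > 70,
        action="turn on the air conditioners",
    ),
]


def evaluate(rule_set, event):
    """Return the actions of every rule whose condition matches the event."""
    return [rule.action for rule in rule_set if rule.condition(event)]


print(evaluate(rules, {"type": "tropical_cyclone", "signal": 10}))
print(evaluate(rules, {"type": "temperature", "fahrenheit": 65}))
```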

This is just another financial use case. This product was available, not available now, where you kind of took it and put it back on our shelves and allows someone to casually do data sampling. If I were to look at this type of logic between these periods of time, what could I infer? What historical data will be presented? You could run thousands of these algorithms at the same time, technically. Again, we're taking this idea of a calendar and we're taking it to automation, taking it to Big Data analytics. We're not focusing on just black box AI or anything. It's really about transparent explainability of things that I believe, things that I see, things that I feel, and being curious about them and thinking if they have any value.

I can go to a real demo. We don't really focus on government, but we do from the perspective of focusing on smart cities. We have a pure smart city focus, cities that want to basically uncover how does traffic, how does weather, how is it impacting in their city and how are events that they may be aware or not aware of impacted us. Part of my team, I have a background as a trader. Some of our team comes from names like SAP, et cetera, et cetera. I'll give you a demo, quickly, of how it works. Any questions so far?

Speaker 2: Would you show us the API?

Benoît: Huh?

Speaker 2: Would you show us the API?

Benoît: Yeah, I can show you the API. Let's see. Let's see. How do I do it? I hope I'm not talking too fast. I'll show you the GUI version of the API. I wish I could see this. Can I slide this?

Essentially, you have a calendar here. Again, this is just a sample. This is just a small data sample, but I'll scroll to the bottom to keep it simple. You have a variety of things taking place. On what day was that? On the 29th of September, it was kind of scrolling through and you can see things like the Russian Grand Prix, the Berlin Marathon, a variety of things. Let's just click on the Berlin Marathon. Under Berlin Marathon, you can basically see that it is one of one listed here. That was scraped. Let's see if I can find something with a lot more history.

This is a demo, I didn't test these examples before I made them. This is the FIA Formula Grand 3 (End). Oh wow, only one of those too, that's funny. I'll go back in time and find something interesting. I'll go back to October 31st, 2017. I'm scrolling really quickly, but you have a variety of types of activities happening. You have unemployment rate being reported in Japan, it's Halloween, obviously. These are candlestick patterns and SEC filings and we clustered them together. For this example, you get a quarterly profit increase of this particular stock ticker, you had things like crypto products releasing certain versions, you had a car ramming accident in New York City, unfortunately. Let's see, I'll keep scrolling... Astrological events for max.

I'll show you what you can do with this and how you could use the API. Let me find one good one, for instance. Let's just say the SEMA show, it's an auto show based in Las Vegas and it runs every couple of years. This is just three examples, you can see that this was scraped from a and so what you can do in this, this is user API, you can add in, essentially, a date.

Manually, if you're doing your own research, you want to add in earnings releases a calendar, et cetera. I can put it in today's date. I can just do test and then I can source, I'm just going to do This is not signing into the blockchain in this example, but what you're doing is you use API to store simply a time series data set of date time. You're able to see it in full chronology. This is all the start days of the auto show. Let's run this and we can use the API to ask a question to it. I'm going to put in SEMA auto show and I'm going to look at Ford Motors, trading in New York, but they also trade in London, et cetera, but we're going to use it here. We're not going to be too fancy, and we're just going to do an analytic, where we're doing an analytic, basically purchasing at the closing price on the first date of the event and we're going to sell it at the closing price following the event.

What we're doing is we're exploring this data exhaustively, although there's only three examples of this. You can basically see that we've turned this into plain text. This is a five day trade for the auto show, considering the last occurrences and latest being in 2018, entering zero days before the event. This pattern has an average gain loss, et cetera. What it's doing is allowing you to explore your hypothesis about how some event may or may not correlate based on historical activities.
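
Editor's note: The analytic walked through above (buy at the close on the first day of the event, sell at the close a fixed number of trading days later, then summarize in plain text) might look something like the following. The price series are invented, the function names are hypothetical, and the summary wording is only loosely modeled on what was shown on screen.

```python
def event_trade_return(closes, event_index, hold_days=5, entry_offset=0):
    """Return from buying at the close entry_offset days before the event
    and selling at the close hold_days trading days after the entry."""
    entry_index = event_index - entry_offset
    return closes[entry_index + hold_days] / closes[entry_index] - 1.0


def summarize(returns, hold_days):
    """Plain-text summary of the pattern across all occurrences."""
    average = sum(returns) / len(returns)
    winners = sum(1 for r in returns if r > 0)
    return (
        f"{hold_days}-day trade over {len(returns)} occurrences: "
        f"average return {average:+.2%}, {winners} winning"
    )


# Invented closing prices for two occurrences; index 0 is the event's first day.
occurrences = [
    [10.0, 10.1, 10.2, 10.0, 9.9, 9.5],
    [8.0, 8.1, 8.2, 8.3, 8.5, 8.8],
]

returns = [event_trade_return(closes, event_index=0) for closes in occurrences]
summary = summarize(returns, hold_days=5)
print(summary)
```

Changing `hold_days` plays the role of the slider in the demo: the same events are re-scored over a longer or shorter window, provided each price series is long enough to cover it.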

You can see 2017 it dropped, 2018 it rallied, and you can see what happens in between. Starting one day, going into the future, we currently have the setting on five days, but I can use a slider to explore that. If it's 30 days, here's how it works. One example is you can search, so the database is pretty cool.

I'll show you one thing that people take for granted. One other tool we built, it's called the Almanac. It basically is a screener, so you can essentially search all the different types of events from Saudi oil discoveries to the Kentucky Derby, et cetera, and you can do a massive scan using our API. Let's look up July 4th and let's just purchase at the closing price of the first day following July 4th, and we're going to hold for arbitrarily two days just to give an example of the API. Again, this can be any time series data set in a business, and we're just simply going to run that across all U.S. stocks, so S&P 100, this is my last example, no Hong Kong, no Crypto, no Forex.

What you can do in one second is basically take any real world concept and event and you can basically screen across all of the markets in seconds and get a result of all the securities or variables in your business, or factors or employees, or whatever these time series might be. You get sort by tops and tails, you can see Walmart, it's in there for July 4th, you can see Nike is in there for July 4th, and you can explore these patterns.
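
Editor's note: Screening one event across many series and sorting by "tops and tails" can be sketched as a ranking by average event return. The tickers below are placeholders and the returns are invented, so the output implies nothing about any real security; `screen` is a hypothetical helper, not part of the Almanac.

```python
from statistics import mean

# Invented per-occurrence returns around one event, keyed by placeholder ticker.
returns_by_ticker = {
    "AAA": [0.01, 0.03],
    "BBB": [-0.02, 0.0],
    "CCC": [0.0, 0.01],
    "DDD": [0.04, 0.02],
    "EEE": [-0.03, -0.01],
}


def screen(returns, n=2):
    """Rank tickers by average event return; return (top n, bottom n)."""
    ranked = sorted(returns, key=lambda ticker: mean(returns[ticker]), reverse=True)
    tops = ranked[:n]
    tails = ranked[-n:][::-1]  # worst performer first
    return tops, tails


tops, tails = screen(returns_by_ticker)
print("tops:", tops)
print("tails:", tails)
```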

Anyway, the idea is that you can turn a calendar into an API you can finally use. We're presenting this as a concept for other types of businesses that might be relevant to government for market surveillance or for other types of reasons. Come and talk to me if you have any questions or ideas. Thank you.

Be sure to read the next DC-Area API Meetup article: From the DC-Area API Meetup: How To Build A Scalable API on AWS in 10 Minutes