PW Interview: Greg Lamp and Austin Ogilvie of Yhat on Shipping Predictive Models via API

Ajay Ohri
Apr. 22 2013, 12:00PM EDT

The ŷhat Cloud Toolbox is a suite of development tools for integrating predictive analytics into existing Web and mobile applications. ŷhat deploys your models as RESTful APIs, so there’s no need to port, translate, or adapt models for integration with existing systems. We have previously covered the YhatHQ API here.

Here is a broad interview with the founders of Yhat, Greg Lamp and Austin Ogilvie. They talk about their startup journey, the improvement in Python's data science capabilities, broadening their beta access program, and the Yhat vision for hosted machine learning.

Ajay- Describe your experience in statistical computing from your education to the founding of Yhat.

Yhat- Greg got his start in computing in college, where he majored in Systems Engineering and Math. He began to explore scientific computing specifically while at comScore. Austin took General Assembly's intensive data science course with Mike Selik (data scientist at Infochimps) and Ryan Witt (CTO at Incrowd Ads).

Prior to founding Yhat, we worked together at On Deck, a fintech startup in NYC, where we built the first-ever self-serve online small business loan.

Ajay- What is your vision for Yhat? How do you aim to differentiate from other players in this field? How does the Yhat API compare to the Google Prediction API and/or the BigML API?

Yhat- Integrating predictive models into production software is notoriously complex. Most of the time it's done via long and costly throw-it-over-the-wall processes which involve porting code from one language to another. It's usually complex, error-prone, poorly-documented, and time-consuming.

Yhat answers these issues by allowing analysts to deploy models as-is without adapting code at all. Developers and analysts use a common API to deploy, manage, and consume models, so engineers and analysts get to speak the same language. We want Yhat to become the standard for embedding predictive models in production software.

We don't plan to play in the "drag and drop data science" arena at all. Predictive analytics is inherently exploratory, and we're skeptical of products that promise point-n-click machine learning.

The Google Prediction API and BigML are focused on machine learning as a black-box service: just send us your data and we'll figure out the answer. Yhat is a tool for experienced analysts and data scientists to ship predictive models rapidly.

Ajay- You work with both Python and R. Please compare and contrast the strengths and weaknesses of these two languages.

Yhat- First of all, we love Python and R. While we currently only support Python, we'll be rolling out an R package in the next few weeks.

R is great because its main purpose is analysis. This gives it a lot of features that might seem strange to people more familiar with general-purpose programming languages like Ruby or Python. The variety of stats, machine learning, and visualization packages is unbeatable. One of the biggest complaints about R is the syntax. It takes some getting used to, but once you understand the basics it all begins to come together (see https://github.com/tdsmith/aRrgh).

Python is a great language in its own right, and it has really come a long way as a scientific language in the past few years. For us it was the advent of pandas that really made data analysis in Python practical and fun. We find that Python is much better for data cleaning and munging, especially with nested data types. From an analytics perspective, Python has fewer libraries than R, but each library is packed with features and algorithms; scikit-learn is a good example. Having fewer libraries with more consistent APIs is huge because it allows you to do more while keeping your data in the same format.

Ajay- What is the culture you want to incorporate in Yhat for future growth?

Yhat- We're big on sharing, community, collaboration, and other values that define the hacker ethos. Scientific computing is still a prickly community, which makes it tough for people trying to learn new things and pick up skills.

We were at "Startup Row" at PyCon and also attended PyData in March, which was a lot of fun. In the short term, we plan to expand our open source presence and our blog (http://blog.yhathq.com/), which features practical examples and tutorials on a host of topics using R and/or Python.

Ajay- Describe the Yhat REST API. What are the ways you are trying to convince developers to use and create mashups for your API? What are some of the developer tools for enhancing API usage?

Yhat- The core Yhat API lets you take a model built in Python and deploy it to a cloud-hosted server where it's immediately made available to your other applications. It makes use of the code and models you've already written and typically involves adding 5-10 extra lines of code to your scripts.

We also offer a dedicated option for deploying to your own servers.

A few hundred people are participating in the beta, and we're working closely with a handful of power users. Most people found us through our blog. We just released a Node.js client for the Yhat API (https://github.com/yhat/yhat-node). Install it with `npm install yhat`. We've launched an admin dashboard and reporting API for a few users, which we'll be rolling out to all beta users shortly.
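
To make the general idea of "a model behind a REST endpoint" concrete, here is a minimal sketch using only the Python standard library. It is not Yhat's client or API: the `predict` function, its weights, the `handle_body` helper, and the port are all hypothetical names chosen for illustration, and a hosted service would add authentication, versioning, and scaling on top.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in 'model': a hand-set linear score (hypothetical, not a trained model)."""
    weights = {"age": 0.5, "income": 0.25}
    return sum(w * float(features.get(name, 0.0)) for name, w in weights.items())


def handle_body(raw_body):
    """Convert a JSON request body into a JSON response body."""
    features = json.loads(raw_body)
    return json.dumps({"prediction": predict(features)})


class ModelHandler(BaseHTTPRequestHandler):
    """Answers POST requests carrying a JSON object of feature values."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        response = handle_body(body).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port=8000):
    """Serve predictions on localhost until interrupted (blocks the caller)."""
    HTTPServer(("127.0.0.1", port), ModelHandler).serve_forever()
```

Calling `serve()` starts the endpoint locally; any application that can send an HTTP POST with a JSON body can then consume the model without knowing what language it was written in, which is the integration benefit described in the interview.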

This is one exciting API startup in data science, and we wish them luck in their journey. To test their APIs in beta, you can request an invite from YhatHQ.

Faster, more powerful data science models? Just another (YHat) API call away!

Ajay Ohri is the author of R for Business Analytics and likes to write on Enterprise, Cloud, and Statistical APIs with an emphasis on interviews. Follow Ajay on Google+ and connect on LinkedIn.
