San Francisco startup Prior Knowledge recently released a public beta of its predictive analytics API, Veritable. The company wants to give every developer big data prediction tools without requiring them to become a quant genius first. Just upload a dataset, run an analysis, make predictions and watch the sparks fly.
Veritable works by searching through all possible relationships in a dataset, using an advanced Bayesian probability machine-learning algorithm. Developers can join the beta program and use Veritable to predict, explain or group data, even with noisy, sparse or heterogeneous datasets.
There are other companies in the big data / predictive analysis space, including Platfora, Clearstory, BigML and Datameer (among others). Prior Knowledge likes to distinguish itself by saying, “At P(K), we’re building a platform that understands the actual causes behind your data. Its power comes from some fancy nonparametric Bayesian modeling under the hood. As a developer, you get all the benefits of that technology without having to implement, tune, or scale it yourself. Basically, we want to give every developer data super-powers,” says Eric Jonas, President and CEO of Prior Knowledge. Jonas has some street cred of his own, including being 'on leave' from a PhD program in neurobiology at MIT.
The developer ‘infer-structure’ that Prior Knowledge provides relies on some pretty fancy inference engines to handle big, messy, real-world data efficiently and turn it into actionable insights.
Predict has a wide range of applications. For example, voting or buying behavior can be predicted from everything else that is known about an individual.
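To make the idea of predicting one column from the others concrete, here is a minimal Python sketch. It is not Veritable's API or its Bayesian model: the dataset, the column names and the `predict` helper are all invented for illustration, and the "model" is just an empirical frequency count over matching rows.

```python
from collections import Counter

# Hypothetical toy rows; every column name and value here is invented.
ROWS = [
    {"age_group": "18-29", "region": "urban", "voted": "yes"},
    {"age_group": "18-29", "region": "urban", "voted": "no"},
    {"age_group": "18-29", "region": "urban", "voted": "yes"},
    {"age_group": "60+", "region": "rural", "voted": "yes"},
    {"age_group": "60+", "region": "rural", "voted": "yes"},
    {"age_group": "60+", "region": "urban", "voted": "no"},
]


def predict(rows, known, target):
    """Return the empirical distribution of `target` among rows matching `known`."""
    matches = [r for r in rows if all(r.get(k) == v for k, v in known.items())]
    counts = Counter(r[target] for r in matches)
    total = sum(counts.values())
    if total == 0:
        return {}  # no matching rows, so nothing to estimate from
    return {value: n / total for value, n in counts.items()}


# Distribution of "voted" for a young urban individual.
dist = predict(ROWS, {"age_group": "18-29", "region": "urban"}, "voted")
print(dist)
```

A simple count like this breaks down as soon as no row matches exactly, which is presumably one reason a real service would rely on a probabilistic model instead.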
Related tells you which other columns are predictively related to a given column of interest.
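One common way to score how predictively related two columns are is mutual information. The sketch below ranks columns by that score against a column of interest. This is a stand-in technique chosen for illustration, not a description of how Veritable computes Related; the table, the column names and the `related` helper are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def related(table, target):
    """Rank every other column by its mutual information with `target`."""
    scores = {
        col: mutual_information(values, table[target])
        for col, values in table.items()
        if col != target
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical column-oriented table, invented for this example.
TABLE = {
    "voted": ["yes", "yes", "no", "no", "yes", "no"],
    "age_group": ["60+", "60+", "18-29", "18-29", "60+", "18-29"],
    "shoe_size": ["L", "S", "L", "S", "L", "S"],
}

ranking = related(TABLE, "voted")
print(ranking)
```

In this toy table `age_group` determines `voted` exactly, so it ranks well above `shoe_size`, which carries almost no information about the target.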
Similar tells you which rows are most, well, similar to a given row. While this sounds straightforward, Veritable has a couple of tricks up its sleeve here. For one, Veritable knows that rows can be related to each other in different ways, depending on the columns in question. A good example is that individuals might group together one way when we pay attention to demographic variables, and then in a whole different way when we look at geographically linked columns.
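The point that similarity depends on which columns you look at can be shown with a small Python sketch. It uses a plain match-fraction score rather than anything Veritable actually does, and the people, column names and `similar` helper are all hypothetical.

```python
# Hypothetical rows keyed by an invented row id.
PEOPLE = {
    "ann": {"age_group": "18-29", "income": "high", "city": "Boston", "state": "MA"},
    "bob": {"age_group": "18-29", "income": "high", "city": "Austin", "state": "TX"},
    "cara": {"age_group": "60+", "income": "low", "city": "Boston", "state": "MA"},
}


def similar(rows, row_id, columns):
    """Rank other rows by the fraction of `columns` on which they match `row_id`."""
    target = rows[row_id]
    scores = []
    for other_id, other in rows.items():
        if other_id == row_id:
            continue  # a row is trivially similar to itself
        matches = sum(target[c] == other[c] for c in columns)
        scores.append((other_id, matches / len(columns)))
    return sorted(scores, key=lambda kv: kv[1], reverse=True)


# Same query row, two different column sets.
by_demographics = similar(PEOPLE, "ann", ["age_group", "income"])
by_geography = similar(PEOPLE, "ann", ["city", "state"])
print(by_demographics)
print(by_geography)
```

On demographic columns the closest row to "ann" is "bob"; on geographic columns it is "cara", which is the kind of column-dependent grouping the paragraph above describes.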