Will the API Kill the Data Scientist?

Now that PipelineDB has announced a developer preview of Stride, its realtime analytics API, it is tempting to ask whether APIs combined with machine learning might replace the growing role of human data scientists in data analytics. To explore the idea, ProgrammableWeb caught up with PipelineDB's President and Co-Founder, Jeff Ferguson. Ferguson and his team believe that machine learning can readily replace humans in a number of data analytics tasks, and that this shift has already begun.

"Machines can learn all sorts of things from streaming and siloed datasets, all with orders of magnitude more accuracy, speed, and scale than human beings," Ferguson told ProgrammableWeb. "A simple example is realtime a/b testing in advertising, where self-healing software systems can create, test, and iterate on multiple versions of ad copy, continuously weighting campaign spend toward ads with the highest click-through rates."

In Ferguson's example, a machine can repeat this process indefinitely to improve ad performance and optimize campaign spend. The Stride team at PipelineDB calls this type of learning "machine analytics," and Ferguson sees it as the next big shift in data analytics. Not only can machines automate this continuous improvement, they can also complete such tasks in minutes, or sometimes seconds. No human data scientist could match that speed.
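The weighting step in Ferguson's example can be sketched in a few lines of Python. This is only an illustration of the general idea, not Stride's API or PipelineDB's method: the `AdStats` class, the `allocate_spend` function, the `explore_share` parameter, and the ad names are all hypothetical. The sketch uses a simple explore/exploit split, in which a small fixed share of the budget keeps every ad variant in rotation and the rest goes to the variant with the highest click-through rate observed so far.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class AdStats:
    """Running totals for one ad variant (hypothetical structure)."""
    impressions: int = 0
    clicks: int = 0

    @property
    def ctr(self) -> float:
        # Click-through rate; 0.0 until the ad has been shown at least once.
        return self.clicks / self.impressions if self.impressions else 0.0


def allocate_spend(stats: Dict[str, AdStats], budget: float,
                   explore_share: float = 0.1) -> Dict[str, float]:
    """Split a budget across ad variants.

    A fixed share of the budget (explore_share) is spread evenly so every
    variant keeps being tested; the remainder goes to the variant with the
    best observed click-through rate.
    """
    if not stats:
        return {}
    even_slice = budget * explore_share / len(stats)
    allocation = {name: even_slice for name in stats}
    best = max(stats, key=lambda name: stats[name].ctr)
    allocation[best] += budget * (1.0 - explore_share)
    return allocation


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    stats = {
        "copy_a": AdStats(impressions=1000, clicks=20),
        "copy_b": AdStats(impressions=1000, clicks=35),
        "copy_c": AdStats(impressions=1000, clicks=10),
    }
    # Most of the budget flows to copy_b, which has the highest CTR.
    print(allocate_spend(stats, budget=100.0))
```

Rerunning `allocate_spend` each time new impression and click counts arrive gives the continuous reweighting the example describes. A production system would more likely use a statistically grounded method, such as a multi-armed bandit algorithm, than this fixed split.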

In a true machine-analytics-driven environment, human intervention remains minimal. In the advertising example, the machine both identifies the most effective ads and spend levels and then acts to deploy the most effective strategy. Where traditional data analytics reports data back to a human for analysis, machine analytics gathers the data, analyzes it, and acts accordingly. Until recently, leading data analytics tools have emphasized clean, effective user interfaces. Machine analytics removes the need for a polished UI to guide a human data scientist. Ferguson continued:

"The ideal output of machine analytics is not better dashboards, it's more money in your bank account, increased crop production, and airplanes taking themselves into a hangar for maintenance after safely completing a flight because the software detected the need for a replacement part."

Although the concept of machine analytics has been proposed, and companies like PipelineDB have taken the first steps toward it with Stride, fully automated machine analytics has not yet been delivered. For now, user-friendly dashboards remain key to effective data analytics. Whether it's the recommendation engines used by Pinterest or Uber's dynamic pricing, realtime reporting back to humans is central to quality analytics today. Accordingly, for the immediate future, human data scientists will continue to play a hands-on role in data analysis and the decisions that follow. Ferguson expanded:

"Until we build software systems that are sophisticated enough to autonomously run the world and further improve themselves, human beings will have a major role in designing, building, and managing these systems. For the foreseeable future, data scientists, analysts, and other analytics-focused roles aren't going anywhere, but their role will be to leverage machine intelligence to essentially replace their own jobs."

Once machines achieve the level of self-sufficiency Ferguson describes, the API will serve as the ultimate marketer and democratizer of the technology. Machine analytics as Ferguson describes it will require a massive amount of compute power, and very few companies will have the working capital to stand up their own systems. APIs, however, can offer subscription-based access to systems that others have built. TensorFlow from Google is one example of this potential. Accordingly, once the technology is built, the data scientist may be an API away from irrelevance.

When that day comes, Ferguson believes the role of humans in data science will be elevated to executive-level decision makers who have very little day-to-day interaction with the data. Instead, machine analytics will take broad strategic guidance from executives, the same way sales, business development, and marketing strategies are guided by executive teams today.

Keep an eye out for the continued progression of machine analytics. PipelineDB expects to release Stride in developer preview in a matter of weeks.

Eric Carter is the founder of Dartsand and Corporate Counsel for a specialty technology distributor. He is a frequent contributor to technology media outlets and also serves as primary legal counsel for multiple startups in the real estate, virtual assistant, and software development industries.