PW Interview: Karthik Ram, rOpenSci, Wrapping all science APIs

Here is an interview with Karthik Ram, co-creator of the rOpenSci project, which helps make REST APIs consumable from the R language. Developers should take note: R is one of the most widely used statistical languages in the world, and it has many GUIs that make advanced data mining easily accessible.

Ajay- What was the motivation in creating rOpenSci?

Karthik- I've long been frustrated by not having easy access to data that was supposed to be freely available, and by not being able to reproduce findings from a study. This frustration got me started on the open science track. Around the same time I found Carl and Scott talking about several of the same issues on Twitter. We then decided to pool our resources and start rOpenSci. Our initial goal was to start building bridges to existing data sources so we could inspire researchers to be more open about their work, and also nudge data providers to free up more data and accelerate scientific discovery.

Ajay- What have been some of the results and feedback from the R community so far?

Karthik- Since formally starting rOpenSci two summers ago, we've had an incredibly positive response from various communities. We started out by developing two packages and now have more than 30, with several new ones in various stages of development. Several of these packages are collaborative efforts involving more than 16 collaborators, most of whom discovered us through Twitter, Google+, and R-bloggers. We currently have more requests than we can handle, primarily because the core team works on this effort in our spare time.

Ajay- Why are REST APIs important for statistical and scientific research, in your opinion?

Karthik- Although industry has readily embraced the data revolution, the uptake in most sciences has been fairly slow. I believe that opening up programmatic access to data is really important for advancing science. RESTful APIs make it exceedingly easy to develop tools for programmatic access to data. This allows researchers to make their findings reproducible by sharing just a few lines of code. Not only does this increase transparency and allow others to verify findings, but it also makes it easy for anyone to build upon existing work. Many new insights in science will be data-driven, and will likely emerge from leveraging multiple data sources. RESTful APIs are one way to speed up that discovery process.

Ajay- Describe your career journey in scientific research from your schooling days.

Karthik- I was drawn to the idea of becoming a naturalist from a fairly young age, primarily from watching an inordinate number of nature documentaries. As I got older, I volunteered at several field projects, got a degree in zoology, and ended up in graduate school. My research focuses on species interactions, primarily trophic cascades, where indirect interactions can have strong effects on the entire community. I use theory and experiments to understand the dynamics of these (and related) interactions. My real foray into data science began in my first postdoc, when I started working on a large dataset to understand the effects of climate change on the large mammal food web in Yellowstone. I quickly realized that my local R instance wouldn't cut it, and I had to scale up to Amazon's EC2 and eventually a national lab supercomputer. I'm now a population ecologist at Berkeley, working on various projects, some of which are experimental while others are interdisciplinary and data-driven.

Ajay- Describe an interesting result that came about by using rOpenSci packages.

Karthik- The biggest reward thus far has been seeing the research community get really excited about accessing data and making their own research more open and reproducible.

My own favorite use cases for rOpenSci packages are data discovery and text mining. If I'm interested in learning about a new topic, especially one that is cutting edge, I can quickly look up altmetric data on all available documents using two separate packages. These data then allow me to rank the hottest articles, and use related resources to quickly get up to speed on the topic.

I also love the full-text API from the Public Library of Science. Within seconds I can look up trends on topics and research methods with just a few commands. This is much faster than using a browser, and I can quickly share my work by posting a gist on GitHub.
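As a rough illustration of the kind of trend lookup described above, here is a minimal sketch in Python using only the standard library. It is not the rOpenSci code. The endpoint URL, the Solr-style parameters (`q`, `fq`, `rows`, `wt`), the `everything` and `publication_date` field names, and the `response.numFound` key are assumptions about the PLOS search API and should be checked against its documentation. The helper names `build_query_url`, `count_from_response`, and `yearly_counts` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for the PLOS search API; verify before relying on it.
PLOS_SEARCH_URL = "https://api.plos.org/search"


def build_query_url(term, year, base_url=PLOS_SEARCH_URL):
    """Build a search URL that counts articles mentioning `term` in one year."""
    params = {
        # Assumed Solr-style field names: 'everything' and 'publication_date'.
        "q": f'everything:"{term}"',
        "fq": (
            f"publication_date:[{year}-01-01T00:00:00Z "
            f"TO {year}-12-31T23:59:59Z]"
        ),
        "rows": 0,      # only the hit count is needed, not the documents
        "wt": "json",   # ask for a JSON response
    }
    return f"{base_url}?{urlencode(params)}"


def count_from_response(payload):
    """Extract the hit count from a Solr-style JSON response body."""
    return json.loads(payload)["response"]["numFound"]


def yearly_counts(term, years, fetch=None):
    """Return {year: hit count} for `term`, issuing one request per year.

    `fetch` maps a URL to a response body (str); it defaults to a real
    HTTP GET and can be swapped for a stub when working offline.
    """
    if fetch is None:
        def fetch(url):
            with urlopen(url, timeout=30) as resp:
                return resp.read().decode("utf-8")
    return {
        year: count_from_response(fetch(build_query_url(term, year)))
        for year in years
    }
```

Calling `yearly_counts("trophic cascade", range(2008, 2013))` would issue five requests over the network and return a dictionary of counts per year. Because the whole lookup is a short, self-contained script, it is the sort of thing that can be shared as a gist.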


R is a language and environment for statistical computing and graphics. rOpenSci is a collaborative effort to develop R-based tools for facilitating Open Science by providing programmatic access to a variety of scientific data, the full text of journal articles, and repositories that provide real-time metrics of scholarly impact.

Open Science through statistical computing? Just another rOpenSci API call away!
