Sunday, March 2, 2014

Roadblocks to widespread use of computational science

I'm writing this post as I'm in the middle of a problem set for my Machine Learning class.  I have a few seconds to write a blog post because I have to reconfigure applications, compilers, and various source code to do this problem set.  This is frustrating to me.

I don't claim to be a computer scientist.  I'm really a biologist; I was trained in that field, and I use the tools of Computer Science to investigate biological questions (e.g., about community structure, extinction risk, dynamical systems in ecology, etc.).

However, I like Computer Science. It lets me do great things, like simulate evolution many thousands of times, explore the outcome of stochastic processes, and make good predictions about what our null expectations in ecology should be.  I use Python, Matlab, and R pretty regularly, and I can make my way around a Unix command line (slowly).

So here is what frustrates me: computer scientists want to build tools that will help scientists in other fields, but they seem to have little understanding of how much background set-up work it takes (for non-computer scientists especially!) to use these "black box" functions.

Right now, I'm trying to set up libsvm, a package that works in Matlab, Octave, and Python to produce Support Vector Machine models.  The package purports to be easy to use; one of the developers' goals is for libsvm to be accessible to scientists with any kind of data set.

Here's the catch: to get this package to work, I've spent several hours reading StackOverflow posts and blog posts, trying to run the install program for libsvm from within Matlab, getting error messages, Googling, trying to use the command line ('brew install gnuplot' etc.), crying, digging into the source code of another SVM program (the built-in Matlab svmtrain), calling a friend, and finally figuring out a solution.

So, here's my solution: a classmate pointed me towards a blog post that will allow me to download libsvm onto my Mac.  All this requires is updating my Xcode compiler (and, of course, updating the command line tools within Xcode) and downgrading to Matlab 2012a.  Then, if I follow the directions exactly, I should be able to get this "easy" package to work.



It seems doubtful to me that many non-computer scientists are going to have the patience to use support vector machines if this is the best we can do.  I'm not making a normative argument about whether or not black-box algorithms or data analyses are useful or good for science.  But if computer scientists want to keep claiming that they are creating software that will help other fields advance, software that is easily accessible and can be used "out of the box," something needs to change.

Maybe it's time for computer scientists to team up with marketing or communications departments.  I've noticed that everyone from Comcast to Ikea has very good user interfaces these days.  If we truly are working towards a revolution in computational sciences, we need to take usability more seriously.