Bit of a nerdy one today, and not one relating to anything in particular. It’s just that over the past few months I’ve become increasingly obsessed with the Python programming language. It’s just beautiful – I’m not going to go into much detail, as the details can be found on python.org, but in short, Python can be described as executable pseudocode. For instance, the classic hello world program consists of:
print('hello world')
And then there’s the Zen of Python. This tells you everything you need to know: Python is a language designed for ease of use, and to appear sane to anyone who has to read it.
So why is this so important for science?
Well, for one thing, it’s just a great tool for everything. And I mean everything – with the right modules and frameworks it can be used just as well to process data, write optimization code, build control systems and GUIs, perform algebra, do stats, access databases online and offline, and even create web pages. It is a general-purpose language, unlike things like Matlab or Mathematica. While these and others do offer benefits from being more specialized, the advantage of only having to learn one language often outweighs them. Even better, because all the modules are Python, you can tie together different parts of your scientific code: for example, use one module to get data from a database, another to process it, another to create nice graphs, and even another to publish the results.
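To make the "tie the parts together" idea concrete, here is a minimal sketch using only the Python 3 standard library. The table name, the sample values and the two helper functions are made up for illustration; a real lab would swap in its own database and most likely a plotting module for the final step.

```python
import sqlite3
import statistics

# Hypothetical data: an in-memory database standing in for a real lab database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sample TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 14.0)],
)


def mean_by_sample(connection):
    """Step 1 and 2: fetch the readings, then compute the mean per sample."""
    groups = {}
    for sample, value in connection.execute("SELECT sample, value FROM readings"):
        groups.setdefault(sample, []).append(value)
    return {sample: statistics.mean(values) for sample, values in groups.items()}


def text_report(means):
    """Step 3: turn the results into something publishable (plain text here)."""
    return "\n".join(
        "{}: {:.2f}".format(sample, mean) for sample, mean in sorted(means.items())
    )


means = mean_by_sample(conn)
report = text_report(means)
print(report)
```

Each step is an ordinary function working on ordinary Python objects, so any one of them can be replaced by a more capable module without touching the others.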
There are plenty of scientific modules out there and they are improving all the time, with NumPy and SciPy being the most important basis for dealing with lots of numbers. In fact, the only problem is that there can be quite a lot of modules which do similar things, some of which are more up to date and useful than others, meaning a choice needs to be made. In that regard, scientific Python distributions like the Enthought Python Distribution (EPD) can help a lot.
But what I really want to talk about is more than how nice the language is. It is nice, and it does have tools available for more or less every scientific need. But more importantly, it is free – both as in beer and as in freedom. Since it has no cost and is open source, it is accessible to everyone. Anyone can install Python and run someone else’s code, and they don’t have to pay for a very expensive proprietary software license to do so. This is crucial for being able to reproduce scientific results. In business it doesn’t really matter if people use a costly tool, since everything stays internal. In research, results need to be reproducible. More and more research depends on work done using computer code, one way or another. If people don’t have the costly software needed to run that code, then they are prevented from running that experiment in the exact way it was originally done.
By using Python code, anyone can get the same data analysis running. Thanks to how simple Python is to use and how easy it is to read, the barrier for someone else trying to understand the code is much lower than for many other languages.
This is also an equality issue – while wealthy institutions may bite the bullet and buy licenses for their academics, many simply cannot afford to, especially in developing countries. It is not fair to discriminate against scientists around the world simply because they can’t buy their way into the computer club.
Students also benefit greatly from learning to use a free tool rather than a proprietary one. When they leave, they will not be locked into using an expensive piece of software. Their skills will always be useful to them, because even if their workplace doesn’t provide a programming environment, they can just use a free Python-based one.
Finally, Python could bring a measure of standardization to scientific coding practice. I have witnessed some very dodgy ad-hoc setups in labs, the sort of thing that would make a business project manager cry: bits of code in one language using other bits of code in a different language via various hacks, all because everyone knew a different language and no one could be bothered to coordinate anything. Plus, the languages used were all too difficult or inflexible to allow newcomers to usefully improve the code.
So I urge everyone who can to use Python for their scientific work. I say this also hoping it will encourage improvement in the Python ecosystem. There do remain a couple of warts: in particular, packaging and distributing Python programs can be a bit of a pain, especially if you require specific modules for your program to work. I would really like to see the emergence of standard practice for Python in science: conventions on organization, naming and publishing of code. I would love to see something like a GitHub for Science, where teams could publish and manage their code and refer to it in published papers (hmm, I like that idea so much it might get its own post!).
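On the packaging wart: one small thing that can help when you share a script is to have it check for its own dependencies up front, so a colleague gets a clear message rather than a traceback halfway through. The following is only a sketch, assuming Python 3; the module list is hypothetical, and the last entry is deliberately a name that does not exist.

```python
import importlib.util

# Hypothetical list of top-level modules this analysis script depends on.
REQUIRED_MODULES = ["json", "csv", "a_module_that_does_not_exist_12345"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Please install these modules first: " + ", ".join(missing))
```

This does nothing to install the modules, of course; it just makes the requirement visible to whoever runs your code next.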
While waiting for all this magic to happen, do yourself a favor and program in Python!
Nice tip: if you’re on Linux or OS X, open a terminal and type:
python
(This will open a Python prompt.) Then type:
import this
To quit Python, type quit()
ptr
July 6, 2012
What’s your opinion of Sage? I’ve started to play around with Sage as an intro to Python, analyzing some spectra and doing basic curve-fitting. There doesn’t seem to be any scientific community based around it. One advantage of Sage over pure Python is that you can simply copy a worksheet from one machine and use it on another without having to worry about dozens of dependencies. Or you could share the worksheet online.
I’m thinking Sage might be very useful for managing data like spectra: numerous, but relatively simple. Imagine a lab having its own server so they can easily share NMR or fluorescence spectra, or maybe even genomics data (which could then interface with BLAST!)
So I think anyone starting out in science (like me) should use Sage, since you’ll also be learning Python. Hopefully I can make useful contributions to Sage in the near future.
mangecoeur
July 6, 2012
I looked at Sage but didn’t go with it simply because I wanted to learn “pure” Python. I also didn’t like that it wasn’t 100% Python syntax; however, I think Python now uses the same syntax in Python 3, or when using from __future__ import… I’m also not sure if you can use regular Python modules that don’t come with Sage, and in general you can be a bit limited to the Sage way of doing things. I preferred to get used to regular Python from the start. I generally thought of Sage as a teaching tool, but I guess it could work very well in a lab. You could imagine some really neat combination of Sage on a server together with a simple version control system, where everyone’s work is synchronized and tracked and everyone can access shared data.
It’s true that it’s a very good way to get a neat package that other people can use easily, especially since, as I mentioned, distributing Python is still a pain. Regular Python expects people packaging applications to be developers who want to share their modules and final software, rather than people who want to easily share a bundle of useful tools and scripts, and so makes you learn things like setuptools and requirements files. Sage skips over that – I haven’t had to collaborate much with other people using Python so I haven’t felt the pain, but I imagine that Sage would be friendly in this regard.
I settled for the Enthought Python Distribution, which is excellent – and I like how much work Enthought puts into maintaining and improving the ecosystem. Their full package is also free for academics. For others there’s a free edition with fewer packages, and since it’s regular Python you can still install all the other packages you want. Of course, this isn’t always easy depending on your platform: I wasted a lot of time compiling C-extension modules which demanded the right combination of dependencies, compiler options, and phase of the moon to work!
Ryan Gubbins
August 14, 2012
Thank you very much for the article. Although I am not a scientist myself, I do recognize that more can be achieved in science by reducing the ‘Recreate the Wheel’ paradigm. The sooner scientists can come together as a community with a set of accessible and effective analysis tools, rather than the patchwork solutions you mentioned in your article, the sooner energies that are wasted elsewhere can be applied to the problem at hand.
Once again, great work, great article.