GitHub for Science: A Concept

Posted on July 4, 2012


So coding and science have a rocky relationship. Scientists write programs to do all sorts of things: control experiments, process data, visualize it, store it in databases, solve maths problems, and more. But scientists are not software architects. Nor are they often terribly good project managers. What this means is that code that is crucial to running experiments properly ends up as a horrible mess of files on different people’s computers. Different people end up with different versions, no one knows who has the one that works, and if someone with crucial knowledge about how to make things go leaves, everything descends into chaos.

Now a step towards fixing this is a version control system: a shared repository keeps track of all the files in the project, each user works on their own copy, and everyone can track changes across versions, make forks of the project, and so on. GitHub currently does this really, really nicely for open source code. It provides an awesome interface that takes away a lot of the pain of dealing with Git, a version control system which, for all its success, has a pretty steep learning curve.

What would be perfect is a site similar to GitHub but specially designed for science. Some ideas for features:

  • Make projects easily discoverable and help academics keep track of everything.
  • Be based on Git, Mercurial, or Bazaar; it doesn’t matter too much, so long as tools are available that make using the site painless.
  • Have most of the features of GitHub, including being able to create teams and project web pages.
  • Add some project management features and a greater ability to create notes about work progress and experiment settings.
  • Leave out the more social and software-engineeringy features to simplify things for researchers.
  • Have an easy method for referring to code in papers. Make it possible to refer to a specific frozen snapshot of a given set of files, rather than to the project as a whole.
  • Promote the Python programming language as a standard, encouraging people to write interoperable code.
  • Promote standardized project layouts and practices – e.g. having a “charts” script that spits out all the charts for a project, which could be automatically generated and displayed on the project web page
  • Allow projects to be partly public – often research is ultimately public domain but there are good reasons for not wanting to have all of your code out in the open all the time.
  • Integrate with scientific journal sites.
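
To make the "frozen snapshot" idea a bit more concrete, here is a minimal sketch of how a site might derive a citeable identifier from the exact contents of a chosen set of files. The function name `snapshot_id` and the choice of SHA-256 are my own assumptions for illustration, not part of any existing tool. Git's commit hashes already play a similar role for whole-repository snapshots.

```python
import hashlib
from pathlib import Path


def snapshot_id(root, relative_paths):
    """Return a hex digest identifying the exact contents of the given files.

    Hypothetical helper: the name and the hashing scheme are illustrative only.
    """
    digest = hashlib.sha256()
    # Sort the paths so the ID does not depend on the order they were listed in.
    for rel in sorted(relative_paths):
        data = (Path(root) / rel).read_bytes()
        # Hash the file name and length along with the bytes, so that renaming
        # a file or shifting bytes between files changes the ID.
        digest.update(rel.encode("utf-8") + b"\0")
        digest.update(str(len(data)).encode("ascii") + b"\0")
        digest.update(data)
    return digest.hexdigest()
```

A paper could then cite the project together with this identifier (or a shortened prefix of it), and a reader could recompute it to check that they have exactly the files the authors used.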