Python vs Ruby – Why Python is Used for High-Performance Computing

There's a quote from a PyCon 2011 talk that goes:

At least in our shop (Argonne National Laboratory) we have three
accepted languages for scientific computing. In this order they are
C/C++, Fortran in all its dialects, and Python. You’ll notice the
absolute and total lack of Ruby, Perl, Java.

It was in the more general context of high-performance computing. Granted, the quote is from only one shop, but another question about languages for HPC also lists Python as one to learn (and not Ruby).

Now, I can understand C/C++ and Fortran being used in that problem space (and Perl/Java not being used). But I'm surprised that there would be a major difference between Python and Ruby use for HPC, given that they are fairly similar. (Note – I'm a fan of Python, but have nothing against Ruby.)

Is there some specific reason why the one language took off? Is it about the libraries available? Some specific language features? The community? Or maybe just historical contingency, and it could have gone the other way?

Best Answer

I'll expand on my comment.

I think there are a few factors that influenced the use of Python in scientific computing, though I don't think there are any definitive historical points where you could say, "Yes, that is the reason why Python is used over Ruby/anything else."

Early History

Python and Ruby are of roughly the same age - according to Wikipedia, Python was officially first released in 1991, and Ruby in 1995.

However, Python came to prominence earlier than Ruby did, as Google was already using Python and looking for Python developers at the turn of the millennium. Since there is no curated history of how programming languages were used and how they influenced the people using them, I will theorize that this early adoption of Python by Google was a big motivator for people looking to expand beyond just using Matlab, C++, Fortran, Stata, Mathematica, etc.

Namely, I mean that Google was using Python in a system where they had thousands of machines (think parallelization and scale) and were constantly processing many millions of data points (again, scale).

Event Confluence

Scientific computing used to be done on specialty machines like SGIs and Crays (remember them?), and of course FORTRAN was (and still is) widely used due to its relative simplicity and because it could be optimized more easily.

In the last decade or so, commodity hardware (meaning stuff you or I can afford without being millionaires) has taken over in the scientific and massive computing realm. Look at the current TOP500 rankings - many of the top-ranked supercomputers in the world are built with normal Intel/AMD hardware.

Python came in at a good time since, again, Google was promoting Python, and Google was using commodity hardware, and they had thousands of machines.

Plus, if you dig into old scientific computing articles, ones about Python started to spring up around 2000.

Earlier Support

Here's an article from 2000, written for the Astronomical Data Analysis Software and Systems (ADASS) conference, suggesting Python as a language for scientific computing.

The article has this quote about Python:

Python is an interpreted object-oriented programming language that is starting to receive considerable attention in scientific applications (Python, 1999). This is because Python, and scripting languages in general, represent a next logical step for many scientific projects (Dubois 1994). First, Python provides an interpreted programming language that can be viewed as an extension of the simple command languages already used by scientific programs

Second, Python is easily integrated with software written in other languages. As a result, it can serve as both a control language for driving existing programs as well as a glue language for combining different systems together. Finally, Python provides a large collection of third party modules, an established user base, and a variety of documentation in the form of books and online references. For this reason, one might view it as a highly polished and expanded version of what scientists often try to accomplish when writing their own command interpreters.

So you can see that Python already had traction dating back to the late 90s, because it was functionally similar to the existing systems at the time and because it was easy to integrate with things like C and the existing programs. Based on the contents of the article, Python was already in scientific use in the 1995-1996 timeframe.
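To make the "glue language" point concrete, here is a minimal sketch of calling a C function from Python using only the standard library's ctypes module. It is just an illustration, not how the projects in the article did it (ctypes did not exist yet in the 90s; extensions back then were typically hand-written or generated wrappers). The names load_c_math_library and c_cos are made up for this example, and the library lookup assumes a POSIX-like system (Linux or macOS).

```python
import ctypes
import ctypes.util


def load_c_math_library():
    """Load the C library that provides cos(); the lookup is platform-dependent."""
    # find_library returns None when nothing is found; on POSIX systems
    # CDLL(None) then falls back to symbols already loaded in the process.
    name = ctypes.util.find_library("m") or ctypes.util.find_library("c")
    return ctypes.CDLL(name)


libm = load_c_math_library()

# Declare the C signature `double cos(double)`. Without restype, ctypes
# assumes an int return value; without argtypes, it cannot convert a
# Python float argument.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Hypothetical wrapper: call the C library's cos() from Python."""
    return libm.cos(x)


if __name__ == "__main__":
    print(c_cos(0.0))
```

The same pattern (load a compiled library, declare the signature, call it like a Python function) is what lets Python drive existing C or Fortran code instead of replacing it.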

Difference in Popularity Growth

Ruby's popularity exploded alongside the rise of Ruby on Rails, which first came out in 2004. I was in college when I first really heard the buzz about Ruby, and that was around 2005-2006. Django for Python was released around the same time (July 2005 according to Wikipedia), but the focus of the Ruby community seemed very heavily centered on promoting its usage in web applications.

Python, on the other hand, already had libraries that fit scientific computing:

  • NumPy - NumPy officially started in 2005, but the two libraries it was built on were released earlier: Numeric (1995), and Numarray (2001?)

  • Biopython - biological computing library for Python, dates back to 2001, at least

  • SAGE - Math package with first public release in early 2005

There are many more, though I don't know most of their timelines (aside from just browsing their download sites). Python also has SciPy (built on NumPy, released in 2006), had bindings with R (the statistics language) in the early 2000s, got Matplotlib, and also got a really powerful shell environment in IPython.

IPython was first released in the early 2000s, and has had many features added to it that make it very nice for scientific computing, like integrated Matplotlib graphing and being able to manage computational clusters.

From the article above:

It is also worth noting a number other Python related scientific computing projects. The numeric Python extension adds fast array and matrix manipulation to Python (Dubois 1996), MMTK is Python-based toolkit for molecular modeling (Hinsen 1999), the Biopython project is developing Python-based tools for life-science research (Biopython 1999), and the Visualization Toolkit (VTK) is an advanced visualization package with Python bindings (VTK, 1999). In addition, ongoing projects in the Python community are developing extensions for image processing and plotting. Finally, work presented in (Greenfield, 2000) describes the use of Python in projects at the STScI.

Good list of scientific and numeric packages for Python.


So a lot of it is probably due to the early history, and the relative obscurity of Ruby until the 2000s, whereas Python had gained traction thanks to Google's evangelism.

So if you were evaluating scripting languages in the period from 1995 - 2000, what were you really looking at? There was Perl, which was probably different enough syntactically that people didn't want to use it, and then there was Python, which had a clearer syntax and better readability.

And yes, there is probably a lot of self-reinforcement. Python already has all these great, useful libraries for scientific computing, and its tools have matured over the last decade. Ruby has only a minority voice advocating its use in science, though some libraries, like SciRuby, are sprouting up.

Ruby's community at large seems to be much more heavily interested in furthering Ruby as a web language, as that's what really made it well known, whereas Python started off on a different path, and later on became widely used as a web language.
