Fortran Legacy Code – How to Modernize a Large Number-Crunching Codebase

fortran, legacy, math

A friend in academia asked me for advice (I'm a C# business application developer).

He has a legacy Fortran codebase, which he wrote himself, in the medical imaging field. It does a huge amount of number crunching using vectors. He used to run it on a cluster (30-ish cores) and has now moved to a single workstation with 500-ish GPU cores in it.

The question is where to take the codebase next so that:

  • Other people can maintain it over the next 10-year cycle
  • He can get faster at tweaking the software
  • It can run on different infrastructures without recompiling

After some research on my part (this is a super interesting area), some options are:

  • Use Python and CUDA from Nvidia
  • Rewrite in a functional language, for example F# or Haskell
  • Go cloud-based and use something like Hadoop and Java
  • Learn C

What has been your experience with this? What should my friend be looking at to modernize his codebase?

UPDATE: Thanks @Mark and everyone who has answered. The reason my friend is asking this question is that it's a perfect time in the project's lifecycle to do a review. Bringing research assistants up to speed in Fortran takes time. (I like C#, especially its tooling, and can't imagine going back to older languages!)

I liked the suggestion of keeping the pure number crunching in Fortran but wrapping it in something newer. Perhaps Python, as it seems to be gaining a stronghold in academia as a general-purpose programming language that is fairly easy to pick up.
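As a rough sketch of the "Fortran core, Python wrapper" idea, here is the pattern in its most familiar form. NumPy's `numpy.linalg.solve` hands the actual work to LAPACK's `*gesv` routines (LAPACK's reference implementation is written in Fortran), and Python only prepares the inputs and inspects the result. This assumes NumPy is installed, and `solve_system` is a hypothetical name. For his own routines, NumPy's bundled F2PY tool (`python -m numpy.f2py`) can generate comparable wrappers from Fortran source, given a Fortran compiler.

```python
# Assumes NumPy is installed. numpy.linalg.solve passes the work to
# LAPACK's *gesv routines (compiled code); Python only does the glue.
import numpy as np


def solve_system(a, b):
    """Hypothetical wrapper: solve a @ x = b via NumPy/LAPACK."""
    # Column-major ("F") order is the layout Fortran routines use; NumPy
    # accepts either layout, so this only mirrors that convention.
    a = np.asarray(a, dtype=np.float64, order="F")
    b = np.asarray(b, dtype=np.float64)
    return np.linalg.solve(a, b)


if __name__ == "__main__":
    # 3x + y = 9 and x + 2y = 8 have the solution x = 2, y = 3.
    x = solve_system([[3.0, 1.0], [1.0, 2.0]], [9.0, 8.0])
    print(x)
```

The same division of labor would apply to his own code: the numerically heavy routines stay in compiled Fortran, while the parts that change most often live in Python.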

See Medical Imaging, and a question from someone who has written a Fortran wrapper for CUDA: "Can I legally publish my Fortran 90 wrappers to Nvidia's CUFFT library (from the CUDA SDK)?"

Best Answer

The requirements you describe actually put Fortran at the top of the list for problems like this:

a) number crunching
b) parallelizable
c) it was, and still is, the de facto programming language taught outside of computer science (to engineers who aren't professional programmers).
d) it has incredible(!) industry backing in terms of the number of industry-grade compilers, with none of the vendors showing the least sign of abandoning it. Not long ago, one of Intel's representatives revealed that sales of their Fortran products are higher than those of any other product in their development tools line.

It is also an incredibly easy language to pick up. I don't agree that it takes long to bring research assistants up to speed. My first textbook on it had no more than, oh I don't know, 30 (?) pages of sparsely printed text. It is a language in which, after learning 10 keywords, one can write medium-sized programs. I would dare say that those 30 pages, set in default Word formatting, would make a more than comprehensive "Fortran manual" for most users.

If you're interested in CUDA, you might want to check out the Portland Group (PGI) compiler, which supports it through CUDA Fortran. I'm not familiar with the finer details, but people generally speak highly of it.

Apart from that, for parallelizing programs you have OpenMP, MPI and now the upcoming (and long-awaited) coarrays, which Intel's compiler has recently implemented. In short, Fortran has a very fine range of "libraries" for parallel programming.
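To make the "parallelizable" point concrete, the sketch below mimics, sequentially, what an MPI or coarray program typically does with a loop over a vector: each rank (MPI) or image (coarrays) owns one contiguous block of indices and computes a partial result, and a reduction such as `MPI_Reduce` or the Fortran 2018 `co_sum` collective combines the pieces. It is plain Python with no real parallelism, and `block_bounds` and `partitioned_dot` are hypothetical names chosen for illustration.

```python
# Sketch only: simulates block decomposition plus a reduction, sequentially.
# Ranks are 0-based here, as in MPI; coarray images are numbered from 1.


def block_bounds(rank, nranks, n):
    """Half-open range [lo, hi) of indices owned by `rank` when n items
    are split as evenly as possible over `nranks` workers."""
    base, extra = divmod(n, nranks)
    # The first `extra` ranks each take one additional item.
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def partitioned_dot(x, y, nranks):
    """Dot product as per-rank partial sums followed by a final reduction
    (the step MPI_Reduce or co_sum would perform in a real parallel code)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    partials = []
    for rank in range(nranks):  # in a real code, every rank runs concurrently
        lo, hi = block_bounds(rank, nranks, len(x))
        partials.append(sum(a * b for a, b in zip(x[lo:hi], y[lo:hi])))
    return sum(partials)  # the reduction


if __name__ == "__main__":
    for r in range(3):
        print(r, block_bounds(r, 3, 10))
    print(partitioned_dot(list(range(10)), [1] * 10, 3))
```

With 10 elements over 3 ranks, the blocks are [0, 4), [4, 7) and [7, 10): the leftover element goes to the first rank, which keeps the load as even as possible.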

Industry-standard numerical libraries are developed for it first, with other languages following, more or less, in the range of functions and routines they offer.

All that being said, depending on when it was originally written, I would recommend that if it is, say, F77 code or older, you gradually rewrite parts of it in newer dialects: F90 at least, and if possible with F2003 features. A paper/thesis on that topic was recently published. Done properly, this not only helps ensure portability across multiple platforms but also makes future maintenance easier.
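One low-tech way to plan such a gradual rewrite is to first measure where the older idioms are concentrated. The sketch below is a hypothetical helper, not an existing tool: it uses regular-expression heuristics (not a Fortran parser) to count `COMMON` statements, which F90 modules can replace, and `GO TO` jumps, which can often become structured `DO`/`EXIT`/`CYCLE` loops or block `IF` constructs. It skips full-line comments, assuming `C`, `c`, `*` or `!` in column 1 marks a comment in fixed-form source and a leading `!` marks one in free-form source.

```python
import re

# Hypothetical helper: rough counts of F77-era idioms that are common
# refactoring targets when moving to F90/F2003. Regex heuristics only.
COMMON_RE = re.compile(r"^\s*(\d+\s+)?COMMON\b", re.IGNORECASE)  # optional label
GOTO_RE = re.compile(r"\bGO\s*TO\b", re.IGNORECASE)              # GOTO or GO TO


def legacy_markers(source, fixed_form=True):
    """Count COMMON statements and GO TO jumps, skipping full-line comments."""
    counts = {"common": 0, "goto": 0}
    for line in source.splitlines():
        if not line.strip():
            continue
        if fixed_form:
            # Fixed form: C, c, * or ! in column 1 marks a comment line.
            if line[0] in "cC*!":
                continue
        elif line.lstrip().startswith("!"):
            # Free form: a line whose first non-blank character is "!".
            continue
        if COMMON_RE.search(line):
            counts["common"] += 1
        counts["goto"] += len(GOTO_RE.findall(line))
    return counts


if __name__ == "__main__":
    sample = "\n".join([
        "      PROGRAM DEMO",
        "      COMMON /BLK/ A, B",
        "C     GOTO in a comment line is ignored",
        "      IF (A .GT. B) GO TO 10",
        "      A = B",
        "   10 CONTINUE",
        "      END",
    ])
    print(legacy_markers(sample))  # {'common': 1, 'goto': 1}
```

Running it over each file and sorting by the counts gives a rough order in which to modernize. Being a heuristic, it will also count a `GO TO` that appears inside a string literal or a trailing comment.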

P.S. As far as "future maintenance" goes, here is an anecdote I sometimes like to mention. While writing my thesis, I reused some code from my mentor that had been written 35 years earlier. It compiled with only one error: a statement missing at the end, due to a copy-paste mistake :)


@DaveMateer (reply to comment): I'm going to make a comment that may come across as a bit impolite, but please don't take it the wrong way; it is well intentioned.

It seems to me you're tackling this "problem" the wrong way. Here is what I mean, in a few short points (it is very late here, and my ability to put together readable, let alone comprehensible, sentences leaves me after 10 p.m.):

a) You mentioned you're trying to minimize extra coding time, yet you're considering a rewrite from a language specialized for numerical computing into one of a colorful selection of languages, if you'll pardon my expression:

  • some of which don't have support for multidimensional arrays, amongst other things
  • most of them are unsuitable for heavy numerical work (I admit I know nothing about the parallel processing capabilities of Haskell and Hadoop, but I have never even heard them mentioned in those circles)
  • it has possibly been tried, but I've never heard of a rewrite from Fortran, a language for discretized problems, to a functional language
  • there was a recent discussion on comp.lang.fortran (try searching through Google Groups) about scientific computing "in the cloud"
    (I don't want to demotivate you, but to be fair, no one was really sure what that term even means, let alone had an example of a successful application. Most people agreed that the potential exists, but for now they're happy with the way things work.) A lot of problems are not suitable for that kind of parallelization either.

b) What would such a rewrite cost in person-hours?

c) Getting the correct versions of the libraries to compile against is an unavoidable problem in any language, however you look at it.

d) I've heard of Python (a nice language, really) being used in parallel applications on a few occasions, but its penetration of that market doesn't seem to be rising, and its ever-changing nature makes it a very poor choice for a long-term project (think backward compatibility). Some people like it very much as a "glue" language, though.
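Point a)'s remark about multidimensional arrays has a concrete cost in a port. A Fortran array `A(nrows, ncols)` is stored in column-major order and, by default, indexed from 1; in a target language with only flat arrays, that mapping has to be reproduced by hand. The helpers below (hypothetical names) sketch that bookkeeping in Python.

```python
# Sketch: reproducing Fortran's array layout by hand in a flat list.
# Fortran stores A(nrows, ncols) column by column, with 1-based indices
# by default.


def fortran_offset(i, j, nrows):
    """0-based flat offset of element A(i, j), given 1-based i and j,
    in a column-major array with `nrows` rows."""
    return (j - 1) * nrows + (i - 1)


def to_column_major(rows):
    """Flatten a list of row lists into column-major order."""
    nrows, ncols = len(rows), len(rows[0])
    return [rows[i][j] for j in range(ncols) for i in range(nrows)]


if __name__ == "__main__":
    a = [[11, 12], [21, 22], [31, 32]]          # a 3x2 matrix, row by row
    flat = to_column_major(a)
    print(flat)                                 # [11, 21, 31, 12, 22, 32]
    print(flat[fortran_offset(3, 2, nrows=3)])  # A(3,2) -> 32
```

Getting this mapping wrong does not fail loudly; the code simply reads the wrong elements, which is one reason such rewrites are riskier than they look.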

Ugh, if I think of anything else, I'll add it tomorrow. Gotta get some sleep...