Regular Expressions – Why Is the Syntax Poorly Readable?

readability, regular expressions

Programmers all seem to agree that readable code is far more important than terse one-liners that work but take a senior developer to interpret with any degree of accuracy. Yet that seems to be exactly how regular expressions were designed. Was there a reason for this?

We all agree that selfDocumentingMethodName() is far better than e(). Why should that not apply to regular expressions as well?

It seems to me that rather than designing a syntax of one-line logic with no structural organization:

var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;

And this isn't even strict parsing of a URL!
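To make the point concrete, here is a minimal sketch of what it takes just to see what each capture group pulls out. The labels in `names` and the helper `describeUrl` are assumptions added for illustration; nothing in the pattern itself tells you which group is which.

```javascript
// The pattern under discussion; the host class is written as [0-9.\-A-Za-z].
var parse_url = /^(?:([A-Za-z]+):)?(\/{0,3})([0-9.\-A-Za-z]+)(?::(\d+))?(?:\/([^?#]*))?(?:\?([^#]*))?(?:#(.*))?$/;

// Assumed labels: index 0 is the whole match, 1 to 7 are the capture groups.
var names = ['url', 'scheme', 'slash', 'host', 'port', 'path', 'query', 'hash'];

// Hypothetical helper: run the pattern and attach a label to each group.
function describeUrl(url) {
    var result = parse_url.exec(url);
    if (result === null) {
        return null; // the pattern did not match at all
    }
    var parts = {};
    for (var i = 0; i < names.length; i += 1) {
        parts[names[i]] = result[i]; // undefined when an optional group is absent
    }
    return parts;
}

var parts = describeUrl('http://www.example.org:80/goodparts?q#fragment');
console.log(parts.scheme, parts.host, parts.port, parts.path);

// Not strict: a bare word is accepted as a "host" with no scheme.
console.log(describeUrl('x'));
```

Without the `names` array, a reader has to count parentheses to learn that the fourth group is the port.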

Instead, we could have a pipeline-style structure that is organized and readable. As a basic example:

string.regex
   .isRange('A-Z' || 'a-z')
   .followedBy('/r');
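For what it is worth, a pipeline like this can be sketched on top of the existing syntax. The following is only a rough illustration under stated assumptions: `PatternBuilder`, `inRange`, `followedBy`, and `toRegExp` are hypothetical names, not part of JavaScript or any library I know of. The ranges are passed as separate arguments because in real JavaScript `'A-Z' || 'a-z'` simply evaluates to `'A-Z'`.

```javascript
// Hypothetical builder: these method names are invented for illustration
// and are not part of JavaScript or of any library.
function PatternBuilder() {
    this.source = '';
}

// Escape characters that are special in regular expression syntax.
function escapeLiteral(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Match one character from any of the given ranges, e.g. 'A-Z', 'a-z'.
PatternBuilder.prototype.inRange = function () {
    var ranges = Array.prototype.slice.call(arguments);
    ranges.forEach(function (range) {
        if (!/^[A-Za-z0-9]-[A-Za-z0-9]$/.test(range)) {
            throw new Error('Unsupported range: ' + range);
        }
    });
    this.source += '[' + ranges.join('') + ']';
    return this;
};

// Match the given text literally.
PatternBuilder.prototype.followedBy = function (literal) {
    this.source += escapeLiteral(literal);
    return this;
};

// Compile the accumulated pieces into an ordinary RegExp.
PatternBuilder.prototype.toRegExp = function () {
    return new RegExp(this.source);
};

var pattern = new PatternBuilder()
    .inRange('A-Z', 'a-z')
    .followedBy('/r')
    .toRegExp();

console.log(pattern.test('x/r')); // true
console.log(pattern.test('7/r')); // false
```

The builder only generates the terse form behind the scenes, so it trades typing for readability rather than adding any matching power.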

What advantage does the extremely terse syntax of a regular expression offer other than the shortest possible operation and logic syntax? Ultimately, is there a specific technical reason for the poor readability of regular expression syntax design?

Best Answer

There is one big reason why regular expressions were designed to be as terse as they are: they were designed to be used as commands to a text editor, not as a language to code in. More precisely, ed was one of the first programs to use regular expressions, and from there regular expressions started their conquest for world domination. For instance, the ed command g/<regular expression>/p soon inspired a separate program called grep (the name is that command, g/re/p), which is still in use today. Because of their power, regular expressions were subsequently standardized and used in a variety of tools like sed and vim.

But enough of the trivia. So why would this origin favor a terse grammar? Because you don't type an editor command in order to read it ever again. It suffices that you can remember how to put it together and that it does what you want. However, every character you have to type slows down your progress editing your file. The regular expression syntax was designed for writing relatively complex searches in a throw-away fashion, and that is precisely what gives headaches to people who use regular expressions as code to parse a program's input.
