Mapping enum values into regexes

Tags: enum, language-agnostic, maintenance, regular expressions

I'm doing some code cleanup and I'm looking at my regexes. I have an extremely simple one:

(ARA|CHI|FRE|GER|ITA|JPN|RUS|SPA)\s[0-9]{3}-[0-9]{2}

It validates course identifiers for one department at my university (Modern Languages and Literatures): FRE 101-01 is valid, but CIS 101-01 is not, so I cannot simply use [A-Z]{3}.

I already have a class constant DEPARTMENTS:

DEPARTMENTS = {
    arabic: "ARA",
    chinese: "CHI",
    french: "FRE",
    german: "GER",
    italian: "ITA",
    japanese: "JPN",
    russian: "RUS",
    spanish: "SPA"
}

This is Ruby, but the question probably applies to any language: is it a good idea to map the enum into the regex instead of explicitly re-listing its values? Here is what I mean. In Ruby, I could build the regex like this:

(#{DEPARTMENTS.values.join('|')})\s[0-9]{3}-[0-9]{2}

It would probably look similar in other languages, if less concise in statically typed ones. I have several enums that feed into this kind of regex validation. The advantage I see in the latter approach is that I only need to update one spot in the code if we add or remove a department. The cost is a little readability. Granted, the regex and the enum live in the same class (in all cases), so it would be fairly hard to forget to update the regex too, and finding DEPARTMENTS to see what values it has would take all of 5 extra seconds…
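For reference, the interpolated version can be written as real Ruby roughly as follows. This is a sketch: the class name `Course`, the `DEPARTMENT_PATTERN` constant and the `valid_id?` helper are hypothetical, and the `\A`/`\z` anchors are an added assumption that the whole string must be a course identifier (without them the pattern would also match inside a longer string such as "XFRE 101-01"). `Regexp.escape` is purely defensive, since none of the current codes contain regex metacharacters.

```ruby
# Hypothetical class name; the question only says the enum and the regex
# live in the same class.
class Course
  DEPARTMENTS = {
    arabic: "ARA", chinese: "CHI", french: "FRE", german: "GER",
    italian: "ITA", japanese: "JPN", russian: "RUS", spanish: "SPA"
  }

  # "ARA|CHI|...|SPA", built from the enum so there is one place to edit.
  # Regexp.escape guards against a future code containing a metacharacter.
  DEPARTMENT_PATTERN = DEPARTMENTS.values.map { |code| Regexp.escape(code) }.join("|")

  # Assumption: \A and \z anchor the match to the entire string.
  COURSE_ID = /\A(?:#{DEPARTMENT_PATTERN})\s[0-9]{3}-[0-9]{2}\z/

  def self.valid_id?(id)
    COURSE_ID.match?(id)
  end
end

puts Course.valid_id?("FRE 101-01") # => true
puts Course.valid_id?("CIS 101-01") # => false
```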

Is:

(#{DEPARTMENTS.values.join('|')})\s[0-9]{3}-[0-9]{2}

more appropriate than:

(ARA|CHI|FRE|GER|ITA|JPN|RUS|SPA)\s[0-9]{3}-[0-9]{2}

?

Best Answer

Personally, I wouldn't even use a regex to validate the string :) The validation is simple enough to write by hand, and doing so makes your problem go away.
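A hand-written check might look something like this minimal sketch. It assumes the identifier always has the fixed layout "FRE 101-01" (three-letter code, a single plain space, three digits, a hyphen, two digits); note that the regex's `\s` would also accept a tab or other whitespace there. The class name `Course` and the helpers `valid_id?` and `digits?` are hypothetical.

```ruby
# Hypothetical class; only DEPARTMENTS comes from the question.
class Course
  DEPARTMENTS = {
    arabic: "ARA", chinese: "CHI", french: "FRE", german: "GER",
    italian: "ITA", japanese: "JPN", russian: "RUS", spanish: "SPA"
  }

  # "FRE 101-01": code (0..2), space (3), 3 digits (4..6), hyphen (7), 2 digits (8..9).
  ID_LENGTH = 10

  def self.valid_id?(id)
    id.length == ID_LENGTH &&
      DEPARTMENTS.value?(id[0, 3]) &&
      id[3] == " " &&
      digits?(id[4, 3]) &&
      id[7] == "-" &&
      digits?(id[8, 2])
  end

  # True if every character is an ASCII digit 0-9.
  def self.digits?(str)
    str.each_char.all? { |c| c.between?("0", "9") }
  end
  private_class_method :digits?
end

puts Course.valid_id?("FRE 101-01") # => true
puts Course.valid_id?("CIS 101-01") # => false
```

Because the department check is just `DEPARTMENTS.value?`, adding or removing a department needs no change here at all.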

But, assuming you want to keep the regex, I favour your first option:

(#{DEPARTMENTS.values.join('|')})\s[0-9]{3}-[0-9]{2}

It is DRY, and its only drawback is that it makes the regex slightly harder to reason about. But considering that being "hard to reason about" is kind of regex's thing, I wouldn't worry about it too much.
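In Ruby, one way to soften the readability cost is to let `Regexp.union` do the escaping and joining, and to print the expanded pattern whenever someone needs to read it. A sketch, with `Course` and `DEPARTMENT_CODE` as hypothetical names and the `\A`/`\z` anchors as an added assumption:

```ruby
class Course
  DEPARTMENTS = {
    arabic: "ARA", chinese: "CHI", french: "FRE", german: "GER",
    italian: "ITA", japanese: "JPN", russian: "RUS", spanish: "SPA"
  }

  # Regexp.union escapes each code and joins them with "|".
  DEPARTMENT_CODE = Regexp.union(DEPARTMENTS.values)

  # Interpolate the union's source text rather than the Regexp object itself,
  # so the pattern does not pick up the "(?-mix:...)" wrapper from Regexp#to_s.
  COURSE_ID = /\A(?:#{DEPARTMENT_CODE.source})\s[0-9]{3}-[0-9]{2}\z/
end

# The expanded pattern is one call away when someone needs to read it.
puts Course::COURSE_ID.source
# prints \A(?:ARA|CHI|FRE|GER|ITA|JPN|RUS|SPA)\s[0-9]{3}-[0-9]{2}\z
```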

If, however, you find yourself in the quirky position where you choose to keep your original approach:

(ARA|CHI|FRE|GER|ITA|JPN|RUS|SPA)\s[0-9]{3}-[0-9]{2}

Then you can always write a unit test that verifies the regex above includes all (and only) your departments.
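Such a test could be sketched with Minitest (bundled with Ruby), assuming department codes are always three uppercase letters. Rather than parsing the regex source, it tries every prefix from "AAA" to "ZZZ" (26³ = 17,576 candidates) with a fixed valid course number, and checks that the accepted prefixes are exactly the department codes, which covers both "all" and "only". The class names `Course` and `CourseIdRegexTest` are hypothetical, and the anchors on the literal are an added assumption.

```ruby
require "minitest/autorun"

# Hypothetical container class; the enum and regex live together as in the question.
class Course
  DEPARTMENTS = {
    arabic: "ARA", chinese: "CHI", french: "FRE", german: "GER",
    italian: "ITA", japanese: "JPN", russian: "RUS", spanish: "SPA"
  }

  # Hand-maintained list, kept in sync by the test below.
  COURSE_ID = /\A(ARA|CHI|FRE|GER|ITA|JPN|RUS|SPA)\s[0-9]{3}-[0-9]{2}\z/
end

class CourseIdRegexTest < Minitest::Test
  # Every three-letter uppercase prefix the regex accepts, found by brute force.
  def accepted_prefixes
    ("AAA".."ZZZ").select { |code| Course::COURSE_ID.match?("#{code} 101-01") }
  end

  def test_regex_accepts_exactly_the_departments
    assert_equal Course::DEPARTMENTS.values.sort, accepted_prefixes.sort
  end
end
```

If a department is added to DEPARTMENTS but not to the regex (or the other way round), the assertion fails and the diff shows which code is missing or extra.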
