Programming Languages – Are Regular Expressions a Programming Language?

programming-languages, regular-expressions

In the academic sense, do regular expressions qualify as a programming language?

The motivation for my curiosity is an SO question I just looked at which asked "can regex do X?" and it made me wonder what can be said in the generic sense about the possible solutions using them.

(I am basically asking, "are regular expressions Turing complete?")

Best Answer

Regular expressions are a formal notation for describing sets of strings; the sets they can describe are known as "regular languages" in formal language theory. They are not a programming language as such. They are more of a shorthand for matching code that would otherwise be extremely tedious to write by hand and even more confusing than the sometimes arcane-looking regex itself.

Programming languages are typically defined as languages that are Turing complete, meaning they can express any computable function. Regular expressions do not fit into this category: in the formal sense they can only recognize regular languages.
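One way to see the gap is balanced parentheses, a classic non-regular language. The sketch below is only an illustration: `FLAT` and `balanced` are names made up for this example, and `FLAT` deliberately handles a single nesting level to show where a fixed pattern runs out.

```python
import re

# Fixed-depth pattern: a run of "()" pairs, with no nesting.
# Supporting each extra level of nesting means writing a longer pattern by hand.
FLAT = re.compile(r"(?:\(\))*")


def balanced(text: str) -> bool:
    """Check balanced parentheses using an unbounded counter."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" closed something that was never opened
                return False
        else:
            return False  # only parentheses are allowed in this sketch
    return depth == 0
```

The counter in `balanced` can grow without limit, which is exactly the kind of memory a pattern for a regular language does not have.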

If you want a language that looks like Regex, try J.

Related Solutions

Using lookahead assertions in regular expressions

I still find lookahead and lookbehind to be terribly confusing and often unreadable.

You're aware that regular expressions can be exploded and commented, right?

$foo =~ m/^
  (?=.*a)           # must contain an a somewhere
  (?=.*c)           # must contain a c somewhere
  (?=.*1)           # must contain a 1 somewhere
  (?=.*2)           # must contain a 2 somewhere
  \S+               # all non-space characters
$/x
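For readers outside Perl, roughly the same commented pattern can be written in Python with `re.VERBOSE`. This is a sketch of an equivalent, not the original answer's code; the name `PATTERN` is chosen for this example.

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows "#" comments,
# much like Perl's /x modifier.
PATTERN = re.compile(
    r"""
    ^
    (?=.*a)   # must contain an a somewhere
    (?=.*c)   # must contain a c somewhere
    (?=.*1)   # must contain a 1 somewhere
    (?=.*2)   # must contain a 2 somewhere
    \S+       # one or more non-space characters, up to the end
    $
    """,
    re.VERBOSE,
)


def is_valid(text: str) -> bool:
    """Return True if text has no whitespace and contains a, c, 1 and 2."""
    return PATTERN.match(text) is not None
```

Each lookahead checks one requirement from the start of the string without consuming input, so the order in which the characters appear does not matter.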

Is it good practice to use lookahead/lookbehind in regular expressions, or are they simply a hack that has found its way into modern production code?

They are quite indispensable for avoiding catastrophic backtracking and the regex-related security issues that come with it. Ideally, use plain atomic groups as well.

Compare how the above expression will backtrack, as compared with the naive equivalent:

$foo =~ m/^
  \S*a\S*c\S*1\S*2\S*      # a, then c, then 1, then 2
 |
  \S*a\S*c\S*2\S*1\S*      # a, c, 2, 1
 |
  \S*a\S*1\S*c\S*2\S*      # a, 1, c, 2
 |
  \S*a\S*1\S*2\S*c\S*      # a, 1, 2, c
 |
  # ... etc
$/x

The difference is most pronounced with a long input made up of a random sequence of a, c and 2 (no 1).
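To make the elided "etc" concrete, the naive alternation can be generated rather than typed out. The following Python sketch builds all 24 orderings and checks that it accepts the same strings as the lookahead form; `REQUIRED`, `LOOKAHEAD`, `NAIVE` and `agree` are names invented for this illustration.

```python
import re
from itertools import permutations

REQUIRED = "ac12"

# Lookahead form: one independent check per required character.
LOOKAHEAD = re.compile(r"^(?=.*a)(?=.*c)(?=.*1)(?=.*2)\S+$")

# Naive form: one alternative per ordering of the required characters,
# e.g. \S*a\S*c\S*1\S*2\S* for the ordering a, c, 1, 2.
alternatives = [
    r"\S*" + r"\S*".join(order) + r"\S*" for order in permutations(REQUIRED)
]
NAIVE = re.compile("^(?:" + "|".join(alternatives) + ")$")


def agree(text: str) -> bool:
    """Return True if both patterns give the same verdict on text."""
    return bool(LOOKAHEAD.match(text)) == bool(NAIVE.match(text))
```

Both patterns should give the same answers, but the naive one has many more ways to fail before giving up. Timing the two on a long input without a 1, for example with the standard `timeit` module, is one way to observe the backtracking cost on a given machine.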

Is it possible to combine programming languages

Your first example is sort of possible. Usually such things happen in PHP (and other related web-programming languages) like this:

<HTML>
<?PHP
call_some_php_function(1,2,"a","b"); /* This may return nothing, a text string, or actual HTML markup code */
?>
</HTML>

Some important points to note about this example:

HTML is NOT a programming language, it is a markup language.
The PHP and HTML are not executed/interpreted in the same place: PHP code is executed by a PHP interpreter running on the server and the result is "injected" into the surrounding HTML. Then that whole blob is sent to the client/browser which renders the complete HTML.
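The same "run on the server, inject into the markup" flow can be sketched in Python with the standard `string.Template` class. This is only an analogy to the PHP example: `call_some_function`, `PAGE` and `render` are hypothetical names, and the function body is an arbitrary stand-in.

```python
from string import Template


def call_some_function(a: int, b: int, c: str, d: str) -> str:
    """Hypothetical stand-in for the server-side function; returns markup."""
    return f"<p>{a + b} {c}{d}</p>"


# The surrounding HTML is inert text as far as the server-side code is concerned.
PAGE = Template("<html>\n$body\n</html>")


def render() -> str:
    """Run the code on the 'server' and inject its result into the HTML."""
    return PAGE.substitute(body=call_some_function(1, 2, "a", "b"))
```

The browser would only ever receive the string returned by `render()`; it never sees or runs the Python code.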

Your second example looks like some sort of mash-up of C++ and Java. It's possible to have compiled modules written in different languages talk to each other, but to combine Java and C++ in the same source file would be extremely confusing and difficult: how would the compiler know which statements are Java and which are C++?

I suppose in theory you could write a special compiler/pre-processor with "language" indicators such as:

Java
{
    import java.util.Scanner;
}
C++
{
   cout << "Insert a number from 1 to 10";
}
Java
{
    Scanner n = new Scanner(System.in); //Actually, this line *could* be a C++ line - it's hard for me to tell just by looking at it.
    System.out.println("The value you entered was " + n.nextLine());
}
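The first step of such a pre-processor, splitting the source into per-language chunks, might look like the Python sketch below. It assumes the layout used above (a language tag on its own line, then `{` and `}` alone at the start of a line) and does not handle a closing brace at column zero inside the embedded code. `split_blocks` is a made-up name, not an existing tool.

```python
import textwrap


def split_blocks(source: str) -> list[tuple[str, str]]:
    """Split tagged source into (language, code) pairs.

    Assumes each block is: a tag line, a line containing only "{",
    the code, then a line containing only "}".
    """
    blocks = []
    lines = source.splitlines()
    i = 0
    while i < len(lines):
        tag = lines[i].strip()
        if not tag:  # skip blank lines between blocks
            i += 1
            continue
        if i + 1 >= len(lines) or lines[i + 1] != "{":
            raise ValueError(f"expected '{{' after language tag {tag!r}")
        i += 2
        body = []
        while i < len(lines) and lines[i] != "}":
            body.append(lines[i])
            i += 1
        if i == len(lines):
            raise ValueError(f"unterminated block for {tag!r}")
        blocks.append((tag, textwrap.dedent("\n".join(body))))
        i += 1  # step past the closing brace
    return blocks
```

Each chunk could then be handed to the matching compiler. Making the chunks share variables and types across languages is the hard part, and this sketch does nothing for it.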

But I'm honestly not sure you'd gain anything useful by doing this.

Also, how would this hybrid language environment handle language features which are incompatible between the two?
