Configuration File Formats – Regular Expressions Without Escaping

I want a configuration file for a .NET program.
The file configures pairs of regular expressions.
The regular expressions belong to a hierarchy of sections:

Section1
    SubsectionA
        regular expression
        regular expression
    SubsectionB
        regular expression
        regular expression
Section2
    (etc.)

Or, in Markdown format:

# Section1

## SubsectionA

    regular expression
    regular expression

Either way, I want a configuration file format in which regular expression literals do not need to be escaped.

What configuration file format supports this? Even YAML requires escaping.

The two examples I showed above (an indented text file and Markdown) are OK but non-standard.

Best Answer

CDATA sections in XML should do.

Here's a Stack Overflow post about it: https://stackoverflow.com/questions/2784183/what-does-cdata-in-xml-mean

I remember it took me a while to understand how to use them. A DOM API has a dedicated call for creating a CDATA section, but there is no equivalent call for reading one. Reading is transparent: you just read the text content of the element that contains the CDATA section, and the literal text is returned.
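To illustrate that write/read asymmetry, here is a minimal sketch. It uses Python's standard `xml.dom.minidom` module rather than .NET, purely so the snippet is self-contained; the .NET DOM classes should behave analogously. The element name `Regex` and the sample pattern are made up for the example.

```python
from xml.dom import minidom

# Hypothetical pattern containing characters (<, >, &) that plain XML text
# would otherwise need to escape.
pattern = r"<b>\s*(\d+)\s*&\s*</b>"

# Writing: the DOM offers a dedicated call for creating a CDATA node.
doc = minidom.Document()
regex_element = doc.createElement("Regex")
regex_element.appendChild(doc.createCDATASection(pattern))
doc.appendChild(regex_element)
xml_text = doc.toxml()

# Reading: there is no CDATA-specific call. Collect the text of the
# element's child nodes and the literal content comes back unchanged.
parsed = minidom.parseString(xml_text)
read_back = "".join(child.data for child in parsed.documentElement.childNodes)
```

After this runs, `xml_text` holds the pattern inside a CDATA section without any entity escaping, and `read_back` is identical to `pattern`.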

Here's an example taken from the input data file of a code scrutinizer I once made. It lets you describe problematic code fragments with regular expressions:

<IssueBuster type="Basic" name="Suspicious lambdas" skip="true">
    <Description>
        <!-- See https://stackoverflow.com/questions/2465040/using-lambda-expressions-for-event-handlers -->
        A lambda expression is used for event handlers which inhibits unsubscribing.
    </Description>
    <Regex><![CDATA[\+\=\s*\([^\s\,]+\,\s*[^\s\)]+\)\s*=>]]></Regex>
    <SkipFileNames>
        <!-- If any of these inner texts appears in a file path, this buster will ignore that file. -->
        <FileName>SMMdataComponent\DeltaPlusGenerator\TestForm.cs</FileName>
        <FileName>Toolchain\Validate-TranslationEnums</FileName>
        <FileName>Tools\JcSimulator</FileName>
        <FileName>Tools\AR3toGps</FileName>
        <FileName>Tools\XMLConverter</FileName>
        <FileName>GitManipulator.cs</FileName>
    </SkipFileNames>
</IssueBuster>

Note that CDATA takes this form:

<![CDATA[your_literal_text]]>

Whatever you put in between the inner square brackets will be returned verbatim.
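As a quick check of the "returned verbatim" claim, the sketch below reads a trimmed copy of the `IssueBuster` element above and applies the pattern. It is written in Python with the standard `xml.etree.ElementTree` and `re` modules, not .NET; the pattern happens to be valid in Python's regex flavor as well, and the sample source line is invented for the test.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the IssueBuster element shown above.
xml_text = r"""<IssueBuster type="Basic" name="Suspicious lambdas" skip="true">
    <Regex><![CDATA[\+\=\s*\([^\s\,]+\,\s*[^\s\)]+\)\s*=>]]></Regex>
</IssueBuster>"""

root = ET.fromstring(xml_text)

# The CDATA content arrives as the element's ordinary text, backslashes intact.
pattern = root.find("Regex").text

# Hypothetical line of C# that subscribes to an event with a lambda.
sample = "button.Click += (sender, e) => Close();"
match = re.search(pattern, sample)
```

Here `pattern` is exactly the text between the inner square brackets, and `match` covers the `+= (sender, e) =>` part of the sample line.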

To wrap this up: in the unlikely event you have to include a ]]> sequence in the content, you can split the content after the second ] and create two consecutive CDATA sections. This can easily be implemented recursively.
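The split can be sketched as follows, again in Python rather than .NET. The helper name `to_cdata` and the sample pattern are hypothetical, and the sketch uses a single string replacement instead of recursion, which handles every occurrence of `]]>` in one pass.

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting wherever the content contains ']]>'."""
    # Close the current section after the second ']' and reopen a new one
    # before the '>', so that ']]>' never appears inside a single section.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Hypothetical pattern that contains the problematic ']]>' sequence.
pattern = r"\[\[(.*?)]]>"

wrapped = to_cdata(pattern)

# The parser joins the consecutive CDATA sections back into one text value.
read_back = ET.fromstring("<Regex>" + wrapped + "</Regex>").text
```

For this input, `wrapped` consists of two consecutive CDATA sections, and `read_back` equals the original `pattern`.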