Python – Parsing scripts that use curly braces

algorithmsparsingpython

To get an idea of what I'm doing, I am writing a python parser that will parse directx .x text files.

The problem I have deals with how the files are formatted. Although I'm writing it in python, I'm looking for general algorithms for dealing with this sort of parsing.

.x files define data using templates.
The format of a template is

template_name {
   [some_data]
}

The goal I have is to parse the file line-by-line and whenever I come across a template, I will deal with it accordingly.

My initial approach was to check if a line contains an opening or closing brace. If it's an open brace, then I will check what the template name is.

Now the catch here is that the open brace doesn't have to occur on the same line as the template name. It could just as well be

template_name
{
   [some_data]
}

So if I were to use my "open brace exists" criteria, it won't work for any files that use the latter format.

A lot of languages also use curly braces (though I'm not sure when people would be parsing the scripts themselves), so I was wondering if anyone knows how to accurately get the template name (or in some other languages, it could just as well be a function name, though there aren't any keywords to look for)

Best Answer

If you are using Python then you should look at using PyParsing, I have used it with great success for a years now.

The pyparsing module is an alternative approach to creating and executing simple grammars, vs. the traditional lex/yacc approach, or the use of regular expressions. The pyparsing module provides a library of classes that client code uses to construct the grammar directly in Python code...

Related Topic