Chapter 3
Tutorial

3.1 Introduction

This short tutorial presents how to make a simple calculator. The calculator will compute basic mathematical expressions (+, -, *, /) possibly nested in parenthesis. We assume the reader is familiar with regular expressions.

3.2 Defining the grammar

Expressions are defined with a grammar. For example an expression is a sum of terms and a term is a product of factors. A factor is either a number or a complete expression in parenthesis.

We describe such grammars with rules. A rule describe the composition of an item of the language. In our grammar we have 3 items (Expr, Term, Factor). We will call these items ‘symbols’ or ‘non terminal symbols’. The decomposition of a symbol is symbolized with -->. The grammar of this tutorial is given in figure 3.1.



Figure 3.1: Grammar for expressions


Grammar rule

Description





Expr  -->  Term ((' + '|'-') Term)*

An expression is a term eventually followed with a plus (' + ') or a minus ('-') sign and an other term any number of times (* is a repetition of an expression 0 or more times).



Term  -->  Fact (('*'|'/') Fact)*

A term is a factor eventually followed with a '*' or '/' sign and an other factor any number of times.



Fact  -->  number | '(' Expr ')'

A factor is either a number or an expression in parenthesis.




We have defined here the grammar rules (i.e. the sentences of the language). We now need to describe the lexical items (i.e. the words of the language). These words - also called terminal symbols - are described using regular expressions. In the rules we have written some of these terminal symbols (+,-,*,/,(,)). We have to define number. For sake of simplicity numbers are integers composed of digits (the corresponding regular expression can be [0 - 9]+). To simplify the grammar and then the Python script we define two terminal symbols to group the operators (additive and multiplicative operators). We can also define a special symbol that is ignored by TPG. This symbol is used as a separator. This is generaly usefull for white spaces and comments. The terminal symbols are given in figure 3.2



Figure 3.2: Terminal symbol definition for expressions



Terminal symbolRegular expressionComment






number [0 - 9]+ or \d+ One or more digits



add [+-] a + or a -



mul [*/] a * or a /



spaces \s+ One or more spaces




This is sufficient to define our parser with TPG. The grammar of the expressions in TPG can be found in figure 3.3.




Figure 3.3: Grammar of the expression recognizer
class Calc(tpg.Parser):  
    r"""  
 
    separator spaces: '\s+' ;  
    token number: '\d+' ;  
    token add: '[+-]' ;  
    token mul: '[*/]' ;  
 
    START -> Expr ;  
 
    Expr -> Term ( add Term )* ;  
 
    Term -> Fact ( mul Fact )* ;  
 
    Fact -> number | '\(' Expr '\)' ;  
 
    """



Calc is the name of the Python class generated by TPG. START is a special non terminal symbol treated as the axiom1 of the grammar.

With this small grammar we can only recognize a correct expression. We will see in the next sections how to read the actual expression and to compute its value.

3.3 Reading the input and returning values

The input of the grammar is a string. To do something useful we need to read this string in order to transform it into an expected result.

This string can be read by catching the return value of terminal symbols. By default any terminal symbol returns a string containing the current token. So the token '(' always returns the string '('. For some tokens it may be useful to compute a Python object from the token. For example number should return an integer instead of a string, add and mul should return a function corresponding to the operator. That why we will add a function to the token definitions. So we associate int to number and make_op to add and mul.

int is a Python function converting objects to integers and make_op is a user defined function (figure 3.4).




Figure 3.4: make_op function
def make_op(s):  
    return {  
        '+': lambda x,y: x+y,  
        '-': lambda x,y: x-y,  
        '*': lambda x,y: x*y,  
        '/': lambda x,y: x/y,  
    }[s]



To associate a function to a token it must be added after the token definition as in figure 3.5




Figure 3.5: Token definitions with functions
    separator spaces: '\s+' ;  
    token number: '\d+' int ;  
    token add: '[+-]' make_op;  
    token mul: '[*/]' make_op;



We have specified the value returned by the token. To read this value after a terminal symbol is recognized we will store it in a Python variable. For example to save a number in a variable n we write number/n. In fact terminal and non terminal symbols can return a value. The syntax is the same for both sort of symbols. In non terminal symbol definitions the return value defined at the left hand side is the expression return by the symbol. The return values defined in the right hand side are just variables to which values are saved. A small example may be easier to understand (figure 3.6).



Figure 3.6: Return values for (non) terminal symbols


Rule

Comment



X/x ->

Defines a symbol X. When X is called, x is returned.

Y/y

X starts with a Y. The return value of Y is saved in y.

Z/z

The return value of Z is saved in z.

$ x = y+z $

Computes x.

;

Returns x.




In the example described in this tutorial the computation of a Term is made by applying the operator to the factors, this value is then returned :

    Expr/t -> Term/t ( add/op Term/f $t=op(t,f)$ )* ;

This example shows how to include Python code in a rule. Here $...$ is copied verbatim in the generated parser.

Finally the complete parser is given in figure 3.7.




Figure 3.7: Expression recognizer and evaluator
class Calc(tpg.Parser):  
    r"""  
 
    separator spaces: '\s+' ;  
 
    token number: '\d+' int ;  
    token add: '[+-]' make_op ;  
    token mul: '[*/]' make_op ;  
 
    START -> Expr ;  
 
    Expr/t -> Term/t ( add/op Term/f $t=op(t,f)$ )* ;  
 
    Term/f -> Fact/f ( mul Fact/a $f=op(f,a)$ )* ;  
 
    Fact/a -> number/a | '\(' Expr/a '\)' ;  
 
    """



3.4 Embeding the parser in a script

Since TPG 3 embeding parsers in a script is very easy since the grammar is the doc string2 of a class (see figure 3.8).




Figure 3.8: Writting TPG grammars in Python
import tpg  
 
class MyParser(tpg.Parser):  
    r""" # Your grammar here """  
 
# You can instanciate your parser here  
my_parser = MyParser()



To use this parser you now just need to instanciate an object of the class Calc as in figure 3.9.




Figure 3.9: Complete Python script with expression parser
import tpg  
 
def make_op(s):  
    return {  
        '+': lambda x,y: x+y,  
        '-': lambda x,y: x-y,  
        '*': lambda x,y: x*y,  
        '/': lambda x,y: x/y,  
    }[s]  
 
class Calc(tpg.Parser):  
    r"""  
 
    separator spaces: '\s+' ;  
 
    token number: '\d+' int ;  
    token add: '[+-]' make_op ;  
    token mul: '[*/]' make_op ;  
 
    START/e -> Term/e ;  
    Term/t -> Fact/t ( add/op Fact/f $t=op(t,f)$ )* ;  
    Fact/f -> Atom/f ( mul/op Atom/a $f=op(f,a)$ )* ;  
    Atom/a -> number/a | '\(' Term/a '\)' ;  
 
    """  
 
calc = Calc()  
expr = raw_input('Enter an expression: ')  
print expr, '=', calc(expr)



3.5 Conclusion

This tutorial shows some of the possibilities of TPG. If you have read it carefully you may be able to start with TPG. The next chapters present TPG more precisely. They contain more examples to illustrate all the features of TPG.

Happy TPG’ing!