Writing a simple Compiler on my own - Action Rules for If-Else Statements [C][Flex][Bison]

[Custom Thumbnail]

All the Code of the series can be found at the Github repository:


    Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will write the needed Action Rules for If-Else Statements.

More specifically, the topics that we will cover today are:

  1. The need of a Statements-node
  2. Action rules for "simple" if-else statements
  3. Action rules for if-else statements that contains else if's
  4. Running the compiler for "full_example.c"


    Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:
  • What Compiler Design is (mainly the steps)
  • For which exact Language the Compiler is build for (Tokens and Grammar)
  • How to use Flex and Bison
  • How to implement a lexer and parser for the language using those tools
  • What the Symbol Table is and how we implement it
  • How we combine Flex and Bison together
  • How we can pass information from the Lexer to the Parser
  • How we define operator priorities, precedencies and associativity
  • What Semantic Analysis is (Attributes, SDT etc.)
  • How we do the so called "Scope Resolution"
  • How we declare types and check the type inter-compatibility for different cases, including function parameters
  • How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
  • Intermediate Code Representations, including AST's
  • How we implement an AST (structure and management)
  • Action Rules for other cases


Talking about the series in general this series can be rated:
  • Intermediate to Advanced
Today's topic(s) can be rated:
  • Medium
So, without further ado, let's now finally start with the actual Tutorial...

Actual Tutorial Content

The need of a Statements-Node

    If-else statements, for statements, while statements, optional functions and even the "main" function, all mostly contain more than one statements in their "body" or "tail" as I call it sometimes in the grammar. So, there is the need of creating a new node that will store many "statement" nodes, to be able to access them later on. Knowing that statements are described recursively in our grammar, we can visualize it as follows:

-||- means that the second "Statements" does the same as the first one.

So, let's create such a structure!

The Statements AST Node

    This node will store an dynamic pointer array to AST_Node structures, where each of them will of course be a "statement". To know the amount of statements that are stored in this structure, we have to also define a variable to store the statement count. The whole structure will be of a new Node-type "STATEMENTS" and will have the name "AST_Node_Statements". So, let's define an array "statements" and the integer variable "statement_count" as entries of this structure.

In code this looks as following:
typedef enum Node_Type {
STATEMENTS, // statements
typedef struct AST_Node_Statements{ enum Node_Type type; // node type
// statements struct AST_Node **statements; int statement_count;

Of course this code is inside of "ast.h".

Creation Function

    The way with which the rule "statements" is build up makes it easy for us to use the creation function to add new statements to the "statements"-parent rule. So, the creation function, let's say "new_statements_node", will have 3 parameters which are:
  • The statements array of the previous "statements"-rule
  • The statement_count value of the previous "statements"-rule
  • The "statement" to be added
Knowing that the "statements"-rule looks as following:
statements -> statements statement | statement ;
we understand that the first two parameters are useless for the second sub-rule of "statements". For this rule we will just pass NULL and 0 as parameters!

The function declaration in "ast.h" and actual code of "ast.c" are:
AST_Node *new_statements_node(AST_Node **statements,
int statement_count, AST_Node *statement);
AST_Node *new_statements_node(AST_Node **statements,
int statement_count, AST_Node *statement){
// allocate memory AST_Node_Statements *v = malloc (sizeof (AST_Node_Statements));
// set node type v->type = STATEMENTS;
// first statement if(statements == NULL){ statements = (AST_Node**) malloc (sizeof (AST_Node*)); statements[0] = statement; statement_count = 1; } // add new statement else{ statements = (AST_Node**) realloc (statements, (statement_count + 1) * sizeof (AST_Node*)); statements[statement_count] = statement; statement_count++; }
// set entries v->statements = statements; v->statement_count = statement_count;
// return type-casted result return (struct AST_Node *) v; }

    You can see that there are two cases, which correspond to the second and first sub-rule of "statements". Of course I also did some changes in the traversal and print functions of the nodes. To see these changes go to GitHub, where I always upload the code of the series.

Integrate with the "statements"-rule

    In the second sub-rule we just create a new node with parameter NULL, 0 and the already created node stored in "statement". Knowing that all nodes are type-casted to "AST_Node", the second sub-rule will need to type-cast the already created "statements" node of the "statements"-part and give the statements-array and statement_count of this node as a parameter of the creation function, together with the "statement" node.

So, the new rule now becomes:
  statements statement
    AST_Node_Statements *temp = (AST_Node_Statements*) $1;
    $$ = new_statements_node(temp->statements, temp->statement_count, $2);
  | statement
    $$ = new_statements_node(NULL, 0, $1);

Also don't forget to include the token definition of "statements":
%type <node> statements

    The same stuff will have to be done for "declarations" as well, but I will leave it for when we get to function declarations!

Action Rules for "Simple" If-Else Statements

    The if statement is a somewhat complicated statement as it can include a dynamic amount of else if's and an optional else part. So, we will have to use concepts we already used in previous topics like "declarations", to take all the needed information/attributes and pass them over to the actual If-node! To solve a conflict with the else ifs and else, I had to split the "if_statement" rule into two parts: one with else if's and one without. So, the whole problem can be visualized as following:

    Let's first get into the "simple" case that does not require the complicated passing of Else-If nodes. Visualizing this simple case we see that:

(as you can see I put lots of visualizations today!)

    So, what do we understand? Well, the condition is pretty simple as we already have the information stored in the expression. The actual if_branch is represented by the "tail"-rule, which will have to store the "statements"-node that it get's from the "statements"-rule inside of the bracelets. This tells us that the "tail" also has to be of type "node" and so we add it alongside "statements". In the same way the if_statement, optional_else and else_if rule are also of type node. And so we can already write the following non-terminal type-definitions:
%type <node> statements tail
%type <node> if_statement else_if optional_else

The "tail"-rule becomes:
tail: LBRACE statements RBRACE
    $$ = $2; /* just pass information */

    So, what remains now is the optional else. For the optional else we will just return the "statements"-node stored in the "tail"-part (else exists) or NULL (no else). The rule becomes:
  ELSE tail
    /* else exists */
    $$ = $2;
  | /* empty */
    /* no else */
    $$ = NULL;

    Knowing what we store in each of these non-terminals, the action rule for this simple if statement looks as following:
  IF LPAREN expression RPAREN tail else_if optional_else
    /* later */
  | IF LPAREN expression RPAREN tail optional_else
    $$ = new_ast_if_node($3, $5, NULL, 0, $6);

    You can see that we pass NULL and 0 for the else ifs. For the rest we just find the correct number to put after '$'. All the parameters have the correct type already and so there is no need of type-casting or creating temporary nodes..

Action Rules for If-Else Statements that contains Else If's

    So, now for the more complicated case! We will be using the same idea that we used for declarations and initialization, where we had global variables to store the newly found names and values for the array initialization. In this case we will have a AST_Node pointer array to store the else-if branches (elsifs) and a counter variable for the number of else ifs (elseif_count). To add a new entry we will define a new function "add_elseif" that takes the newly found Else-if node as a parameter and adds it to the global array. We of course have two cases for that:
  1. First entry of the array, where we will have to allocate the array for the first time
  2. General case, where we allocate memory for the new entry and add the entry to the end of the array
So, with all that in mind the parser now contains:
// for else ifs void add_elseif(AST_Node *elsif); AST_Node **elsifs; int elseif_count = 0; %}
void add_elseif(AST_Node *elsif){ // first entry if(elseif_count == 0){ elseif_count = 1; elsifs = (AST_Node **) malloc(1 * sizeof(AST_Node)); elsifs[0] = elsif; } // general case else{ elseif_count++; elsifs = (AST_Node **) realloc(elsifs, elseif_count * sizeof(AST_Node)); elsifs[elseif_count - 1] = elsif; } }

All that we said until now, can be visualized as following

    Using the function "add_elseif", the "else_if"-rule is pretty easy to write now! Both sub-rules do the same thing, but will have a different numeration for the non-terminals that are passed as parameters. The "else_if"-rule looks as following:
  else_if ELSE IF LPAREN expression RPAREN tail
    AST_Node *temp = new_ast_elsif_node($5, $7);
  | ELSE IF LPAREN expression RPAREN tail
    AST_Node *temp = new_ast_elsif_node($4, $6);

    Thinking about the "if_statement"-rule now, the only thing that changes is that we will use these global variables as parameters, by also re-initializing them back to 0 and NULL. So, with all that in mind the final "if_statement" rule becomes:
  IF LPAREN expression RPAREN tail else_if optional_else
    $$ = new_ast_if_node($3, $5, elsifs, elseif_count, $7);
    elseif_count = 0;
    elsifs = NULL;
  | IF LPAREN expression RPAREN tail optional_else
    $$ = new_ast_if_node($3, $5, NULL, 0, $6);

And this is it actually! Wasn't that bad right?

Running the Compiler for "full_example.c"

    Before running the compiler I would like to point out that I removed many "ast_traversal" functions from the action rules, as they would just produce duplicate messages. We will only put "ast_traversal" functions for the declarations and statements rules now. Later on there will be only one traversal of the whole AST, that will be found in the "program"-rule. Of course this will be used for the machine code generation :)

So, let's run the compiler:

    You can see that we get new messages in the console! I also included some explanations for what got printed out. The main outer statements of the "full_example.c" file are of course only 4! The other statement node messages correspond to the for and while statements and more specifically to the "tail" of them. In the beginning we can also see the statements node of the last if-node that appears in the for statement. Nice huh? Next time we will get into Loops ;)



No references, just using code that I implemented in my previous articles.


All of the images are custom-made!

Previous parts of the series

Final words | Next up on the project

     And this is actually it for today's post! I hope that I explained everything well.
Next up on this series are:
  • Semantic analysis (using more action rules in Bison)
  • Intermediate Code generation (using the AST structure for more cases)
  • Machine Code generation (MIPS Assembly)
     Which are all topics that will need more than one article to complete. Also, note that we might also get into Optimizations later on, or could even extend the Language by adding complex datatypes (structs and unions), more rules etc.
So, see ya next time!

GitHub Account:


Keep on drifting! ;)

3 columns
2 columns
1 column