Writing a simple Compiler on my own - Revisit Queue and Assignment Checking (part 3) [C][Flex][Bison]




[Custom Thumbnail]

All the Code of the series can be found at the Github repository:
https://github.com/drifter1/compiler


Introduction

    Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will continue with the implementation of Assignment checking using the Revisit Queue.

The topics that we will cover today are:

  1. Parser integration
  2. Running for examples

Requirements:

    Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:
  • What Compiler Design is (mainly the steps)
  • For which exact Language the Compiler is build for (Tokens and Grammar)
  • How to use Flex and Bison
  • How to implement a lexer and parser for the language using those tools
  • What the Symbol Table is and how we implement it
  • How we combine Flex and Bison together
  • How we can pass information from the Lexer to the Parser
  • How we define operator priorities, precedencies and associativity
  • What Semantic Analysis is (Attributes, SDT etc.)
  • How we do the so called "Scope Resolution"
  • How we declare types and check the type inter-compatibility for different cases, including function parameters
  • How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
  • Intermediate Code Representations, including AST's
  • How we implement an AST (structure and management)
  • Action Rules for AST nodes and more

Difficulty:

Talking about the series in general this series can be rated:
  • Intermediate to Advanced
Today's topic(s) can be rated:
  • Medium
So, without further ado, let's now finally start with the actual Tutorial...

Actual Tutorial Content

Parser integration

    By making all the necessary changes to the Revisit queue structure (and it's functions) and the "expression_data_type" function, we now just have to integrate those changes into the parser's action rules. In the first part we visualized the problem, creating this complete diagram that explains the whole procedure:


    Thinking about the "expression_data_type" function, this function will be called during the action rule of assignments to get the datatype of the right side of an assignment. We are setting the flag variable "cont_revisit" to 1 whenever we find an undefined function return type. This means that we can check the value of this variable and either perform the assignment check directly (cont_revisit == 0 - contains no undeclared function call) or add the new expression to a revisit queue entry (cont_revisit == 1). We might have created such an entry for the current identifier previously and so using the "search_queue" function is wise. We will create a new entry only if no entry exists (search_queue returns NULL). If an entry exists we manage the array, add the new info (expression node), increment the assignment counter and should also not forget to reset the revisit flag to 0. In the end after the whole parsing is done we will perform all those missing assignment checks!

Based on all this, the assignment rule's action code looks as following:
assigment: var_ref ASSIGN expression
{
    AST_Node_Ref *temp = (AST_Node_Ref*) $1;
    $$ = new_ast_assign_node(temp->entry, temp->ref, $3);
/* find datatypes */ int type1 = get_type(temp->entry->st_name); int type2 = expression_data_type($3);
/* the last function will give us information about revisits */
/* contains revisit => add assignment-check to revisit queue */ if(cont_revisit == 1){ /* search if entry exists */ revisit_queue *q = search_queue(temp->entry->st_name); if(q == NULL){ add_to_queue(temp->entry, temp->entry->st_name, ASSIGN_CHECK); q = search_queue(temp->entry->st_name); }
/* setup structures */ if(q->num_of_assigns == 0){ /* first node */ q->nodes = (void**) malloc(sizeof(void*)); } else{ /* general case */ q->nodes = (void**) realloc(q->nodes, (q->num_of_assigns + 1) * sizeof(void*)); }
/* add info of assignment */ q->nodes[q->num_of_assigns] = (void*) $3;
/* increment number of assignments */ q->num_of_assigns++;
/* reset revisit flag */ cont_revisit = 0;
printf("Assignment revisit for %s at line %d\n", temp->entry->st_name, lineno); } else{ /* no revisit */ /* check assignment semantics */ get_result_type( type1, /* variable datatype */ type2, /* expression datatype */ NONE /* checking compatibility only (no operator) */ ); } } ;


    By adding the upper action code we perform the assignment checks that don't contain function calls. Those that can't be performed will be in the revisit queue waiting for us! By not revisiting them after the parsing is done, the revisit queue dump file for the "full_example.c" file, look like this:


We will get more into that later!

    Performing those remaining checks is quite simple! We just have to loop through the revisit queue and perform the revisit for each assignment check, checking if the revisit_type entry has the value ASSIGN_CHECK. Afterwards we will only get a warning for the cases where a functions never being declared! In code all this looks like this:
/* perform the remaining checks (assignments) */
if(queue != NULL){
    revisit_queue *cur;
    cur = queue;
    while(cur != NULL){
        if(cur->revisit_type == ASSIGN_CHECK){
            revisit(cur->st_name);
        }
        cur = cur->next;
    }
}

Running for examples

    Let's now run our code for the "example2" and "full_example" programs, to understand better what happens!

example2

The example code looks like this:
// main function
int i;
double val = 2.5, res[10];
for(i = 0; i < 10; i++){
    res[i] = operation(val, i);
    val = res[i];
    print(res[i]);
    print('\n');
}
return;
// functions
double operation (double value, int i){
    double res;
    res = value*i + i;
    return res;
}
    As you can see this code contains one function declaration, which is being used in one assignment of the double array "res". Including a message in the console, whenever an assignment needs a revisit, we can run the compiler for this example, to see if the assignment check was really delayed in line 7 of the program. Taking a look at the console we see:


    We clearly add the "res[i] = operation(val, i)" assignment to the revisit queue. Taking a look at the end of the console, we can see that no error was printed!


    So, this means that the assignment check that was inside of the revisit queue was performed correctly, as it should! Let's comment out this revisiting to see what would be inside of the "revisit_dump" file.


You can see that we only have to perform one assignment check (the one from line 7) in a revisit !

full_example

    In the same way the "full_example" code contains assignments that contain function calls in 3 places, where two are for the same variable 'p'. That way the console would print out the following:


The revisit_dump (by commenting out the revisit) contains:


Adding the code back in, the console contains no warning anymore and the revisit_dump is empty!

RESOURCES

References:

No references, just using code that I implemented in my previous articles.

Images:

All of the images are custom-made!

Previous parts of the series

General Knowledge and Lexical Analysis

Syntax Analysis

  • Syntax Analysis Theory
  • Bison basics
  • Creating a grammar for our Language
  • Combine Flex and Bison
  • Passing information from Lexer to Parser
  • Finishing Off The Grammar/Parser (part 1)
  • Finishing Off The Grammar/Parser (part 2)
  • Semantic Analysis (1)

  • Semantic Analysis Theory
  • Semantics Examples
  • Scope Resolution using the Symbol Table
  • Type Declaration and Checking
  • Function Semantics (part 1)
  • Function Semantics (part 2)
  • Intermediate Code Generation (AST)

  • Abstract Syntax Tree Principle
  • Abstract Syntax Tree Structure
  • Abstract Syntax Tree Management
  • Action Rules for Declarations and Initializations
  • Action Rules for Expressions
  • Action Rules for Assignments and Simple Statements
  • Action Rules for If-Else Statements
  • Action Rules for Loop Statements and some Fixes
  • Action Rules for Function Declarations (part 1)
  • Action Rules for Function Declarations (part 2)
  • Action Rules for Function Calls
  • Semantic Analysis (2)

  • Datatype attribute for Expressions
  • Type Checking for Assignments
  • Revisit Queue and Parameter Checking (part 1)
  • Revisit Queue and Parameter Checking (part 2)
  • Revisit Queue and Parameter Checking (part 3)
  • Revisit Queue and Parameter Checking (part 4)
  • Revisit Queue and Assignment Checking (part 1)
  • Revisit Queue and Assignment Checking (part 2)

  • Final words | Next up on the project

         And this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.
    Next up on this series are:
    • Semantic analysis (using even more action rules in Bison)
    • Machine Code generation (MIPS Assembly)
         Which are all topics that will need more than one article to complete. Also, note that we might also get into Optimizations later on, or could even extend the Language by adding complex datatypes (structs and unions), more rules etc.
    So, see ya next time!

    GitHub Account:

    https://github.com/drifter1

    Keep on drifting! ;)

    H2
    H3
    H4
    3 columns
    2 columns
    1 column
    5 Comments