Writing a simple Compiler on my own - Revisit Queue and Assignment Checking (part 2) [C][Flex][Bison]

[Custom Thumbnail]

All the Code of the series can be found at the Github repository:


    Hello it's a me again @drifter1! Today we continue with my Compiler Series, a series where we implement a complete compiler for a simple C-like language by using the C-tools Flex and Bison. In this article we will continue with the implementation of Assignment checking using the Revisit Queue. The actual integration and explanation on examples will be done in the next part!

The topics that we will cover today are:

  1. Revisit Queue changes
  2. Expression datatype function


    Actually you need to read and understand all the topics that I covered in the series as a whole, as these articles will give you access to knowledge about:
  • What Compiler Design is (mainly the steps)
  • For which exact Language the Compiler is build for (Tokens and Grammar)
  • How to use Flex and Bison
  • How to implement a lexer and parser for the language using those tools
  • What the Symbol Table is and how we implement it
  • How we combine Flex and Bison together
  • How we can pass information from the Lexer to the Parser
  • How we define operator priorities, precedencies and associativity
  • What Semantic Analysis is (Attributes, SDT etc.)
  • How we do the so called "Scope Resolution"
  • How we declare types and check the type inter-compatibility for different cases, including function parameters
  • How we check function calls later on using a "Revisit queue", as function declarations mostly happen after functions get used (in our Language)
  • Intermediate Code Representations, including AST's
  • How we implement an AST (structure and management)
  • Action Rules for AST nodes and more


Talking about the series in general this series can be rated:
  • Intermediate to Advanced
Today's topic(s) can be rated:
  • Medium
So, without further ado, let's now finally start with the actual Tutorial...

Actual Tutorial Content

Revisit Queue changes

    Based on the visualization of the previous part, the Revisit Queue structure code has to change into:
typedef struct revisit_queue{
    // symbol table entry
    list_t *entry;
// name of identifier char *st_name;
// type of revisit int revisit_type;
// parameters of function calls int **par_types; int *num_of_pars; int num_of_calls;
// assignment expression nodes void **nodes; int num_of_assigns;
// maybe additional information to simplify the process ...
struct revisit_queue *next; }revisit_queue;

    You can see that I included some new entries that will be used for the assignment checking. The AST structure can't be used in the Symbol Table and I can't just include the header file, as this causes conflicts in my weird coding structuring! That's why I put the type "void", cause this type allows us to type-cast anything to it..

Let's now get into the function changes!

add_to_queue function

    The array of nodes will be managed using a similar principle to the parameter checking (search queue entry and tweak this entry), but this doesn't mean that we can't initialize anything! Let's initialize the counter "num_of_assigns" to 0, so that we can check this value during the parser integration.
The new code of the function looks as following:
void add_to_queue(list_t *entry, char *name, int type){
    revisit_queue *q;
/* queue is empty */ if(queue == NULL){ /* set up entry */ q = (revisit_queue*) malloc(sizeof(revisit_queue)); q->entry = entry; q->st_name = name; q->revisit_type = type; q->next = NULL;
/* additional info */ if(type == PARAM_CHECK){ q->num_of_calls = 0; } else if(type == ASSIGN_CHECK){ q->num_of_assigns = 0; }
/* q "becomes" the queue */ queue = q; } /* queue not empty */ else{ /* find last element */ q = queue; while(q->next != NULL) q = q->next;
/* add element to the end */ q->next = (revisit_queue*) malloc(sizeof(revisit_queue)); q->next->entry = entry; q->next->st_name = name; q->next->revisit_type = type; q->next->next = NULL;
/* additional info */ if(type == PARAM_CHECK){ q->next->num_of_calls = 0; } else if(type == ASSIGN_CHECK){ q->next->num_of_assigns = 0; } } }

revisit function

    During the revisit of the assignment check, we will have to perform the type inter-compatibility check for each assignment that the identifier is used in (looping through the nodes array). After performing the check we will have to remove the entry from the revisit queue, like always...
The new code looks like this:
int revisit(char *name){
    int i, type1, type2;
revisit_queue *q = search_queue(name); revisit_queue *q2;
if(q == NULL){ return -1; // no entry }
/* revisit entry depending on the type */ switch(q->revisit_type){ case PARAM_CHECK: /* run parameter check */ if(!func_param_check(name, q->num_of_calls, q->par_types, q->num_of_pars)){ printf("Successful Parameter Check of %s\n", name); }
/* remove entry by making it point to it's next */ q2 = search_prev_queue(name); if(q2 == NULL){ /* special case: first entry */ queue = queue->next; } else{ q2->next = q2->next->next; }
break; case ASSIGN_CHECK:
/* run assignment check for each assignment */ type1 = get_type(q->entry->st_name); for(i = 0; i < q->num_of_assigns; i++){ type2 = expression_data_type(q->nodes[i]);
/* perform assignment check */ get_result_type( type1, /* variable datatype */ type2, /* expression datatype */ NONE /* checking compatibility only (no operator) */ ); }
/* remove entry by making it point to it's next */ q2 = search_prev_queue(name); if(q2 == NULL){ /* special case: first entry */ queue = queue->next; } else{ q2->next = q2->next->next; }
break; /* ... */ }
return 0; // success }

revisit_dump function

    Now that we have a new type of revisit we also have to tweak the dump function, so that it prints out that we have to do an assignment check of a specific number of assignments for a specific identifier!
In code this looks like this:
void revisit_dump(FILE *of){
    int i;
    revisit_queue *q;
    q = queue;
fprintf(of,"------------ -------------\n"); fprintf(of,"Identifier Revisit Type\n"); fprintf(of,"------------ -------------\n"); while(q != NULL){ fprintf(of, "%-13s", q->st_name); if(q->revisit_type == PARAM_CHECK){ fprintf(of,"%s","Parameter Check "); fprintf(of,"for %d function calls",q->num_of_calls); } else if(q->revisit_type == ASSIGN_CHECK){ fprintf(of,"%s","Assignment Check "); fprintf(of,"for %d assignments",q->num_of_assigns); } // more later on fprintf(of, "\n"); q = q->next; } }

Expression datatype function

    To be able to know if a variable needs to be revisited, talking about assignments only, we can use the already developed function "expression_data_type". Let's first define a new flag variable "cont_revisit" which will get a value of '1', whenever an identifier of a function call has no declared type. This will help us later on in the parser to understand if the assignment check can be performed or if we have to add the assignment into the revisit queue!

The flag is defined like that:
int cont_revisit = 0; // 1: contains revisit, 0: not

    Talking about the function, we will first have to make sure that we set the datatype of arithmetic, boolean, relational and equality expressions again, during that function! That way the type stored in the "data_type" entry is updated to the correct value. This can be done very easily using the same function that we use for assignment checking, which is "get_result_type", with the correct parameters. For example:
case ARITHM_NODE: /* arithmetic expression */
    temp_arithm = (AST_Node_Arithm *) node;
/* set datatype again */ temp_arithm->data_type = get_result_type( expression_data_type(temp_arithm->left), /* data type of left expression */ expression_data_type(temp_arithm->right), /* data type of right expression */ ARITHM_OP /* operation type */ );
return temp_arithm->data_type;

    Talking about function calls, we will do a "dummy return" of the "INT_TYPE", by also setting the flag to '1', whenever the function identifier has no declared type! In code this looks like this:
case FUNC_CALL:   /* function call */
    temp_func_call = (AST_Node_Func_Call *) node;
/* check if it needs revisit */ if(temp_func_call->entry->st_type == UNDEF){ if(temp_func_call->entry->inf_type == UNDEF){ cont_revisit = 1; /* contains revisit */ return INT_TYPE; /* dummy return */ } }
return temp_func_call->entry->inf_type; /* return type */

The final part will be about how we use today's code to solve the problem! :)



No references, just using code that I implemented in my previous articles.


All of the images are custom-made!

Previous parts of the series

General Knowledge and Lexical Analysis

Syntax Analysis

  • Syntax Analysis Theory
  • Bison basics
  • Creating a grammar for our Language
  • Combine Flex and Bison
  • Passing information from Lexer to Parser
  • Finishing Off The Grammar/Parser (part 1)
  • Finishing Off The Grammar/Parser (part 2)
  • Semantic Analysis (1)

  • Semantic Analysis Theory
  • Semantics Examples
  • Scope Resolution using the Symbol Table
  • Type Declaration and Checking
  • Function Semantics (part 1)
  • Function Semantics (part 2)
  • Intermediate Code Generation (AST)

  • Abstract Syntax Tree Principle
  • Abstract Syntax Tree Structure
  • Abstract Syntax Tree Management
  • Action Rules for Declarations and Initializations
  • Action Rules for Expressions
  • Action Rules for Assignments and Simple Statements
  • Action Rules for If-Else Statements
  • Action Rules for Loop Statements and some Fixes
  • Action Rules for Function Declarations (part 1)
  • Action Rules for Function Declarations (part 2)
  • Action Rules for Function Calls
  • Semantic Analysis (2)

  • Datatype attribute for Expressions
  • Type Checking for Assignments
  • Revisit Queue and Parameter Checking (part 1)
  • Revisit Queue and Parameter Checking (part 2)
  • Revisit Queue and Parameter Checking (part 3)
  • Revisit Queue and Parameter Checking (part 4)
  • Revisit Queue and Assignment Checking (part 1)

  • Final words | Next up on the project

         And this is actually it for today's post! I hope that I explained everything as much as I needed to, meaning that you learned something out of it.
    Next up on this series are:
    • Semantic analysis (using even more action rules in Bison)
    • Machine Code generation (MIPS Assembly)
         Which are all topics that will need more than one article to complete. Also, note that we might also get into Optimizations later on, or could even extend the Language by adding complex datatypes (structs and unions), more rules etc.
    So, see ya next time!

    GitHub Account:


    Keep on drifting! ;)

    3 columns
    2 columns
    1 column